Machine Learning in Finance - EP. Chan

October 28, 2022

Ernest P. Chan on Financial Machine Learning

Applying ML in finance in a way that adds real value is quite hard.
Simple linear models and strategies (simplicity here means linear, parsimonious models) have been the norm, e.g. pairs trading that uses OLS to compute hedge ratios and Bollinger Bands to detect reversion to the mean. As more and more people get in the game, there is [[Strategy decay]]: the strategies become less and less profitable.
The reason ML wasn’t applied earlier in finance is over-fitting.
ML brings two issues: over-fitting and low interpretability. Financial datasets do not have enough data points to support complex models with many parameters. We need techniques that prevent over-fitting in ML, much as dropout does in deep learning.
When constructing models, we need models that aren’t black boxes. This raises the question of interpretable ML, which calls for feature engineering and for knowing which features/inputs are important in making the model behave as it does. Most ML models are data-intensive and need many features.
ML should be used mainly for [[risk management]] and [[capital allocation]], not for signal generation (from my experience). For signal generation, we mostly want to predict the direction and magnitude of market moves. The issue is that many quant traders and individuals have the same prediction models, and this competition implies that any decent accuracy in anyone’s classification models gets arbitraged away. Financial markets evolve. You need an extra edge, and you need to create new features that the majority of people/traders might not have.
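As a rough illustration of the pairs-trading example mentioned above, here is a minimal sketch in plain Python. The function names, the toy prices, the window length and the 2-standard-deviation band are all my own assumptions for illustration, not something taken from the talk.

```python
from statistics import mean, pstdev


def ols_hedge_ratio(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def bollinger_signal(spread, window=20, n_std=2.0):
    """Return +1 (long spread), -1 (short spread) or 0 for the latest bar.

    The band is computed over the last `window` values, latest bar included.
    """
    recent = spread[-window:]
    mid, band = mean(recent), n_std * pstdev(recent)
    last = spread[-1]
    if last < mid - band:
        return 1   # spread unusually low: bet on reversion upward
    if last > mid + band:
        return -1  # spread unusually high: bet on reversion downward
    return 0


# Hypothetical toy prices for two related instruments.
x = [10.0, 10.5, 11.0, 10.8, 11.2, 11.5, 11.3, 11.8]
y = [20.1, 21.0, 22.1, 21.5, 22.4, 23.1, 22.5, 25.0]

beta = ols_hedge_ratio(x, y)                    # hedge ratio
spread = [b - beta * a for a, b in zip(x, y)]   # y minus beta units of x
signal = bollinger_signal(spread, window=8)
```

With only one fitted parameter (the hedge ratio) and two hand-set ones (window and band width), this is the kind of parsimonious model the notes refer to, which is also why it is easy for others to replicate.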

Traditional quant strats vs ML strats

In traditional quant strats, there are very few variables and hence few model parameters; in ML, there can be numerous variables.
In traditional strats, you use models on prices and on variables related to price, such as technicals and fundamentals, while in ML strats you mostly model [[alternative data]]. ML succeeds at modelling non-linear relationships. In a non-linear model, not all the features will be used.
Interpretability is a major feature of traditional strats; however, because they are easy to understand, they are easy to replicate, which leads to [[Strategy decay]]. ML models are difficult to understand and replicate, so decay is less of a concern.
The majority of ML models give probabilistic outputs, which is helpful in capital allocation, e.g. a BUY with 89% probability lets a trader allocate more capital.
With ML models, it is easier to do statistical inference, compute feature importance, and validate the model using statistical and ML techniques.
Most traditional models are deterministic, while most ML models are stochastic and intrinsically random. Most require you to set a seed for reproducibility; however, the randomness is a good feature for building robust models and for testing accuracies, sensitivities, etc.
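To make the capital-allocation point above concrete (a BUY with 89% probability deserving more capital), here is a small sketch. The linear sizing rule, the function name `position_size` and the 0.5 threshold are illustrative assumptions of mine; the notes do not specify any particular sizing formula.

```python
def position_size(prob_up, max_capital, threshold=0.5):
    """Scale capital linearly with the model's confidence above `threshold`.

    Illustrative rule only: 0 at prob_up == threshold, max_capital at prob_up == 1.
    """
    if not 0.0 <= prob_up <= 1.0:
        raise ValueError("prob_up must be a probability in [0, 1]")
    edge = (prob_up - threshold) / (1.0 - threshold)
    return max_capital * max(0.0, edge)


# Hypothetical model outputs for three BUY signals.
strong = position_size(0.89, max_capital=10_000)  # high confidence
weak = position_size(0.55, max_capital=10_000)    # barely better than a coin flip
skip = position_size(0.40, max_capital=10_000)    # below threshold: no position
```

A deterministic model that only says BUY or SELL gives no basis for this kind of scaling, which is the advantage the notes point to.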

Process of strat building using ML

Financial data science: this usually takes about 80% of the time. Here you tackle numerous problems with financial data, such as:

Data availability
Look-ahead bias, especially from receiving already compiled data that was revised and edited at later dates (e.g. downloading an earnings report for Dec-2020 that was corrected in March-2021, or news sentiment data, which is usually over-engineered to make it look good). This inflates backtest results, since you’re using data you couldn’t possibly have had at the time. You need point-in-time data, which is not cheap.
Data snooping
Engineering data into useful features, which feed into the model.
The features you use in your model have to be stationary. Non-stationary financial series, such as price series, can be made stationary using differencing or fractional differentiation.
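For the stationarity point above, here is a minimal sketch of fixed-window fractional differentiation in plain Python. It uses the standard binomial expansion of (1 - B)^d, where B is the lag operator; the function names and the truncation length `size` are my own choices for illustration.

```python
def frac_diff_weights(d, size):
    """First `size` weights of (1 - B)^d: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    weights = [1.0]
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, size):
    """Apply the truncated weights over a fixed-width window of `size` lags.

    The first `size - 1` points are dropped because their window is incomplete.
    """
    w = frac_diff_weights(d, size)
    out = []
    for t in range(size - 1, len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(size)))
    return out


# d = 1 reduces to ordinary first differences.
first_diff = frac_diff([1.0, 2.0, 4.0, 7.0], d=1, size=2)

# A fractional d keeps slowly decaying weights on older observations.
half_weights = frac_diff_weights(0.5, 4)
```

With d = 1 all memory beyond one lag is discarded, while a fractional d between 0 and 1 keeps some of the series' memory. In practice d would be tuned, for example by checking a stationarity test on the output, and the input is often log prices; both of those are assumptions outside what the notes state.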

Model selection: the problem of which models to use.
Trading strategy construction: after you receive the signals or other output from the models, such as a direction prediction or a magnitude prediction, you need to put the prediction into a coherent strategy.

Random forests are the best choice, since they have just the right complexity and capture non-linearity well. They are better than deep learning models (too complicated and data-intensive) and better than linear regression models (high bias).
In ML prediction, do not predict the markets; rather, try to predict whether your own strategy is going to make a profit, or anything else regarding your strategy. This technique is called [[meta-labelling]].
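The labelling step behind meta-labelling can be sketched as follows. This is only my reading of the idea described above: the function name, the toy signals and the `cost` parameter are hypothetical, and the sketch stops at building the labels rather than training a model.

```python
def meta_labels(sides, forward_returns, cost=0.0):
    """1 if acting on the primary signal would have been profitable, else 0.

    sides: +1 (long) or -1 (short) from the primary strategy, 0 = no trade.
    forward_returns: realised return over the holding period of each bet.
    Bars with no trade are dropped, since there is no bet to label.
    """
    labels = []
    for side, ret in zip(sides, forward_returns):
        if side == 0:
            continue
        labels.append(1 if side * ret - cost > 0 else 0)
    return labels


# Hypothetical primary-strategy signals and the returns that followed.
sides = [1, -1, 0, 1, -1]
rets = [0.02, 0.01, 0.03, -0.01, -0.02]
labels = meta_labels(sides, rets)
```

A secondary classifier (a random forest, say) would then be trained on features available at signal time to predict these labels. Its predicted probability answers "will my strategy's bet pay off?" rather than "where is the market going?", and can be used to skip or size bets.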

The three don’ts of financial machine learning:

Deep Learning isn’t the solution to anything.
Don’t try stock price prediction.

-> In ML, we want heteroskedasticity, i.e. error bars that vary with the independent variables are desirable.
-> Reinforcement learning might work on shorter time scales, as people respond to order book changes.