Can Machines Replace Fund Managers?


In 2016, data reportedly surpassed oil as the most valuable commodity in the world. Like oil, data needs to be refined and processed. Refineries turn oil into fuel. Machine learning ("ML") turns data into insight.

Machine learning is a sexy term synonymous with cutting-edge Silicon Valley technology. Whether or not the hype is real, one cannot deny ML has helped create incredible innovation, from self-driving cars to computer programs that beat chess grandmasters at their own game.

This hype has now extended to finance. Can ML predict stock returns? If so, will ML models replace fund managers? Could you use ML to improve your own personal investments?

We can dissect these questions with the help of a research paper from Yale University and AQR Capital Management (a US-based asset management firm), Can Machines ‘Learn’ Finance?.

What is Machine Learning?

Most of you have probably heard of Machine Learning before. But what is it actually?

At its core, ML is the process of using data to build (‘train’) a model that can make predictions. Perhaps we want to predict whether a tumour is malignant or benign based on its characteristics. Or maybe we want to predict what will happen to the stock market in a week’s time.

So, how does it work? Let’s look at an example of ML in action.

Once upon a time, computer scientists were struggling to build a facial recognition program. They wanted it to be intuitive to humans. A program they tried analysed images, identified a person’s eyes and nose and assessed the relationship between the two features. However, human faces are infinitely variable, so this type of intuitive approach just didn’t work.

Surprisingly, the breakthrough came when scientists stopped thinking altogether about what makes a face a face. They realised that a computer, given enough examples, could build a far more accurate model of a human face than any computer scientist could specify by hand. So, they started working on programs that would take in large amounts of data (images of ‘faces’ and ‘not faces’) and dynamically learn how to distinguish between the two.

This realisation was their eureka moment.

The model the computer builds does not follow explicit instructions and is often barely understandable to a human observer. In fact, a big challenge in the field of ML is finding ways to reverse engineer ML models in order to understand the intuition behind said models’ deductions.

The idea of feeding data to a computer program and allowing it to work out an appropriate statistical model, rather than relying on human intuition to create the model, is the crux of ML.

This sort of ‘black box’ approach is enormously powerful and is currently the foundation on which many applications ranging from medicine to music are built. Any application you can think of that involves dealing with signals (images, speech, language, etc.) likely uses some form of ML.

Where does Machine Learning work best?

From IBM’s Deep Blue supercomputer beating a reigning world champion to the development of self-driving cars, ML has many success stories.

But what do these stories have in common that allowed scientists and engineers to leverage ML so powerfully?

  1. A ‘big data’ environment. ML requires an extremely large amount of data to accurately train a model. As we discuss in the next section, ‘large’ does not just mean a lot of different variables, but an abundance of data points describing each variable.

  2. A high ‘signal-to-noise ratio’. When building a model, we characterise the data being used into two parts: ‘signal’ and ‘noise’. ‘Signal’ is some unknown pattern that we hope to capture and describe in our model. ‘Noise’ is everything else that gets in the way of that. Think randomness, recording errors or superfluous information. Data with a high ‘signal-to-noise’ ratio (more signal, less noise) is more predictable and suitable for effective ML.
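To make the signal-versus-noise distinction concrete, here is a minimal simulation (the variable names and signal strengths are illustrative assumptions, not taken from the paper). We generate data where a linear pattern (the signal) is buried in random noise, and measure how much of the variation a best-fit line can explain:

```python
import random

random.seed(0)

def make_series(n, signal_strength):
    """Simulate n observations of y = signal_strength * x + noise,
    where the noise has unit variance."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [signal_strength * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def r_squared(xs, ys):
    """Share of the variance in ys explained by a best-fit line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

# High signal-to-noise: the pattern dominates, and the model finds it.
high = r_squared(*make_series(1000, signal_strength=2.0))

# Low signal-to-noise (much closer to markets): the pattern drowns.
low = r_squared(*make_series(1000, signal_strength=0.05))
```

With a strong signal the fit explains most of the variance; with a weak one it explains almost nothing, even though the underlying pattern is exactly the same shape.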

The question is: do financial markets have these characteristics? But first…

What is ‘big data’?

When people talk about big data in finance, they usually forget to specify where the ‘big’ comes from.

There can be a large number of predictor variables and there can be a large number of observations. Predictor variables are the different factors that can be used to model something, while observations are the recorded data points for each of those variables.

For example, let’s say we wanted to build a model to predict the amount of rainfall next week. The time of year, the atmospheric temperature and what your neighbour is saying can be thought of as predictor variables. The individual recorded measurements of rainfall, time of year, temperature and your neighbour’s opinion are the observations.
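The rainfall example can be pictured as a simple table, with one column per predictor variable and one row per observation (the column names and numbers below are made up for illustration):

```python
# Each column is one predictor variable.
predictors = ["month", "temperature_c", "humidity_pct", "neighbour_says_rain"]

# Each row is one observation: one day's measurements of every predictor.
# Only 3 observations here, no matter how many predictor columns we add.
observations = [
    [6, 18.2, 74, True],
    [6, 16.9, 81, True],
    [7, 21.4, 55, False],
]

n_observations = len(observations)   # rows: 3
n_predictors = len(predictors)       # columns: 4
```

A dataset can be ‘big’ along either axis: wide (many columns) or long (many rows).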

So the question is: does the ‘big’ in ‘big data’ come from many predictor variables or many observations?

The answer is the latter. Because of this, the richness of an ML model is constrained by the number of observations the machine can learn from — not by the number of predictor variables.

If you don’t have much data on past rainfall, how do you expect to build a good rainfall model?

Predicting returns is actually a ‘small data’ problem

Accurately predicting returns is the holy grail of finance. So when the media incessantly talks about ‘big data in finance’, you’d think markets are a big data environment and, therefore, that ML will succeed.

Not exactly.

These days, in finance, there are more predictor variables than we are even aware of. And as we discussed in our most recent article, ‘Space, Money and Baseball’, the rise of alternative data like satellite imagery and social media sentiment has changed the game of return prediction.

However, as the paper points out, this doesn’t mean stock markets are a big data environment.

Financial data series don’t actually contain many observations. For any investor not trading at high frequency, there are only a few decades’ worth of data, and thus at best a few hundred (say, monthly) observations per asset. And because most financial data is a time series, whose points are not independently distributed, those observations carry even less information than the raw count suggests.
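A quick back-of-the-envelope calculation shows why ‘a few decades’ translates into so few data points (the 30-year horizon is an assumption for illustration; the ImageNet figure is the commonly cited size of its training set):

```python
# Observation counts per asset at different (assumed) sampling rates.
years = 30

monthly_obs = years * 12    # a monthly investor: ~360 observations
daily_obs = years * 252     # ~252 trading days/year: ~7,560 observations

# For comparison, a standard image-recognition training set:
imagenet_examples = 1_200_000  # roughly 1.2 million labelled images
```

Even sampling every trading day for thirty years yields orders of magnitude fewer examples than the datasets behind ML’s famous successes.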

The data is tiny.

But even for investors trading asset classes like derivatives on a daily basis, there are only a few hundred thousand observations per asset, which is still relatively small by ML standards.

So, what can fund managers do to overcome this small data problem? One approach is to adopt a high-frequency trading strategy, so they can use more data sampled at much shorter intervals. This is the territory of the big quant firms like Optiver and Citadel. But this, of course, requires deep technical expertise, brings higher trading costs and isn’t appropriate for every asset class. It’s almost a completely different skill set.

There is also another problem here.

Some fields can generate new data through experimentation with relative ease. For example, Tesla could simply drive their car another 1000km to gather more observations. However, the only way to expand a financial return series is to wait for time to pass. Compared with fields that can generate data on demand, this makes markets a far harder place for ML to succeed.

Even in 100 years, financial return prediction will still be a small data problem.

Markets have a low signal-to-noise ratio

As if the small data problem weren’t setback enough, markets also have a low signal-to-noise ratio.

The main reason for this is that financial market behaviour is difficult to predict. Even the best stock or investment portfolio in the world can swing down because of unanticipated news, whether it’s COVID-19, an insider trading scandal, or the CEO getting a divorce.

What’s interesting is that the signal-to-noise ratio is constantly being pulled toward zero. Let’s see why.

If a trader could reliably predict a positive return (a strong signal), they would start buying. But their buying pushes the price up, and trading on that predictive information sucks the predictability out of the market. The predictability has been priced in.
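A stripped-down numerical sketch of this mechanism (the prices are made up for illustration): a trader’s signal says the asset is worth more than today’s price, competitive buying closes the gap, and the predictable part of the return disappears.

```python
# A trader's signal says the asset is really worth 103; it trades at 100.
price_today = 100.0
predicted_value = 103.0

# Before anyone trades on the signal, the expected return is positive.
expected_return_before = (predicted_value - price_today) / price_today  # 3%

# Traders buy until today's price reflects the prediction...
price_today = predicted_value

# ...and the predictable return is gone. Only surprises can move it now.
expected_return_after = (predicted_value - price_today) / price_today   # 0%
```

This is the sense in which competition keeps dragging the signal-to-noise ratio toward zero.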

The idea that competition in markets wipes out return predictability underpins the Nobel prize-winning work on the Efficient Market Hypothesis.

Now, whether you believe the market is efficient is up to you. But if it is (and the evidence points this way), the only things that move markets in the medium term are unanticipated news, shocks to the system and differences in beliefs about underlying value. Noise.

So, at the end of the day, whatever predictability is left is small and hard to profit from. That’s why the signal-to-noise ratio is always being pulled toward zero.

If a machine could efficiently distinguish the signal (the predictable component of returns) from the noise (unanticipated news and shocks), then accurately predicting returns using ML could be within reach.

But, because financial behaviour keeps pulling the signal-to-noise ratio toward zero, this becomes an incredibly difficult challenge…

The AQR paper raises another interesting point. The researchers argue that the more complex an ML model becomes (think buzzwords like ‘neural networks’), the more complex, and often impractical, its corresponding trading strategy tends to be.

Perhaps there isn’t enough liquidity, trading costs are too high, or you simply can’t short sell. Frictions like these are probably why the predictability survived in the first place: no one could profitably trade it away.

This dilemma raises perhaps an even more interesting question: even if ML can unlock significant predictive insights in the world of finance, will these insights actually help generate better returns? Can they trade and capture them?

So, what’s the verdict?

Well, because of the structural differences between finance and fields where ML has had widespread success, there isn’t a consensus yet on whether ML will become a ubiquitous tool in finance.

There also isn’t much publicly available academic research into the use of ML in finance. Perhaps there is some gatekeeping going on.

In saying that, there are a handful of ML-loving quant firms that have achieved great success. One such firm, Renaissance Technologies, reportedly generated average annualised returns of 66% (before fees) between 1988 and 2018.

Maybe the future of ML in predicting returns isn’t as bleak as the paper paints. In truth, it might just be a little harder than we first thought.


Matt Blau

Director of Student Writers
