Comparing 3 Different Types of Neural Network Architectures in Finance - Intro into Machine Learning for Finance (Part 3)

When working on a machine learning task, the network architecture and the training method are the two key factors to turning a set of data-points into a functional model.

But where should different training methods be applied? How do they work? And which is “best”? In this post, we look at three types of training methods and compare Supervised, Unsupervised and Reinforcement Learning.

Photo by David Wright on Unsplash

1. Supervised learning

In “Intro into Machine Learning for Finance (Part 1)” we covered some high level theory on how a network is trained to improve the accuracy of its model, but we never discussed where the target values for comparison actually come from.

Classification Task

For a classification task, it’s easy to see. You want the model to be able to identify a falling wedge pattern, for example, so you feed it sets of inputs, each labeled as either being a falling wedge or not. If the model mislabels an input, the error is back-propagated to adjust its future predictions.

Regression Task

Similarly, for a regression task, the label for each set of inputs will likely be the next value in the time series. For example, you might try to make a model which can learn to predict the price at the close of the next day based on a set of previous market movements.

Both of these cases are examples of “supervised learning” where the model is trained against an already labeled set of data and the error function is calculated as the difference between the predicted output and the supervised labels for the dataset.
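To make the two kinds of label concrete, here is a minimal NumPy sketch that derives both targets from one series of daily closes: the next day’s close for regression, and an up/down flag for classification. The function name and the toy prices are illustrative assumptions, not taken from the series’ own scripts.

```python
import numpy as np

def make_next_day_targets(closes):
    """Build supervised targets from a 1-D array of daily closes.

    Returns (regression_target, class_label) for every day except the last,
    which has no following day to label against.
    """
    closes = np.asarray(closes, dtype=float)
    next_close = closes[1:]                 # regression target: tomorrow's close
    change = closes[1:] - closes[:-1]       # tomorrow's close minus today's close
    is_up = (change >= 0).astype(int)       # classification label: 1 = up, 0 = down
    return next_close, is_up

closes = [100.0, 101.5, 100.8, 102.3]
reg, cls = make_next_day_targets(closes)
print(reg)  # [101.5 100.8 102.3]
print(cls)  # [1 0 1]
```

Each row of input features for day t is then paired with `reg[t]` or `cls[t]`, and training minimises the difference between the model’s output and that label.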

This is generally a very simple and efficient process, as the network weightings are updated to minimise the error between the prediction and the target output for each batch of data-points. You already have the target function or decision process to label the data; it’s just a case of fitting the model to emulate it.

In “Forecasting Market Movements Using Tensorflow — Intro into Machine Learning for Finance (Part 2)” we put supervised learning into practice with a simple neural network to make long/short calls.

2. Unsupervised learning

Meanwhile, unsupervised learning tasks revolve around the model learning complex relationships within data that you haven’t been able to determine yet. This can be through tasks such as clustering data-points, which helps to give insight into the structure of the data.

The application to real life and, indeed, trading is often harder to see, which is exactly the reason it can’t be a “supervised” task — it is trying to find relationships in the data that we haven’t found yet.

One good use may be in the analysis of portfolios. By clustering equities and financial instruments you can get a unique view of the distribution of exposure and risk, and either hedge accordingly or look to maximise the efficiency of exposure to one area of the market.
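As a rough illustration of clustering instruments, the sketch below describes each asset by its row of the return-correlation matrix and groups those rows with k-means. The choice of k-means, the correlation features and the synthetic two-sector returns are all my assumptions; the post does not prescribe a specific algorithm.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next starting center: the point farthest from every center chosen so far
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data: six hypothetical instruments driven by two independent "sector" factors
rng = np.random.default_rng(0)
n_days = 250
sector_a = rng.normal(0.0, 0.01, n_days)
sector_b = rng.normal(0.0, 0.01, n_days)
returns = np.column_stack([sector_a] * 3 + [sector_b] * 3) + rng.normal(0.0, 0.003, (n_days, 6))

# Describe each instrument by its row of the correlation matrix, then cluster the rows
corr = np.corrcoef(returns, rowvar=False)
labels = kmeans(corr, k=2)
print(labels)
```

Instruments that share a cluster tend to move together, so a portfolio concentrated in one cluster carries more correlated exposure than its number of holdings might suggest.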

An interesting paper on the creation of diverse portfolios via clustered stocks can be found here.

3. Reinforcement learning

Reinforcement learning is an interesting mix of both supervised and unsupervised learning. While it does require a specific target function to be trained towards, the error that trains the network is deferred from the actual decision making period. The network is instead trained against a “reward” and/or “punishment” function.

The model is attempting to learn a policy of actions to be able to maximise its reward function, such as learning the optimal time to hedge a portfolio in a turbulent market.

There is no immediate profit or loss at the point when the decision is made. Instead, its reward will be based on how successful the hedging was in protecting portfolio value over the coming time-steps. If it were to hedge too early, it could miss out on market upside; whereas, if it hedged too late, the portfolio would suffer a greater drawdown.

Since RL is set up to generate actions for an environment, rather than to output a simple prediction, it requires a simulated training environment for the agent to react to and interact with. This can prove challenging both in terms of basic implementation and especially in optimizing to train in any reasonable time frame.
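To show what "deferred reward" and "simulated environment" mean in practice, here is a deliberately tiny, hypothetical hedging environment. The reward design (final portfolio value minus a drawdown penalty) and the hand-written policies are my own assumptions for illustration; no RL training algorithm is included, only the environment an agent would learn from.

```python
import numpy as np

class ToyHedgingEnv:
    """Hypothetical environment: each step the agent picks hedge=1 (no exposure
    for that step) or hedge=0 (fully exposed). The reward is deferred: it is 0
    until the episode ends, then equals the final portfolio value minus a
    penalty on the worst drawdown suffered along the way."""

    def __init__(self, returns, drawdown_penalty=1.0):
        self.returns = np.asarray(returns, dtype=float)
        self.drawdown_penalty = drawdown_penalty
        self.reset()

    def reset(self):
        self.t, self.value, self.peak, self.max_drawdown = 0, 1.0, 1.0, 0.0
        return self.returns[:0]  # observation: the returns seen so far (none yet)

    def step(self, hedge):
        exposure = 0.0 if hedge else 1.0
        self.value *= 1.0 + exposure * self.returns[self.t]
        self.peak = max(self.peak, self.value)
        self.max_drawdown = max(self.max_drawdown, 1.0 - self.value / self.peak)
        self.t += 1
        done = self.t == len(self.returns)
        # Deferred reward: nothing is paid out until the episode is over
        reward = (self.value - self.drawdown_penalty * self.max_drawdown) if done else 0.0
        return self.returns[:self.t], reward, done

def run_episode(env, policy):
    """Roll out one episode and return the total (deferred) reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

market = [0.02, -0.05, -0.04, 0.03]
never_hedge = run_episode(ToyHedgingEnv(market), lambda obs: 0)
hedge_after_down_day = run_episode(ToyHedgingEnv(market), lambda obs: int(len(obs) > 0 and obs[-1] < 0))
print(never_hedge, hedge_after_down_day)
```

An RL agent would replace the hand-written lambdas with a learned policy, and would have to run many such episodes to discover which early decisions led to the better end-of-episode reward.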

Again, further reading on reinforcement learning for portfolio hedging can be found here.

Comparison and Drawbacks

Dataset generation:

A supervised learning task, such as classification, requires a set of labeled data to train against. For a simple price predictor, this will only require a small pre-processing script which sets the target value as the next day’s close price. However, more complex functions will require a much more complex algorithm, or even manual pattern identification and labeling. And, due to the large datasets needed for effective training (tens of thousands of samples at minimum), this can be extremely time consuming, if not infeasible.

Unsupervised learning, on the other hand, has no such issue, as it’s trying to find relationships in unlabeled data. However, you still need to gather the dataset, verify the data is clean enough and interpret the results of the model relative to the data.

Reinforcement learning is similar to supervised, as you need a reward function for the model to train towards. However, instead of a labeled dataset, per se, the model is trained against a simulated environment. This can allow for a simpler function to identify the behavior to be rewarded, but brings added complexity to the setup of the data feeds and how they interact with the model. RL also has the added issue of data requirements — needing huge datasets to train effectively.

Training times:

While both supervised and unsupervised models can require significant time and resources to train adequately, they pale in comparison to reinforcement learning due to the nature of its deferred rewards when training. So, not only will you need a larger dataset to train against for RL, but you will need more powerful machines to run the training process in a comparable time.

Meanwhile, it’s hard to contrast the training process of supervised and unsupervised methods, as both the time per training step and the number of training steps required will vary greatly depending on the size of the network used and on the choice of optimizer and its settings.

In all cases, it’s advisable to look at newer optimizers, such as the Adam optimizer, as they can provide faster and less noisy training, achieving a more accurate fit over fewer epochs.
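For intuition about what Adam does differently from plain gradient descent, here is a from-scratch sketch of its update rule minimising a one-parameter toy loss, (w - 3)². In TensorFlow you would simply use the built-in optimizer (e.g. `tf.keras.optimizers.Adam`); the learning rate of 0.1 and the toy loss below are my own choices for the demo.

```python
import math

def adam_minimise(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a 1-parameter function with the Adam update rule."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled by recent gradient size
    return w

# Gradient of (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3
w_final = adam_minimise(lambda w: 2 * (w - 3.0), w0=0.0)
print(w_final)
```

The momentum term smooths noisy gradients, while dividing by the root of the squared-gradient average gives each weighting its own effective step size, which is where the "faster and less noisy" behaviour comes from.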


Overfitting is a serious concern for all training types and, once again, varies more by the quality of data and architecture than type of training method.

A rule of thumb to avoid overfitting: once you’ve found an architecture that can learn to accurately predict the training data, if the validation accuracy diverges during training, slowly reduce the size of the network until you find the smallest network that still trains to a good accuracy on the training dataset.

If the accuracy against the validation dataset still fails to converge, then your issue likely lies with a lack of causation between the training features and the outcome, rather than overfitting of the model.
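The rule of thumb above can be written as a small search loop. This is a sketch under my own assumptions: `train_fn` is a hypothetical callable standing in for a full training run, "diverging" means validation loss rose at each of the last few checks, and each shrink halves every layer. The stub at the bottom uses made-up numbers purely to exercise the logic.

```python
def validation_diverging(val_losses, patience=3):
    """True if validation loss rose at each of the last `patience` checks."""
    if len(val_losses) <= patience:
        return False
    recent = val_losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))

def shrink_layout(layout, factor=0.5, min_nodes=2):
    """Scale every hidden layer down, e.g. (80, 60, 40, 20) -> (40, 30, 20, 10)."""
    return tuple(max(min_nodes, int(n * factor)) for n in layout)

def find_smallest_layout(layout, train_fn, target_train_acc):
    """Shrink the network while it still fits the training set but validation diverges.

    train_fn(layout) is a hypothetical callable that trains a fresh network and
    returns (final_train_accuracy, list_of_validation_losses).
    Returns None if even the starting layout cannot fit the training data.
    """
    best = None
    while True:
        train_acc, val_losses = train_fn(layout)
        if train_acc < target_train_acc:
            return best        # too small to fit the training data: keep the last layout that could
        best = layout
        if not validation_diverging(val_losses):
            return layout      # fits the training data without validation loss running away
        smaller = shrink_layout(layout)
        if smaller == layout:
            return layout      # cannot shrink any further
        layout = smaller

# Purely illustrative stand-in for a real training run (made-up numbers)
def fake_train(layout):
    size = sum(layout)
    train_acc = 0.9 if size >= 50 else 0.5
    val_losses = [1.0, 0.9, 0.95, 1.0, 1.1] if size > 100 else [1.0, 0.9, 0.85, 0.8]
    return train_acc, val_losses

print(find_smallest_layout((160, 120, 80, 40), fake_train, target_train_acc=0.8))  # (40, 30, 20, 10)
```

If the loop stops because the network can no longer fit the training data while the last layout that could still diverged on validation, that is the situation the paragraph above attributes to weak causation between features and outcome.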


Each training method is suited to a specific type of machine learning task and data, with supervised being the most likely candidate to create simple trading signals and unsupervised being used for analysing relationships in data to help refine strategies.

While reinforcement learning has great promise in certain tasks, it is unlikely to be particularly feasible compared with other machine learning techniques due to its huge data and computing requirements.

By Matthew Tweed


Forecasting Market Movements Using Tensorflow - Intro into Machine Learning for Finance (Part 2)

Multi-Layer Perceptron for Classification

Is it possible to create a neural network for predicting daily market movements from a set of standard trading indicators?

In this post we’ll be looking at a simple model using Tensorflow to create a framework for testing and development, along with some preliminary results and suggested improvements.

Photo by jesse orrico on Unsplash

The ML Task and Input Features

To keep the basic design simple, it’s set up for a binary classification task: predicting whether the next day’s close is going to be higher or lower than the current close, corresponding to a prediction to either go long or short for the next time period. In practice, this could be applied to a bot which calculates and executes a set of positions at the start of a trading day to capture the day’s movement.

The model currently uses 4 input features (again, for simplicity): the 15- and 50-day RSI and the 14-day Stochastic %K and %D.

These were chosen due to the indicators being normalized between 0 and 100, meaning that the underlying price of the asset is of no concern to the model, allowing for greater generalization.
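In practice a library such as TA-Lib would compute these indicators, but a plain NumPy version makes the formulas, and their 0 to 100 bounds, explicit. This is a sketch, not the post’s generator code: it uses Wilder’s smoothing for RSI, a 3-day simple average of %K for %D, and reports 50 when the high-low range is flat; those last two choices are my assumptions. The price series is synthetic.

```python
import numpy as np

def rsi(closes, period=14):
    """Relative Strength Index with Wilder's smoothing; the first `period` values are NaN."""
    closes = np.asarray(closes, dtype=float)
    deltas = np.diff(closes)
    gains = np.clip(deltas, 0.0, None)
    losses = np.clip(-deltas, 0.0, None)
    out = np.full(len(closes), np.nan)
    if len(deltas) < period:
        return out
    # Seed with simple averages over the first `period` price changes
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for i in range(period, len(closes)):
        if i > period:
            # Wilder smoothing: keep (period - 1) parts of the old average
            avg_gain = (avg_gain * (period - 1) + gains[i - 1]) / period
            avg_loss = (avg_loss * (period - 1) + losses[i - 1]) / period
        out[i] = 100.0 if avg_loss == 0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def stochastic(highs, lows, closes, k_period=14, d_period=3):
    """Stochastic %K (close within the k_period high-low range) and %D (SMA of %K)."""
    highs, lows, closes = (np.asarray(a, dtype=float) for a in (highs, lows, closes))
    k = np.full(len(closes), np.nan)
    for i in range(k_period - 1, len(closes)):
        hh = highs[i - k_period + 1 : i + 1].max()
        ll = lows[i - k_period + 1 : i + 1].min()
        # Flat range: report the midpoint (an arbitrary convention for this sketch)
        k[i] = 50.0 if hh == ll else 100.0 * (closes[i] - ll) / (hh - ll)
    d = np.full(len(closes), np.nan)
    for i in range(k_period + d_period - 2, len(closes)):
        d[i] = k[i - d_period + 1 : i + 1].mean()
    return k, d

# Usage on a synthetic random-walk price series
rng = np.random.default_rng(42)
closes = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 300))
highs = closes + rng.uniform(0.0, 1.0, 300)
lows = closes - rng.uniform(0.0, 1.0, 300)
rsi_15, rsi_50 = rsi(closes, 15), rsi(closes, 50)
stoch_k, stoch_d = stochastic(highs, lows, closes, k_period=14)
```

Because every output is a ratio of price moves, multiplying the whole price series by a constant leaves all four features unchanged, which is exactly the price-independence the post is relying on.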

While it would be possible to train the model against any number of other trading indicators or otherwise, I’d recommend sticking to those that are either normalized by design or could be modified to be price or volatility normalized. Otherwise a single model is unlikely to work on a range of stocks.

Dataset Generation

(Code Snippet of a dataset generation example: full script at end of this post)

The dataset generation and neural network scripts have been split into two distinct modules to allow for both easier modification, and the ability to re-generate the full datasets only when necessary — as it takes a long time.

Currently the generator script is set up with a list of S&P 500 stocks, for which it downloads daily candles since 2015 and processes them into the required trading indicators, which will be used as the input features of the model.

Everything is then split into a set of training data (Jan 2015 to June 2017) and evaluation data (June 2017 to June 2018) and written as CSVs to “train” and “eval” folders in the directory the script was run from.
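The split-and-write step might look something like the following standard-library sketch. The split date of 1 June 2017, the ticker "XYZ" and the column names are hypothetical choices for illustration; the series’ actual generator script is not reproduced here.

```python
import csv
import os
import tempfile
from datetime import date

def split_and_write(ticker, rows, out_dir=".", split=date(2017, 6, 1)):
    """Write one ticker's rows to train/<ticker>.csv (before `split`) and eval/<ticker>.csv (from `split` on).

    rows: non-empty list of dicts, each with an ISO-format "date" key plus
          feature and label columns (column names here are illustrative).
    """
    for folder in ("train", "eval"):
        os.makedirs(os.path.join(out_dir, folder), exist_ok=True)
    train = [r for r in rows if date.fromisoformat(r["date"]) < split]
    evaluation = [r for r in rows if date.fromisoformat(r["date"]) >= split]
    for folder, subset in (("train", train), ("eval", evaluation)):
        with open(os.path.join(out_dir, folder, f"{ticker}.csv"), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(subset)
    return len(train), len(evaluation)

# Usage with two toy rows in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    toy_rows = [
        {"date": "2017-05-31", "rsi_15": 55.2, "label": 1},
        {"date": "2017-06-01", "rsi_15": 48.9, "label": 0},
    ]
    counts = split_and_write("XYZ", toy_rows, out_dir=tmp)
    written = sorted(os.listdir(tmp))
print(counts, written)  # (1, 1) ['eval', 'train']
```

Splitting by date rather than at random keeps the evaluation period strictly after the training period, so the model is never evaluated on days that sit in between its training samples.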

These files can then be read on demand by the ML script to train and evaluate the model without the need to re-download and process any more data.

Model Training

(Code Snippet of model training: full script at end of this post)

At start-up, the script reads all the CSV files in the “train” and “eval” folders into arrays of data for use throughout the training process. With such a small dataset, the RAM requirements will be low enough not to warrant extra complexity. But, for a significantly larger dataset, this would have to be updated to only read a sample of the full data at a time, rotating the data held in memory every few thousand training steps. This would, however, come at the cost of greater disk IO, slowing down training.
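Reading every CSV in a folder into a single feature matrix and label vector can be sketched as below. The column names, the "label" column and the two tiny files written by the usage code are hypothetical; they only exist to make the example self-contained.

```python
import csv
import glob
import os
import tempfile

import numpy as np

def load_folder(folder, feature_cols, label_col="label"):
    """Read every CSV in `folder` into one feature matrix and one label vector."""
    features, labels = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                features.append([float(row[c]) for c in feature_cols])
                labels.append(int(row[label_col]))
    X = np.array(features, dtype=np.float32).reshape(-1, len(feature_cols))
    y = np.array(labels, dtype=np.int64)
    return X, y

# Usage with two tiny hypothetical files
with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, "train")
    os.makedirs(train_dir)
    files = {"AAA.csv": [(55.0, 80.0, 1), (48.0, 20.0, 0)], "BBB.csv": [(61.0, 90.0, 1)]}
    for name, rows in files.items():
        with open(os.path.join(train_dir, name), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["rsi_15", "stoch_k", "label"])
            writer.writerows(rows)
    X_train, y_train = load_folder(train_dir, ["rsi_15", "stoch_k"])
print(X_train.shape, y_train)  # (3, 2) [1 0 1]
```

For the larger datasets mentioned above, the same function could be called on a random subset of the file paths every few thousand steps instead of on the whole folder at once.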

The neural network itself is also extremely small, as testing showed that with larger networks, evaluation accuracies tended to diverge quickly.


The network’s “long Output” and “short Output” nodes are used as a binary predictor, with whichever has the higher confidence value taken as the model’s prediction for the coming day.

The “dense” layers within the architecture mean that each neuron is connected to the outputs of all the neurons in the layer below. These neurons are the same as described in “Intro into Machine Learning for Finance (Part 1)”, and use tanh as the activation function, which is a common choice for a small neural network.
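The flow of data through such a network can be sketched in a few lines of NumPy. This is not the post’s TensorFlow code: the weights are random and untrained, the input values are made up and rescaled from 0 to 100 down to 0 to 1, and the softmax over the two outputs is my assumption about how the confidence values are produced. The layout (40, 30, 20, 10) matches Model 1 in the results below.

```python
import numpy as np

def init_network(n_inputs, layout, n_outputs=2, seed=0):
    """One (weights, biases) pair per dense layer: every neuron sees every output below it."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *layout, n_outputs]
    return [(rng.normal(0.0, 1.0 / np.sqrt(a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)                     # hidden dense layers with tanh activation
    w, b = params[-1]
    logits = x @ w + b                             # output layer: [long, short]
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)   # softmax: confidences summing to 1

params = init_network(n_inputs=4, layout=(40, 30, 20, 10))
# Hypothetical day: RSI15, RSI50, %K, %D rescaled from 0-100 to 0-1
features = np.array([[55.0, 52.0, 80.0, 75.0]]) / 100.0
probs = forward(params, features)
call = np.where(probs.argmax(axis=1) == 0, "long", "short")
print(probs, call)
```

Until the weightings are trained, the call is essentially arbitrary; training only changes the values inside `params`, not this forward computation.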

Some types of data and networks can work better with different activation functions, such as ReLU or ELU for deeper networks. ReLU (Rectified Linear Unit) helps to mitigate the vanishing gradient problem in deeper architectures, and ELU is a variation on it intended to make training more efficient still.
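The vanishing-gradient point can be seen directly from the activation functions and their derivatives. The small NumPy sketch below defines ReLU and ELU (with the common default alpha = 1) and compares the slope of tanh with the slope of ReLU for a large positive input.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # Identity for x > 0; smooth exponential curve towards -alpha for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2                # shrinks towards 0 as |x| grows (saturation)

def relu_grad(x):
    return (np.asarray(x) > 0).astype(float)    # stays at 1 for every positive input

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), elu(x))
print(tanh_grad(3.0), relu_grad(3.0))
```

During back-propagation these slopes are multiplied together layer by layer, so a stack of saturated tanh units can shrink the error signal towards zero, while active ReLU units pass it through unchanged.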


As well as displaying prediction accuracy stats in the terminal every 1000 training steps, the ML script is also set up to record summaries for use with TensorBoard, making graphing of the training process much easier.

While I haven’t included anything other than scalar summaries, it’s possible to record everything from histograms of the node weightings to sample images or audio from the training data.

To use TensorBoard with the saved summaries, launch it with the --logdir flag set to the directory you’re running the ML script in (for example, tensorboard --logdir . from that directory). Then open the browser of your choice and enter “localhost:6006” into the address bar. All being well, you now have a set of auto-updating charts.

Training results

Node layouts: Model 1 (40,30,20,10), Model 2 (80,60,40,20), Model 3 (160,120,80,40)

The results were, as expected, less than spectacular due to the simplicity of the example design and its input features.

We can see clear overfitting, as the loss (error) against the evaluation dataset increases for all tests, especially so on the larger networks. This means that the network is only learning the patterns of the specific training samples, rather than a more generalized model. On top of this, the training accuracies aren’t particularly high, only achieving a few percent above completely random guesses.

Suggestions for Modification and Improvement

The example code provides a nice model that can be played around with to help understand how everything works, but it serves more as a starting framework than a working model for prediction. As such, here are a few suggestions for improvements you might want to make and ideas you could test:

Input features

In its current state, the dataset is generated with only 4 input features and the model only looks at one point in time. This severely limits what you can expect it to be able to learn — would you be able to trade only looking at a few indicator values for one day in isolation?

First, modify the dataset generation script to calculate more trading indicators and save them to the CSV. TA-Lib has a wide range of functions which can be found here.

I recommend sticking to normalized indicators, similar to Stoch and RSI, as this takes the relative price of the asset out of the equation, so that the model can be generalized across a range of stocks rather than needing a different model for each.

Next, you could modify the ML script to read the last 10 data periods as the input at each time step, rather than just the one. This allows it to start learning more complex convergence and divergence patterns in the oscillators over time.
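One way to build those multi-day inputs is a sliding window over the indicator rows, sketched below in NumPy. The function name is hypothetical, and it assumes `labels[t]` already refers to the movement after day t, so each window only contains information available on its last day.

```python
import numpy as np

def make_windows(features, labels, window=10):
    """Flatten the last `window` rows of indicators into one input vector per day.

    features: (T, F) array of indicator values, one row per day.
    labels:   (T,) array, labels[t] being the target for day t.
    Returns X with shape (T - window + 1, window * F) and the aligned labels.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    T, F = features.shape
    if T < window:
        return np.empty((0, window * F)), labels[:0]
    X = np.stack([features[t - window + 1 : t + 1].reshape(-1) for t in range(window - 1, T)])
    return X, labels[window - 1:]

# Toy example: 12 days of 2 features each
feats = np.arange(24).reshape(12, 2)
X, y = make_windows(feats, np.arange(12), window=10)
print(X.shape, y)  # (3, 20) [ 9 10 11]
```

The flattened rows suit the existing dense layers; keeping each window as a (window, features) block instead would be the natural input shape for the convolutional layers suggested below.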

Network Architecture

As mentioned earlier, the network is tiny due to the lack of data and feature complexity of the example task. This will have to be altered to accommodate the extra data being fed by the added indicators.

The easiest way to do this would be to change the node layout variable to add extra layers or greater numbers of neurons per layer. You may also wish to experiment with types of layer other than fully connected. Convolutional layers are often used for pattern recognition tasks with images, so they could be interesting to test out on financial chart data.

Dataset labels

The dataset is labeled as “long” if the price difference is >= 0, otherwise “short”. However, you may wish to change the threshold to the median price change over the length of the data, to give a more balanced set of training data.

You may even wish to add a third category of “neutral” for days where the price stays within a limited range.

On top of this, the script also has the ability to vary the look-ahead period for the increase or decrease in price, so it could be tested with longer-term predictions.
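The three labeling ideas above (median threshold, neutral band, longer look-ahead) fit into one small function. This NumPy sketch is my own illustration rather than the post’s script; the parameter names and the numeric encoding (1 = long, 0 = short, 2 = neutral) are assumptions.

```python
import numpy as np

def make_labels(closes, lookahead=1, threshold="zero", neutral_band=None):
    """Label each day by the price change over the next `lookahead` (>= 1) days.

    threshold="zero":   long if change >= 0, otherwise short (the current rule)
    threshold="median": long if change >= the median change, for more balanced classes
    neutral_band:       optional half-width around the threshold labelled neutral
    Returns ints: 1 = long, 0 = short, 2 = neutral. The last `lookahead` days get no label.
    """
    if threshold not in ("zero", "median"):
        raise ValueError("threshold must be 'zero' or 'median'")
    closes = np.asarray(closes, dtype=float)
    change = closes[lookahead:] - closes[:-lookahead]
    cut = np.median(change) if threshold == "median" else 0.0
    labels = np.where(change >= cut, 1, 0)
    if neutral_band is not None:
        labels = np.where(np.abs(change - cut) <= neutral_band, 2, labels)
    return labels

closes = [10.0, 11.0, 10.5, 10.5, 12.0]
print(make_labels(closes))                      # [1 0 1 1]
print(make_labels(closes, threshold="median"))  # [1 0 0 1]
print(make_labels(closes, neutral_band=0.25))   # [1 0 2 1]
print(make_labels(closes, lookahead=2))         # [1 0 1]
```

When labeling many stocks with one model, applying the median and the neutral band to percentage changes rather than raw price differences would keep the thresholds comparable across price levels, in the same spirit as using normalized indicators.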


With the implementation of the suggested improvements, it is certainly possible to improve the model to the point where it could be used as a complementary trading indicator to a standard rule-based strategy.

However, expectations should be tempered when it comes to such a simple architecture and training task. Machine learning can really set itself apart with a more refined network structure and prediction task.

As such, in the next article we’ll be looking at Supervised, Unsupervised and Reinforcement Learning, and how they can be used to create time series predictors and to analyze relationships in data to help refine strategies.

Full Script

By Matthew Tweed


Intro into Machine Learning for Finance (Part 1)

There has been increasing talk in recent years about the application of machine learning for financial modeling and prediction. But is the hype justified? Is machine learning worth investing time and resources into mastering?

Photo by Franck V. on Unsplash

This series will be covering some of the design decisions and challenges to creating and training neural networks for use in finance, from simple predictive models to the use of ML to create specialised trading indicators and statistics — with example code and models along the way.

If you are comfortable with machine learning in general, please feel free to skip ahead to the section “Where can it be applied in finance?”

What is Machine Learning?

In simple terms, machine learning is about creating software which can be “trained” to automatically adapt its predictive model without the need for hard-coded changes. There is often debate over whether machine learning is a subset of Artificial Intelligence, or AI a subset of ML, but they both work towards the same broad goal of pattern recognition and analysis.

While different forms of machine learning and expert systems have been around for decades, only relatively recently have we seen large advances in their learning capabilities, as both training methods and computer hardware have advanced.

With the creation of easy to use open-source libraries, it has now become easier than ever to create, train and deploy models without the need for specialist education.

Neural Networks

Artificial neural networks are, again, a subset of the broad field of machine learning. They are among the most commonly used models and are relatively easy to understand conceptually.

A network is made up of layers of “neurons”, which each perform a very simple calculation based on their own trained weightings. Individually, they provide very little in terms of processing. However, when combined into a layer, and layers stacked into a full network, the complexity of what the model can learn widens and deepens.

(Simple neural network structure)

Each neuron has a weighting value associated with each input it receives. It first calculates:

Sum of (each weighting * its respective input)

This value is then put through an “activation function” (such as tanh or sigmoid), and the result is the output the neuron passes on.

The activation function can serve to normalize the output value of the neuron and add non-linearity, so that the network can learn functions more complex than simple linear relationships.
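The calculation for a single neuron fits in a few lines of plain Python. The inputs and weightings below are arbitrary illustrative numbers. Most libraries, TensorFlow’s dense layers included, also add a learned bias term to the weighted sum by default; it is left out here to match the description above.

```python
import math

def neuron(inputs, weights, activation=math.tanh):
    """A single neuron: the weighted sum of its inputs, passed through an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]
# Weighted sum: 0.4*0.5 + 0.3*(-1.0) + 0.1*2.0 = 0.1
print(neuron(inputs, weights))           # tanh(0.1), roughly 0.0997
print(neuron(inputs, weights, sigmoid))  # sigmoid(0.1), roughly 0.525
```

Both activations squash the weighted sum into a fixed range (-1 to 1 for tanh, 0 to 1 for sigmoid), which is the normalizing role described above; their curvature is what supplies the non-linearity.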

Once the input data has been processed through the whole stack of neurons, you’ll be left with your simple prediction or statistic, such as a long/short call for the next time period.

Training a network

It’s all well and good looking at the flow of data for a model to make a prediction, but it wouldn’t be complete without a brief overview of how the network is actually trained to make these predictions.

During the training process, you run a set of data through the network to compare its predictions against the desired results for each data point. The difference between the output and the target value(s) is then used to update the weightings within the network through “back-propagation”.

Back-propagation starts at the output neurons, looking at the component values they received from the previous layer and the associated weightings. The weightings are given a small adjustment to bring the updated prediction closer in line with the desired output.

This error between the output and target is then passed to the next layer down, where the same updating process is repeated until all the network weightings have been marginally adjusted.

This is repeated multiple times over every data-point in the training dataset, giving the network weightings a small adjustment each time until predictions converge towards the target outputs.
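The whole loop (forward pass, error, back-propagation from the output layer downwards, small weighting updates) can be sketched for a network with one hidden tanh layer. This is a minimal NumPy illustration under my own assumptions: mean-squared error, full-batch gradient descent, a learning rate of 0.05 and a synthetic non-linear target.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """One hidden tanh layer, linear output, mean-squared error, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.5, (hidden, 1)), np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y                            # difference between output and target
        losses.append(float(np.mean(err ** 2)))
        # Backward pass, starting at the output layer...
        d_pred = 2.0 * err / len(X)
        dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
        # ...then passing the error down through W2 and the tanh derivative
        d_h = (d_pred @ W2.T) * (1.0 - h ** 2)
        dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
        # A small adjustment to every weighting
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1
    return losses

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))
y = (np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)   # toy non-linear target
losses = train_mlp(X, y)
print(losses[0], losses[-1])
```

Each epoch nudges every weighting slightly against its gradient, and the printed losses show the predictions converging towards the targets over many passes rather than in one step.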

In theory, this learned model will then be able to make accurate predictions on out-of-sample data. However, it is very easy to over-fit to the training data if the model is too large and simply learns the inputs rather than a generalized representation.

(Example of over-fitting for simple classification, made with tensorflow playground)

Where can it be applied in finance?

Since neural networks can be used to learn complex patterns in a dataset, they can be used to automate some of the processes of technical analysis commonly used by traders.

A moving average cross strategy can be coded with ease, needing only a few lines for a simple trading bot. However, more complex patterns such as indicator divergence, flags and wedges, and support/ resistance levels can be harder to identify with simple rules. And, indeed, forming a set of chart patterns into an objective trading strategy is often hard to achieve.
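For comparison, a moving average cross really can be written in a handful of lines. A NumPy sketch, with the 10- and 30-day windows and the synthetic rise-then-fall prices chosen arbitrarily for illustration:

```python
import numpy as np

def sma(values, window):
    """Simple moving average; entries before the first full window are NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def ma_cross_signal(closes, fast=10, slow=30):
    """1 = long while the fast SMA is above the slow SMA, -1 = short otherwise, 0 = not enough data."""
    fast_ma, slow_ma = sma(closes, fast), sma(closes, slow)
    signal = np.where(fast_ma > slow_ma, 1, -1)
    signal[np.isnan(slow_ma)] = 0
    return signal

closes = np.concatenate([np.linspace(100, 120, 40), np.linspace(120, 95, 40)])
signal = ma_cross_signal(closes)
# A crossover is a flip between long (1) and short (-1) from one day to the next
crosses = np.flatnonzero(signal[1:] * signal[:-1] == -1) + 1
print(crosses)
```

Every step here is an explicit rule. A divergence or wedge detector has no equally compact definition, which is the gap the machine learning approaches listed next aim to fill.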

Machine learning can be applied in several different cases for this one scenario.

  1. Pattern recognition from candle data to identify levels of significance
  2. Creating specialised indicators to add to a simple rule based strategy
  3. A final processing and aggregation layer to make a prediction from your set of indicators.

Machine learning can also be applied in slightly more exotic ways to help refine further information:

  • Denoising and auto-encoding — used to remove some of the random noise of a price feed to help distill the underlying trend or specifics of the market sentiment.
  • Clustering — group together different equities and financial instruments to streamline the value of a portfolio. Or it could be used to evaluate and reduce the risk of a portfolio.
  • Regression — often used to try to predict the price at the next time step, however it can be applied to a range of abstracted indicators to help predict trading signals earlier.

Machine Learning vs Traditional Methods

In many of the cases above, it is perfectly possible (and often advisable) to stick to more traditional algorithms. A well-made machine learning framework has the advantage when it comes to easy retraining, but at the cost of complexity, computational overhead and interpretability.

While there have been advances in the use of relevance heat-maps to help explain the source of a prediction, neural nets still mostly remain black boxes — ruling out certain use cases, such as for fund managers, where decision justification and accountability is of importance to clients.

We may well see attitudes change over time as ML-assisted trading and investing becomes more widespread, but for now this remains a large obstacle to practical application in certain settings.

Furthermore, it is often a lot easier to make a simple rule-based strategy than a full ML model and training structure. But when done right, machine learning can provide cutting-edge accuracy in the adversarial world of financial trading.


Despite some of the added challenges and complexity brought by the addition of machine learning, it provides a new range of tools which can be applied to a range of problems in finance, allowing for greater automation and accuracy.

In the next post we’ll be looking deeper into some of the theory and decision making behind different training methods and tasks for a new model.

By Matthew Tweed