Intro to Machine Learning for Finance (Part 1)

There has been increasing talk in recent years about the application of machine learning for financial modeling and prediction. But is the hype justified? Is machine learning worth investing time and resources into mastering?

Photo by Franck V. on Unsplash

This series will be covering some of the design decisions and challenges to creating and training neural networks for use in finance, from simple predictive models to the use of ML to create specialised trading indicators and statistics — with example code and models along the way.

If you are comfortable with machine learning in general, please feel free to skip ahead to the third section, “Where can it be applied in finance?”

What is Machine Learning?

In simple terms, machine learning is about creating software which can be “trained” to automatically adapt its predictive model without the need for hard-coded changes. There is often debate whether machine learning is considered a subset of Artificial Intelligence, or whether AI is a subset of ML, but they both work towards the same broad goal of pattern recognition and analysis.

While different forms of machine learning and expert systems have been around for decades, only relatively recently have we seen large advances in their learning capabilities as both training methods and computer hardware have advanced.

With the creation of easy to use open-source libraries, it has now become easier than ever to create, train and deploy models without the need for specialist education.

Neural Networks

Artificial Neural networks are, again, a subset of the broad field of machine learning. They are among the most commonly used and are easier to understand conceptually.

A network is made up of layers of “neurons”, each of which performs a very simple calculation based on its own trained weightings. Individually, they provide very little in terms of processing. However, when combined into a layer, and layers stacked into a full network, the complexity that the model can learn widens and deepens.

(Simple neural network structure)

Each neuron has a weighting value associated with each input it receives. The value it passes on is:

Sum of (each weighting * its respective input)

This value is then put through an “activation function” (such as tanh or sigmoid).

The activation function can serve to normalize the output value of the neuron and add non-linearity, so that the network can learn functions more complex than simple linear relationships.
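As a minimal sketch of the calculation described above, a single neuron can be written in a few lines of plain Python. The function names, the example numbers, and the optional bias term are illustrative assumptions and are not taken from any particular library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Weighted sum of the inputs, passed through an activation function.

    The bias term is an assumption added for completeness; it defaults to 0.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Hypothetical inputs and weightings, chosen only for illustration
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.1]

out_tanh = neuron_output(inputs, weights)                         # output in (-1, 1)
out_sigmoid = neuron_output(inputs, weights, activation=sigmoid)  # output in (0, 1)
```

Swapping the activation changes only the range and shape of the output; the weighted sum underneath stays the same.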

Once the input data has been processed through the whole stack of neurons, you’ll be left with your simple prediction or statistics, such as a long/short call for the next time period.

Training a network

It’s all well and good looking at the flow of data for a model to make a prediction, but it wouldn’t be complete without a brief overview of how the network is actually trained to make these predictions.

During the training process, you run a set of data through the network to compare its predictions against the desired results for each data point. The difference between the output and the target value(s) is then used to update the weightings within the network through “back-propagation”.

Back-propagation starts at the output neurons, looking at the component values they received from the previous layer and the associated weightings. The weightings are given a small adjustment to bring the updated prediction closer in line with the desired output.

This error between the output and target is then fed to the next layer down, where the same updating process is repeated until all the network weightings have been marginally adjusted.

This is repeated multiple times over every data-point in the training dataset, giving the network weightings a small adjustment each time until predictions converge towards the target outputs.
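The loop described above can be sketched for the simplest possible case: a single sigmoid neuron trained by gradient descent on a squared error. With only one neuron there are no earlier layers to pass the error back to, so this shows just the weight-update step that back-propagation repeats layer by layer. The toy dataset (the logical OR function), the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, epochs=2000, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            error = output - target
            # Gradient of 0.5 * error**2 with respect to the weighted sum
            # (chain rule through the sigmoid)
            delta = error * output * (1.0 - output)
            # Nudge each weighting a small step against its gradient
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias

# Toy dataset: inputs and target outputs for logical OR
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
predictions = [round(predict(weights, bias, x)) for x, _ in samples]
```

Each pass changes the weightings only slightly; it is the repetition over the whole dataset that moves the predictions towards the targets.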

In theory, this learned model will then be able to make accurate predictions on out of sample data. However, it is very easy to over-fit to the training data if the model is too large and simply learns the inputs rather than a generalized representation.

(Example of over-fitting for simple classification — made with tensorflow playground)

Where can it be applied in finance?

Since neural networks can be used to learn complex patterns in a dataset, they can be used to automate some of the processes of technical analysis commonly used by traders.

A moving average cross strategy can be coded with ease, needing only a few lines for a simple trading bot. However, more complex patterns such as indicator divergence, flags and wedges, and support/resistance levels can be harder to identify with simple rules. And, indeed, forming a set of chart patterns into an objective trading strategy is often hard to achieve.
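To give a sense of how little code the rule-based case needs, here is a rough sketch of a moving average cross signal in plain Python. The window lengths, function names, and price series are hypothetical, and the sketch ignores everything a real bot would need (order handling, fees, position sizing).

```python
def sma(values, window):
    """Simple moving average; None until enough data points are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def crossover_signals(prices, fast=3, slow=5):
    """Return (index, 'long' or 'short') wherever the fast average crosses the slow one."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    signals = []
    for i in range(1, len(prices)):
        if slow_ma[i - 1] is None:
            continue  # not enough history yet for the slow average
        prev_diff = fast_ma[i - 1] - slow_ma[i - 1]
        diff = fast_ma[i] - slow_ma[i]
        if prev_diff <= 0 < diff:
            signals.append((i, "long"))
        elif prev_diff >= 0 > diff:
            signals.append((i, "short"))
    return signals

# Made-up price series: a dip, a rally, then another decline
prices = [10, 9, 8, 7, 6, 7, 8, 9, 10, 11, 10, 9, 8, 7, 6]
signals = crossover_signals(prices)
# signals == [(7, 'long'), (12, 'short')]
```

Patterns like flags, wedges, or divergence have no equally compact rule, which is where a learned model may become attractive.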

Machine learning can be applied in several different cases for this one scenario.

  1. Pattern recognition from candle data to identify levels of significance
  2. Creating specialised indicators to add to a simple rule based strategy
  3. A final processing and aggregation layer to make a prediction from your set of indicators.

Machine learning can also be applied in slightly more exotic ways to help refine further information:

  • Denoising and auto-encoding — used to remove some of the random noise of a price feed to help distill the underlying trend or specifics of the market sentiment.
  • Clustering — group together different equities and financial instruments to streamline the value of a portfolio. Or it could be used to evaluate and reduce the risk of a portfolio.
  • Regression — often used to try to predict the price at the next time step, however it can be applied to a range of abstracted indicators to help predict trading signals earlier.
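As a small illustration of the regression case, the sketch below fits a one-variable least-squares line that predicts the next value of a series from the current one. This is ordinary linear regression rather than a neural network, and the series is synthetic, built to follow an exact linear rule so the fit is easy to check by hand. Real price data would be far noisier.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic series in which each value is exactly 2 * previous + 1
series = [1, 3, 7, 15, 31]

# Pair each value with the one that follows it
current_values = series[:-1]
next_values = series[1:]

slope, intercept = fit_line(current_values, next_values)
next_prediction = slope * series[-1] + intercept
```

The same pairing of "current" and "next" values carries over when the input is an abstracted indicator rather than the raw price.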

Machine Learning vs Traditional Methods

In many of the cases above, it is perfectly possible (and often advisable) to stick to more traditional algorithms. A well-made machine learning framework has the advantage when it comes to easy retraining, but at the cost of complexity, computational overhead and interpretability.

While there have been advances in the use of relevance heat-maps to help explain the source of a prediction, neural nets still mostly remain black boxes, which rules out certain use cases, such as for fund managers, where decision justification and accountability are important to clients.

We may well see attitudes change over time as ML-assisted trading and investing becomes more widespread, but for now this remains a large obstacle to practical application in certain settings.

Furthermore, it is often a lot easier to make a simple rule based strategy over a full ML model and training structure. But when done right, machine learning can provide cutting edge accuracy to the adversarial world of financial trading.

Conclusion

Despite some of the added challenges and complexity brought by the addition of machine learning, it provides a new range of tools which can be applied to a range of problems in finance, allowing for greater automation and accuracy.

In the next post we’ll be looking deeper into some of the theory and decision making behind different training methods and tasks for a new model.

By Matthew Tweed

/

9 Most Commonly Asked Questions About MarketStore And Answers To Them

Photo by William Stitt on Unsplash

Each of these articles seeks to explain the technology we build along with our Alpaca algo trading brokerage. These articles led to active discussions on Reddit and Medium, and it became clear to us that there is a lot of interest in, and a pretty large need for, a timeseries database dedicated to financial market data. The database world and software engineering in general have changed so much over the last decade as we’ve seen an explosion in open source programming and databases. We are now seeing some people actively using the open source code and contributing to the GitHub repository.

On social media and offline, we’ve been answering questions and responding to comments, but today we wanted to take the opportunity to put all the questions and answers together in a single post and share it with the entire community.

Q: Does MarketStore store data in memory?

A: No. MarketStore is designed to run on a reasonably sized host without a huge hardware investment. If you have lots of cash, software technology is irrelevant, but what software engineering can bring is the ability to do a much better job with cheaper hardware. MarketStore’s primary use case is to store and distribute years of data at second-level granularity for more than tens of thousands of series (US equities and crypto coins across exchanges can easily reach this size). The data size can be a few terabytes, and it is still not common to have that much RAM in commodity hardware. MarketStore instead stores everything on disk, but the on-disk format is nearly identical to the in-memory layout, and thanks to the evolution of SSDs, MarketStore can load the data at a speed competitive with in-memory storage.

Q: How does it make sense to compare with PostgreSQL and include DataFrame loading?

A: Even if you can store the data and offload it from application processes, it is not useful if you cannot use it. MarketStore is mainly used in the context of AI, machine learning and backtesting, and the application typically loads the data into a tabular structure such as a Pandas DataFrame. That is why MarketStore’s network protocol sends byte sequences in MessagePack, so that inefficient JSON deserialization can be avoided. The client can load the delivered bytes into memory as a C array, which is what sits behind a DataFrame.
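To illustrate the general point about text versus binary payloads, the sketch below sends the same column of numbers both ways using only Python’s standard library. It is not MarketStore’s wire format or client code: the stdlib `array` module stands in for the C-array idea, and the values are made up.

```python
import json
from array import array

# Hypothetical column of closing prices
closes = [101.25, 101.5, 100.75, 102.0]

# Text route: every number is written as a decimal string
# and has to be parsed back one by one.
json_payload = json.dumps(closes).encode("utf-8")
from_json = json.loads(json_payload)

# Binary route: the column is one contiguous block of C doubles,
# so the receiver can copy the bytes straight into an array.
binary_payload = array("d", closes).tobytes()
from_binary = array("d")
from_binary.frombytes(binary_payload)
```

The binary payload has a fixed size per value and needs no per-element parsing, which is the property the answer above relies on when loading into a DataFrame.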

Q: How is it better compared to InfluxDB?

A: We have not compared performance with InfluxDB, but the typical use case for InfluxDB and other general-purpose timeseries databases is system metrics or activity log analysis. Those require a more flexible data structure and don’t necessarily need specific functions such as timezone-aware aggregation. As always, that flexibility comes with overhead as a tradeoff, so MarketStore should be much faster and more cost-effective when the use case is financial market data.

Q: Why are you comparing with PostgreSQL when Timescale should be faster?

A: You can send us the benchmark results if you have them, but in our internal experiments, Timescale is even slower than PostgreSQL when compared to MarketStore. The loading time at the database server level for Timescale is 2–3x slower than PostgreSQL, since Timescale makes use of table partitioning (aka table constraint exclusion) that needs to open lots of files from disk. That is an advantage when filtering a small slice out of a large amount of data, but it does not help if you scan most of it. Compared to those relational databases, MarketStore stores the data in an optimal layout on disk and reads it sequentially straight into memory, so it is much faster.

Q: MarketStore can be used only for historical data but not for real-time data, right?

A: There is a new feature coming soon to MarketStore that will allow streaming and real-time push on every new data write. MarketStore was originally designed to support our algo trading platform, which builds trading algorithms using deep learning and runs them in the real market, and it had JSON WebSocket streaming. The feature had been taken out for the time being so that MarketStore could fit a wider range of use cases, but thankfully it is now back in as a plugin. We have been testing this with thousands of updates every few seconds and so far it is working perfectly.

Q: Why do I need this for machine learning? I can load the data from disk without a problem

A: If your training process doesn’t use much data (e.g. just daily bars from one stock), then yes, you probably don’t need MarketStore for performance reasons. What we needed to do on the Alpaca trading platform requires a server large enough to store intraday data across the entire market (which can reach the terabyte range) and to load the necessary series back and forth. If you are familiar with a typical machine learning training process, you will know that each training iteration can load random data from the pool. That said, MarketStore is not just about performance, but also about the convenience of providing a uniform way to access historical and real-time timeseries data without worrying about how to manage local files and so on. And the built-in data ingestor can load the data without you even writing any code.

Q: Where is the installer?

A: Sorry, at the moment we do not provide a one-click installer! Instead, we package the server process into a Docker container image, so if you have Docker, you can start it in a second.

Q: Why is it open sourced?

A: Because there is a problem to be solved! MarketStore was originally implemented as proprietary software for our internal use and has been used in our production systems, but we have also seen the same problems affecting many people in the space. Our mission at Alpaca is to help individual investors with technology, and improve the algo trading environment, regardless of whether we give that information away to users or offer it in a premium package. This kind of product has only been accessible to financial institutions with large capital resources. But now we are making it available to anyone who is eager to try it out! That’s awesome, isn’t it!?

Q: I found a bug

A: Please report it as a GitHub issue!

/

Algo Trading News Headlines 5/30/2018

Goldman: Machines Are Taking Over Markets

(www.nasdaq.com)

“Liquidity is the new leverage”: That was the ominous warning fired by Goldman Sachs’ head of Global Credit Strategy, Charles Himmelberg, admonishing traders about the dangers of the ongoing algorithmic transformation in the markets, including the toxic combination of Quant Funds and High-Frequency Trading and how they put the bond and equities markets at high risk of a systemic event.

From Bloomberg

Bloomberg launches market forecasting application powered by artificial intelligence

(www.bloomberg.com)

Bloomberg today announced the launch of a new price forecasting application for investment professionals powered by artificial intelligence (AI). The “Alpaca Forecast AI Prediction Matrix” is an application (app) that provides short-term market price forecasts for major markets such as USD/JPY, EUR/USD, AUD/JPY, CME Nikkei 225 Futures Index and US 10-year treasury bonds, using Bloomberg’s Market Data Feed (B-PIPE).

 

Co-location case: CBI books stock broker & NSE, NIPFP, Sebi officials

(economictimes.indiatimes.com)

In this architecture, data was disseminated in a sequential manner whereby a stock broker who connected first to the server of stock exchange received tick, that is market feed before the stockbroker who connected later.

 

Capital Markets Impact of Regulatory Reform Legislation

(www.lexology.com)

Section 502 of the Act requires the SEC to submit a study on algorithmic trading to committees of the Senate and the House of Representatives, reporting on the risks and benefits of algorithmic trading in the capital markets in the United States.

 

Algo-trading is a threat grocery has to take seriously

(www.thegrocer.co.uk)

Algo-trading has the potential to deliver particularly bad news for the coffee industry, as volatile intra-day trading would likely catch producers, roasters and retailers off guard with sudden price changes. Ultimately, this may lead to increased price volatility for the consumer.

 

Fintech firm Alpaca launches “AlpacaForecast AI Prediction Matrix” for Bloomberg users

(financefeeds.com)

The “AlpacaForecast AI Prediction Matrix” is an application that utilizes Alpaca’s large-scale data processing technology and deep learning technology and shows real-time short-term forecasts for major markets. The company has decided to develop this application in hope that it would bring advanced AI market forecasting capabilities to the global financial community, right to their desks.

/