Ultimate List of Automated Trading Strategies You Should Know - Part 1

This is part 1 of the series "Ultimate List of Automated Trading Strategies."

Photo by Artem Bali on Unsplash

So many types of automated trading use-cases

Since the public release of Alpaca’s commission-free trading API, many developers and tech-savvy people have joined our community slack to discuss various aspects of automated trading. We are excited to see many have already started running algorithms in production, while others are testing their algorithms with our paper trading feature, which allows users to play with our API in a real-time simulation environment.

When we started thinking about a trading API service earlier this year, we were looking at only a small segment of algo trading. However, the more users we talked with, the more we realized there are many use cases for automated trading, particularly when considering different time horizons, tools, and objectives.

Today, as a celebration of our public launch and as a welcome message to our new users, we would like to highlight various automated trading strategies to provide you with ideas and opportunities you can explore for your own needs.

Please note that some concepts overlap with others, that not every item necessarily describes a specific strategy per se, and that some of the strategies may not be applicable to the current Alpaca offering.

(1) Time-Series Momentum/Mean Reversion

Background

(Time-series) momentum and mean reversion are two of the most well known and well-researched concepts in trading. Billions of dollars are put to work by CTAs employing these concepts to produce alpha and create diversified return streams.

What It Is

The fundamental idea of time-series forecasting is to predict future values based on previously observed values. Time-series momentum, also known as trend-following, seeks to generate excess returns through an expectation that the future price return of an asset will be in the same direction as that asset’s return over some lookback period.

Trend-following strategies might define and look for specific price actions, such as range breakouts, volatility jumps, and volume profile skews, or attempt to define a trend based on a moving average that smooths past price movements. One of the simple, well-known strategies is the “simple moving average crossover”, which buys a stock if its short-period moving average value surpasses its long-period moving average value, and sells if the inverse event happens.
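
To make the crossover rule concrete, here is a minimal sketch using pandas; the column name and the 20/100-day windows are just illustrative choices, not part of any specific strategy described here.

import pandas as pd

def sma_crossover_signal(close, short_window=20, long_window=100):
    """Return +1 (long) while the short SMA is above the long SMA, -1 otherwise."""
    short_sma = close.rolling(short_window).mean()
    long_sma = close.rolling(long_window).mean()
    signal = (short_sma > long_sma).astype(int) * 2 - 1  # True -> +1, False -> -1
    return signal.where(long_sma.notna())  # no signal until the long window has filled

# Usage sketch, assuming daily_bars is a DataFrame of daily prices:
# signal = sma_crossover_signal(daily_bars["close"])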

Mean-reversion is the expectation that the future price return of an asset will be in the opposite direction of that asset’s return over some lookback period. One of the most popular indicators is the Relative Strength Index, or RSI, which measures the speed and change of price movements using a scale of 0 to 100. For the purposes of trying to assess the likelihood of mean-reversion, a higher RSI value is said to indicate an overbought asset while a lower RSI value is said to indicate an oversold asset.

For Implementation

Trend-following and mean-reversion strategies are easy to understand since they look at a single asset's time-series and try to make a prediction about that asset's future return, but there are many ways to interpret the past behavior. You will need access to historical price data and may benefit from an indicator calculator library such as TA-Lib. Virtually every trading framework library, including pyalgotrade, backtrader, and pylivetrader, can support these types of strategies.
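
For example, assuming you have the TA-Lib Python bindings installed, a bare-bones RSI mean-reversion signal might look like the sketch below; the 14-period length and 30/70 thresholds are just the conventional defaults, not a recommendation.

import numpy as np
import talib

def rsi_signal(close, period=14, oversold=30, overbought=70):
    """close: numpy array of float closes. Returns +1 (oversold), -1 (overbought), or 0."""
    rsi = talib.RSI(close, timeperiod=period)
    signal = np.zeros_like(close)
    signal[rsi < oversold] = 1     # oversold: expect reversion upward
    signal[rsi > overbought] = -1  # overbought: expect reversion downward
    return signal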

Here is the Quantopian tutorial with backtest result for moving average crossover: Quantopian Tutorials

(2) Cross-Sectional Momentum/Mean Reversion

Background

In the U.S. stock market, there are more than 6,000 names listed on the exchanges and actively traded every day. One of the hardest problems in stock trading (and also true for global cryptocurrency trading) is how to pick the stocks.

What It Is

Cross-sectional momentum compares momentum metrics across different stocks to try to predict the future returns of one or more of them. Even if two stocks such as Facebook and Google are both indicating a momentum breakout, the move may simply be driven by the broader market; the idea is to beat the market by going with the stronger of the two signals. The same applies to mean reversion. The point is to account for the market movement that drives each individual stock and to consider the relative strength of signals across stocks, in an effort to produce a strategy that outperforms the market. This tends to be more computationally heavy, since you need to calculate the metrics across potentially tens to hundreds of time-series.

For Implementation

Again, for this type of strategy, libraries like TA-Lib may make it easier to calculate the indicators. You may also need simultaneous access to multiple symbols' price data; IEX's API can provide daily bar data for up to 100 stocks per query.
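
As a rough sketch of the cross-sectional ranking step, assuming you already have a DataFrame of daily closes with one column per symbol, you could do something like this:

import pandas as pd

def cross_sectional_momentum(closes, lookback=90, quantile=0.1):
    """closes: DataFrame of daily closes, one column per symbol.
    Returns (longs, shorts): symbols in the top/bottom quantile by trailing return."""
    trailing_return = closes.iloc[-1] / closes.iloc[-lookback] - 1.0
    ranks = trailing_return.rank(pct=True)
    longs = ranks[ranks >= 1 - quantile].index.tolist()
    shorts = ranks[ranks <= quantile].index.tolist()
    return longs, shorts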

A medium post about cross-sectional study: Basics of Backtest and Cross-sectional Momentum

(3) Dollar Cost Averaging

Background

This is one of the simplest automated trading strategies and it is widely used by many investors.

What It Is

The idea is to invest a fixed amount of money into an asset periodically. You may doubt it, but some research indicates that this works in the real world, especially over the long term. The logic is that prices fluctuate over time, so by spreading purchases out you may end up buying the stock more cheaply overall than if you had invested everything at a single point in time.

Remember, all of you who contribute to your 401k account are basically doing this. However, you might never think about doing it yourself, simply because there has been no easy way to automate this process.

For Implementation

Now, with the Alpaca trading API, doing this yourself is much simpler, and it gives you much more flexibility.
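
For illustration only, here is a minimal sketch of a recurring purchase using the alpaca-trade-api Python SDK. The symbol and quantity are placeholders, the script assumes your API keys are set as environment variables, and you would run it on your own schedule (e.g. cron). A fixed share count is used for simplicity; converting a fixed dollar amount into shares needs a price lookup, whose method names vary between SDK versions, so treat this as a sketch rather than a drop-in bot.

import alpaca_trade_api as tradeapi

# By default the SDK reads APCA_API_KEY_ID / APCA_API_SECRET_KEY from the environment.
# Point base_url at the paper-trading endpoint while testing.
api = tradeapi.REST(base_url="https://paper-api.alpaca.markets")

def buy_fixed_quantity(symbol="SPY", qty=1):
    """Submit a simple market order; run this on a schedule (e.g. weekly) for dollar cost averaging."""
    api.submit_order(
        symbol=symbol,
        qty=qty,
        side="buy",
        type="market",
        time_in_force="day",
    )

if __name__ == "__main__":
    buy_fixed_quantity()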

(4) Market Making

Background

Market makers are important intermediaries who stand ready to buy and sell securities continuously. By doing this, they provide much-needed liquidity and are compensated for their inventory risk primarily by capturing bid-ask spreads.

Market making used to be done primarily by humans, who worked as floor traders in the pits, but now it’s almost entirely performed by machines. As exchanges have become more and more electronic, the strategy market makers employ has naturally required automation.

What It Is

There are a variety of approaches to market making but most typically rely upon successful inventory management through hedging and limiting adverse selection.

Some market makers may have very tight exposure limits and seek to turn over their positions quickly with the goal of being flat at the end of each day. Others may operate on a much longer horizon, carrying a large and diverse portfolio of securities long and short indefinitely. Undoubtedly, for any market maker, speed helps. The speed of calculation allows the market maker to continuously update its pricing and portfolio risk models, while the speed of execution allows the market maker to act on its models in a timely manner in an effort to reduce adverse selection and get better pricing on its hedges.

Competitive market makers need high-resolution data and a low latency infrastructure, although typically the longer their trading horizon is, the less sensitive they are to these things, and a smart but slow model goes a long way.

For Implementation

In order to process vast amounts of data quickly and handle concurrency, languages like Python may not be suitable. Go or Rust would be a good choice for balancing ease of concurrency handling with processing speed, as would functional languages like Erlang or OCaml, or old standbys like C++.

Some high-level explanation of market making: How profitable is market making on different exchanges

(5) Day Trading Automation

Background

Many day traders develop their trading strategies from a mechanical set of conditions that originate in intuition. Since manual day trading involves continuously assessing market conditions and making discretionary trading decisions on the spot, it can be very physically and emotionally draining. Because these strategies are based on rules or heuristics that can be codified, it is natural to think they can be automated, and that is likely the case.

What It Is

One of the very well-known day trading strategies is the gap-up momentum strategy.

Suppose between the previous market close and next market open there is a positive earnings report. The market opens with a big gap, drawing lots of traders’ attention, and the price keeps going up for a while in the morning (but may not continue for long).

This strategy seeks to capture this follow-through momentum. The challenge here is that not all gap-up stocks keep going up, and among a handful of screened stocks, you need to watch each one’s price action simultaneously.

Some traders may enter on a price breakout from a certain resistance level, while others may wait to see a chart pattern form, confirming the first bottom before the price heads higher. Day trading often relies on analyzing the stock's price chart, and fine-tuning an algorithm to capture that price action can be tricky. That said, once the algorithm is well developed, your bot trades on your behalf just as if you were trading manually: you no longer need to watch the markets, you can cover more stocks at the same time, and no emotions affect your trade execution, which is very compelling.

For Implementation

The main thing you need for this is access to market data. You may not even need indicator calculations, but you may need a stock screening library such as pipeline-live. Latency is typically not critical, so you don't need to write your system in C++; Python, or another lightweight language, is likely sufficient.
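
As an example of the screening step (not a complete strategy), a gap-up scan over a universe of symbols is just a comparison of today's open with yesterday's close; the 3% threshold and the Series layout are assumptions for the sketch.

import pandas as pd

def find_gap_ups(prev_close, today_open, min_gap=0.03):
    """prev_close, today_open: Series indexed by symbol.
    Returns the gap percentage for symbols that opened at least min_gap above the prior close."""
    gap = today_open / prev_close - 1.0
    return gap[gap >= min_gap].sort_values(ascending=False)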

Some reference: Momentum Day Trading Strategies for Beginners: A Step by Step Guide

To Be Continued…

This is part 1 of a 3-post series giving an overview of various types of automated trading strategies. Stay tuned for our next post, which will cover more.

/

Python Library To Run Quantopian Algorithm In Live

Quantopian — The Online Algo Trading Platform

Quantopian is one of the most popular online algo trading platforms and communities today. It provides a great backtesting environment where you can experiment with your ideas, build algorithms, and even participate in its contest, as well as share and discuss ideas with the smart people in its community.

Photo by Rodion Kutsaev on Unsplash

One of the things many people asked Alpaca during the beta program is how to run the algorithms they built on the Quantopian platform for their own purposes, not just for the contest. While Quantopian has built a great deal into its platform, it has also been generous enough to share its internal framework as the open source library zipline.

The Newest Open Source Libraries for Quantopian Users

Today, I want to share our newest open source libraries for Quantopian users: pylivetrader and pipeline-live.

alpacahq/pylivetrader: Python live trade execution library with a zipline interface (github.com)

alpacahq/pipeline-live: Pipeline extension for live trading (github.com)

pylivetrader is a zipline-API-compatible trading framework in Python that focuses on live trading, with much less overhead and far fewer dependency problems. It is written from the ground up for live trading use cases, so it removes a lot of the heavy lifting zipline had to do, such as price adjustment.

This means you don't need to build a data bundle to kick off your algorithm live; instead, you can start live trading from your Quantopian algorithm source right away.
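
For a flavor of what that looks like, here is a minimal pylivetrader algorithm skeleton adapted from the pylivetrader README; the symbol and the fixed 10-share target are placeholders, and you would launch it with the pylivetrader CLI (see the README for the exact invocation and backend configuration).

from pylivetrader.api import order_target, symbol

def initialize(context):
    # Runs once when the algorithm starts up.
    context.asset = symbol("AAPL")

def handle_data(context, data):
    # Runs on every bar; hold a fixed 10-share position as a placeholder.
    order_target(context.asset, 10)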

At the moment, the only supported backend is Alpaca, but we are happy to connect to Interactive Brokers and others if someone contributes the code.

Pipeline API — the Core Piece of Quantopian Framework

The Pipeline API is the core piece of the Quantopian algorithm framework that allows easy, pythonic stock selection based on different metrics, and it is what differentiates the platform from others. I have found that Pipeline provides tremendous value when it comes to trading a wide universe of stocks. Unfortunately, it has not been easy for most people to use this great feature outside of the Quantopian platform.

pipeline-live is a Python tool that lets you do something similar anywhere, so you can do your research elsewhere and use it with existing Python trading frameworks such as zipline-live, backtrader, or pylivetrader, which I introduced above. pipeline-live primarily uses the IEX public API for pricing and basic fundamental information.

IEX provides market-wide daily OHLCV data, which makes it a great choice for pipeline usage. Since pipeline-live focuses on live trading use cases, it does not provide a historical view the way Quantopian does, but the upside is that it is fairly independent and easy to use. It is also very extensible, so you can hook it up to other paid data sources if you find that useful.

How to Convert Your Quantopian Algorithms to Run in Live Trading

We have also put together some guidance on how to convert your Quantopian algorithms to run in live trading. You may want to take a look at these documents if you are interested:

https://github.com/alpacahq/pipeline-live/blob/master/migration.md
https://github.com/alpacahq/pylivetrader/blob/master/migration.md

I also posted a real example in the Quantopian forum, which you may want to take a look at, too.

Long-only non-day trading algorithm for live: "This is a modified version of the algorithm presented in…" (www.quantopian.com)

Feel free to give me any feedback/questions/criticism. Happy to help you get started with live trading with these tools too.

And here is the example code migrated from the post above.


/

Comparing 3 Different Types of Neural Network Architectures in Finance - Intro into Machine Learning for Finance (Part 3)

When working on a machine learning task, the network architecture and the training method are the two key factors in turning a set of data-points into a functional model.

But where should different training methods be applied? How do they work? And which is "best"? In this post, we go over three types of training methods and compare Supervised, Unsupervised and Reinforcement Learning.

Photo by David Wright on Unsplash

1. Supervised learning

In “Intro into Machine Learning for Finance (Part 1)” we covered some high level theory on how a network is trained to improve the accuracy of its model, but we never discussed where the target values for comparison actually come from.

Classification Task

For a classification task, it's easy to see where the labels come from. You want the model to be able to identify a falling wedge pattern, for example, so you feed it sets of inputs, each labeled as either being a falling wedge or not. If the model mislabels an input, the error is back-propagated to change its prediction in the future.

Regression Task

Similarly, for a regression task, the label for each set of inputs will likely be the next value in the time series. For example, you might try to make a model which can learn to predict the price at the close of the next day based on a set of previous market movements.
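
For example, producing that kind of labeled dataset from a bar series is nearly a one-liner with pandas; shifting the close back one row makes tomorrow's close the target for today's features (the column name is an assumption).

import pandas as pd

def make_regression_labels(df):
    """df: daily bars with a 'close' column. Adds next day's close as the training target."""
    labeled = df.copy()
    labeled["target_next_close"] = labeled["close"].shift(-1)
    return labeled.dropna(subset=["target_next_close"])  # the last row has no "next day"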

Both of these cases are examples of “supervised learning” where the model is trained against an already labeled set of data and the error function is calculated as the difference between the predicted output and the supervised labels for the dataset.

This is generally a very simple and efficient process, as the network weightings are updated to minimise the error between the prediction and the target output for each batch of data-points. You already have the target function/decision process to label the data; it's just a case of fitting the model to emulate it.

In “Forecasting Market Movements Using Tensorflow — Intro into Machine Learning for Finance (Part 2)” we put supervised learning into practice with a simple neural network to make long/short calls.

2. Unsupervised learning

Meanwhile, unsupervised learning tasks revolve around the model learning complex relationships within data that you haven't been able to determine yet. This can be through tasks such as clustering data-points, which help to give insight into the structure of the data.

The application to real life and, indeed, trading is often harder to see, which is exactly the reason it can’t be a “supervised” task — it is trying to find relationships in the data that we haven’t found yet.

One good use may be in the analysis of portfolios. By clustering equities and financial instruments you can get a unique view of the distribution of exposure and risk, and either hedge accordingly or look to maximise the efficiency of exposure to one area of the market.
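
As a minimal sketch of that idea (assuming a DataFrame of daily returns with one column per instrument), k-means over the return series groups instruments that tend to move together; the cluster count here is arbitrary.

import pandas as pd
from sklearn.cluster import KMeans

def cluster_instruments(returns, n_clusters=5):
    """returns: DataFrame of daily returns, one column per instrument.
    Returns a Series mapping each instrument to a cluster label."""
    # Each instrument becomes one sample whose features are its history of daily returns.
    features = returns.T.fillna(0.0).values
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(features)
    return pd.Series(labels, index=returns.columns)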

An interesting paper on the creation of diverse portfolios via clustered stocks can be found here.

3. Reinforcement learning

Reinforcement learning is an interesting mix of both supervised and unsupervised learning. While it does require a specific target function to be trained towards, the error that trains the network is deferred from the actual decision making period. The network is instead trained against a “reward” and/or “punishment” function.

The model is attempting to learn a policy of actions to be able to maximise its reward function, such as learning the optimal time to hedge a portfolio in a turbulent market.

There is no immediate profit or loss at the point when the decision is made. Instead, the reward function will be based on how successful the hedging was in protecting portfolio value over the coming time-steps. If the model were to hedge too early, it could miss out on market upside, whereas if it hedged too late, the portfolio would suffer a greater drawdown.

Since RL is set up to generate actions for an environment, rather than to output a simple prediction, it requires a simulated training environment for the agent to react to and interact with. This can prove challenging both in terms of basic implementation and especially in optimizing to train in any reasonable time frame.

Again, further reading on reinforcement learning for portfolio hedging can be found here.

Comparison and Drawbacks

Dataset generation:

A supervised learning task, such as classification, requires a set of labeled data to train against. For a simple price predictor this will only require a small pre-processing script which sets the target value as the next day's close price. However, for more complex functions it will require a much more complex algorithm, or even manual pattern identification and labeling. And, due to the large datasets needed for effective training (tens of thousands of samples at minimum), this can be extremely time-consuming, if not infeasible.

Unsupervised learning, on the other hand, has no such issue, as it's trying to find relationships in unlabeled data. However, you still need to gather the dataset, verify the data is clean enough, and interpret the results of the model relative to the data.

Reinforcement learning is similar to supervised, as you need a reward function for the model to train towards. However, instead of a labeled dataset, per se, the model is trained against a simulated environment. This can allow for a simpler function to identify the behavior to be rewarded, but brings added complexity to the setup of the data feeds and how they interact with the model. RL also has the added issue of data requirements — needing huge datasets to train effectively.

Training times:

While both supervised and unsupervised models can require significant time and resources to train adequately, they pale in comparison to reinforcement learning due to the nature of its deferred rewards when training. So, not only will you need a larger dataset to train against for RL, but you will need more powerful machines to run the training process in a comparable time.

Meanwhile, it’s hard to contrast the training process of supervised vs unsupervised methods, as both the time per training step and the number of training steps required will vary greatly depending on the size of the network used and the optimizer/ optimizer settings.

In all cases, it’s advisable to look at using newer optimizers, such as Adam Optimizer, as they can provide faster and less noisy training for the network, achieving a more accurate fit over fewer epochs.

Overfitting:

Overfitting is a serious concern for all training types and, once again, varies more by the quality of data and architecture than type of training method.

A rule of thumb to avoid overfitting: once you've found an architecture that can learn to accurately predict the training data, if the validation accuracy diverges during training, slowly reduce the size of the network until you find the smallest network that still trains to a good accuracy on the training dataset.

If the accuracy against the validation dataset still fails to converge then your issue likely lies with lack of causation between training features and outcome, rather than overfitting of the model.

Conclusion

Each training method is suited to a specific type of machine learning task and data, with supervised being the most likely candidate to create simple trading signals and unsupervised being used for analysing relationships in data to help refine strategies.

While reinforcement learning has great promise in certain tasks, it is unlikely to be particularly feasible vs other machine learning techniques due to huge data and computing requirements.

By Matthew Tweed

/

Forecasting Market Movements Using Tensorflow - Intro into Machine Learning for Finance (Part 2)

Multi-Layer Perceptron for Classification

Is it possible to create a neural network for predicting daily market movements from a set of standard trading indicators?

In this post we’ll be looking at a simple model using Tensorflow to create a framework for testing and development, along with some preliminary results and suggested improvements.

Photo by jesse orrico on Unsplash

The ML Task and Input Features

To keep the basic design simple, it's set up for a binary classification task, predicting whether the next day's close is going to be higher or lower than the current day's, corresponding to a prediction to either go long or short for the next time period. In reality, this could be applied to a bot which calculates and executes a set of positions at the start of a trading day to capture the day's movement.

The model currently uses 4 input features (again, for simplicity): the 15- and 50-day RSI and the 14-day Stochastic %K and %D.

These were chosen due to the indicators being normalized between 0 and 100, meaning that the underlying price of the asset is of no concern to the model, allowing for greater generalization.

While it would be possible to train the model against any number of other trading indicators or otherwise, I’d recommend sticking to those that are either normalized by design or could be modified to be price or volatility normalized. Otherwise a single model is unlikely to work on a range of stocks.
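
For reference, computing those four inputs with the TA-Lib Python bindings might look like the sketch below; the parameters mirror the description above, and the inputs are assumed to be numpy arrays of daily high/low/close prices.

import numpy as np
import talib

def build_features(high, low, close):
    """Return an (n_samples, 4) array: 15- and 50-day RSI plus 14-day Stochastic %K and %D.
    Early rows contain NaN while the indicator windows fill and should be dropped."""
    rsi_15 = talib.RSI(close, timeperiod=15)
    rsi_50 = talib.RSI(close, timeperiod=50)
    slowk, slowd = talib.STOCH(high, low, close,
                               fastk_period=14, slowk_period=3, slowd_period=3)
    return np.column_stack([rsi_15, rsi_50, slowk, slowd])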

Dataset Generation

(Code Snippet of a dataset generation example — full script at end of this post)

The dataset generation and neural network scripts have been split into two distinct modules to allow for both easier modification, and the ability to re-generate the full datasets only when necessary — as it takes a long time.

Currently the generator script is set up with a list of S&P 500 stocks to download daily candles since 2015 and process them into the required trading indicators, which will be used as the input features of the model.

Everything is then split into a set of training data (Jan 2015 — June 2017) and evaluation data (June 2017 — June 2018) and written as CSVs to “train” and “eval” folders in the directory that the script was run.

These files can then be read on demand by the ML script to train and evaluate the model without the need to re-download and process any more data.

Model Training

(Code Snippet of model training — full script at end of this post)

At start-up, the script reads all the CSV files in the “train” and “eval” folders into arrays of data for use throughout the training process. With such a small dataset, the RAM requirements will be low enough not to warrant extra complexity. But, for a significantly larger dataset, this would have to be updated to only read a sample of the full data at a time, rotating the data held in memory every few thousand training steps. This would, however, come at the cost of greater disk IO, slowing down training.

The neural network itself is also extremely small, as testing showed that with larger networks, evaluation accuracies tended to diverge quickly.

The network “long Output” and “short Output” are used as a binary predictor, with the highest confidence value being used as the model prediction for the coming day.

The “dense” layers within the architecture mean that each neuron is connected to the outputs of all the neurons in the layer below. These neurons are the same as described in “Intro into Machine Learning for Finance (Part 1)”, and use tanh as the activation function, which is a common choice for a small neural network.

Some types of data and networks can work better with different activation functions, such as ReLU or ELU for deeper networks. ReLU (Rectified Linear Unit) attempts to solve the vanishing gradient problem in deeper architectures, and ELU is a variation on it intended to make training more efficient.

TensorBoard

As well as displaying prediction accuracy stats in the terminal every 1000 training steps, the ML script is also set up to record summaries for use with TensorBoard — making graphing of the training process much easier.

While I haven’t included anything other than scalar summaries, it’s possible to record everything from histograms of the node weightings to sample images or audio from the training data.

To use TensorBoard with the saved summaries, simply point the --logdir flag at the directory you're running the ML script in. You then open the browser of your choice and enter "localhost:6006" into the address bar. All being well, you now have a set of auto-updating charts.
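
For context, recording a scalar summary with the TensorFlow 1.x-era API used at the time looks roughly like this self-contained sketch; the log directory and the fake loss value are placeholders.

import tensorflow as tf

# Tag the value you want TensorBoard to chart.
loss_placeholder = tf.placeholder(tf.float32, name="loss_value")
loss_summary = tf.summary.scalar("loss", loss_placeholder)

with tf.Session() as sess:
    writer = tf.summary.FileWriter("./logs", sess.graph)
    for step in range(100):
        fake_loss = 1.0 / (step + 1)  # stand-in for the real training loss
        summary = sess.run(loss_summary, feed_dict={loss_placeholder: fake_loss})
        writer.add_summary(summary, global_step=step)
    writer.close()

You would then launch TensorBoard with its --logdir flag pointed at ./logs.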

Training results

Node layouts: Model 1 (40,30,20,10), Model 2 (80,60,40,20), Model 3 (160,120,80,40)

The results were, as expected, less than spectacular due to the simplicity of the example design and its input features.

We can see clear overfitting, as the loss/error increases against the evaluation dataset for all tests, especially so on the larger networks. This means that the network is only learning the pattern of the specific training samples, rather than a more generalized model. On top of this, the training accuracies aren't amazingly high — only achieving a few percent above completely random guesses.

Suggestions for Modification and Improvement

The example code provides a nice model that can be played around with to help understand how everything works — but it serves more as a starting framework than a working model for prediction. As such, here are a few suggestions for improvements you might want to make and ideas you could test:

Input features

In its current state, the dataset is generated with only 4 input features and the model only looks at one point in time. This severely limits what you can expect it to be able to learn — would you be able to trade only looking at a few indicator values for one day in isolation?

First, modify the dataset generation script to calculate more trading indicators and save them to the CSV. TA-Lib has a wide range of functions, which can be found here.

I recommend sticking to normalized indicators, similar to Stoch and RSI, as this takes the relative price of the asset out of the equation, so that the model can be generalized across a range of stocks rather than needing a different model for each.

Next, you could modify the ML script to read the last 10 data periods as the input at each time step, rather than just the one. This allows it to start learning more complex convergence and divergence patterns in the oscillators over time.

Network Architecture

As mentioned earlier, the network is tiny due to the lack of data and feature complexity of the example task. This will have to be altered to accommodate the extra data being fed by the added indicators.

The easiest way to do this would be to change the node layout variable to add extra layers or greater numbers of neurons per layer. You may also wish to experiment with different types of layer other than fully connected. Convolutional layers are often used for pattern recognition tasks with images, so could be interesting to test out on financial chart data.

Dataset labels

The dataset is labeled "long" if the price difference is >= 0, otherwise "short". However, you may wish to change the threshold to be equal to the median price change over the length of the data, to give a more balanced set of training data.

You may even wish to add a third category of “neutral” for days where the price stays within a limited range.

On top of this, the script also has the ability to vary the look ahead period for the increase or decrease in price. So it could be tested with a longer term prediction.

Conclusion

With the implementation of the suggested improvements, it is certainly possible to improve on the model to the point where it could be used as a complementary trading indicator to a standard rule-based strategy.

However, expectations should be tempered when it comes to such a simple architecture and training task. Machine learning can really set itself apart with a more refined network structure and prediction task.

As such, in the next article we'll be looking at Supervised, Unsupervised and Reinforcement Learning, and how they can be used to create time-series predictors and to analyze relationships in data to help refine strategies.

Full Script

By Matthew Tweed

/

Easily Build a Stock Trading Bot Using Broker API

Visual Strategy Development

Visual strategy creation is an important part of quick and efficient development, as it allows you to easily debug and adjust ideas by looking at how signals develop and change with shifts in the market.

I find Python to be a good language for this type of data-science, as the syntax is easy to understand and there are a wide range of tools and libraries to help you in your development. On top of this, the Alpaca Python API gives us an easy way to integrate market data without having to implement a new API wrapper.

*Disclaimer: As of today (July 27th 2018), Alpaca Trading API can be used only by invited beta users who opened accounts with Alpaca Securities.

For data processing and plotting, I recommend using TA-Lib and Matplotlib. TA-Lib provides a nice library to calculate common market indicators, so that you don't have to reimplement them yourself, while matplotlib is a simple yet powerful plotting tool which will serve you well for all types of data visualization.

Here’s a code snippet of an example framework script I put together (full scripts at the end of this section).

(Code Snippet of an example trade visualizer script I put together — full script at end of this section)

The script applies a simple moving average cross strategy to a few different trading symbols to give a small sample of how it might fare in live trading. This allows for a first sanity check of a new strategy's signals. Once a strategy has passed visual inspection you can run it through a backtesting tool, such as the one discussed in the "Algo Trading for Dummies" series.
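
Since the full script is only included at the end of this section, here is a stripped-down sketch of the same visual check: compute two SMAs with TA-Lib and plot them over the close with matplotlib (the column name and window lengths are assumptions).

import matplotlib.pyplot as plt
import talib

def plot_sma_cross(df, symbol, short_window=20, long_window=50):
    """df: daily bars for one symbol with a 'close' column (pandas DataFrame with a DatetimeIndex)."""
    close = df["close"].values.astype(float)
    short_sma = talib.SMA(close, timeperiod=short_window)
    long_sma = talib.SMA(close, timeperiod=long_window)

    plt.figure(figsize=(12, 6))
    plt.plot(df.index, close, label=f"{symbol} close", linewidth=1)
    plt.plot(df.index, short_sma, label=f"SMA {short_window}")
    plt.plot(df.index, long_sma, label=f"SMA {long_window}")
    plt.legend()
    plt.title(f"{symbol} moving average cross")
    plt.show()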

You may even wish to add visual markers for each simulated trade and, for a more advanced strategy, the indicators the signal was derived from. This can make it even easier to analyze the weaknesses of a signal set so that you can adjust its parameters.

Simple Trading Bot

Once you’ve moved past the backtesting stage, you’ll need a simple trading framework to integrate your strategies for live testing. This can then be run on a paper trading account to test the signals against a live data feed.

This is an important step in development, as it tests whether the strategy has been over-fit to its dataset. For example, a strategy could easily be tuned to perfectly trade a specific symbol over a backtesting period. However, this is unlikely to generalize well to other markets or different time periods — leading to ineffective signals and losses.

As such, you’ll want to a simple way to test your strategies in a staging environment, before committing any money to them with a real trading account. This is both for testing the strategy and the implementation, as a small bug in your code could be enough to wipe out an account, if left unchecked.

Here’s another example snippet of a trading bot which implements the moving average cross strategy (full script at end of this section).

(Code Snippet of a trading bot which implements the moving average cross strategy — full script at end of this section)

To make this into a full trading bot you could choose to either add a timed loop to the code itself or have the whole script run on a periodic schedule. The latter is often a better choice, as an exception causing an unexpected crash would completely stop the trading bot if it were a self-contained loop, whereas a scheduled task has no such issue, since each polling step is a separate instance of the script.

On top of this, you’ll probably want to implement a logging system, so that you can easily monitor the bot and identify any bugs as it runs. This could be achieved by adding a function to write a text file with any relevant information at the end of each process.

Once you have a working strategy, the Alpaca API should make it easy to expand your trading bot into a full production system, allowing you to start trading quickly.

By Matthew Tweed

/

Data, Data, Data! 11 Great Financial Data Vendors

Hey everyone, Intern Rao here. A few weeks ago, I worked with Hitoshi and Yoshi to put together “9 Great Tools for Algo Trading”. 

Looking at the response to that article, we decided to write a follow up. Whether you’re a financial firm or an individual trader, financial data is key for putting together any good strategy. With so many vendors on the market today, many good options get lost in the noise. Here are 11 great financial data vendors. 

By Chris Lee on Unsplash

If your service is featured and is being misrepresented in some form, please reach out to me ASAP (through my medium page, linked at the bottom), and an immediate correction will be made.

Vendors for the Individual Investor

The following data vendors are those that see their target audience as the individual investor. While they do have larger premium plans for high volume usage, the plan options and community are structured around the individual investor. 

(1) IEX

IEX, which stands for the Investors Exchange, was founded by four former employees of the Royal Bank of Canada. Upset by the unfair conditions of today's stock market, they looked to create an exchange that leveled the playing field for long-term investors. Their story was chronicled in the 2014 Michael Lewis book, Flash Boys: A Wall Street Revolt.

IEX has a significant amount of free data, as well as multiple APIs to access it. The catch, though, is that all pricing data reflects the price of that security on IEX. That said, their coverage of securities is second to none, as they have information on US equities, fixed income, and ETFs. Find out more on the IEX website here.

(2) Quandl

If you’ve used Zipline for backtesting at some point, you’ve probably used free data from Quandl. Based out of Toronto, Quandl first started as a search engine for financial data sets. As algo trading grew as a field, Quandl started repackaging their search engine of data sets as individual bundles. 

Quandl offers a bundle with US Equities for free, which is actually the default bundle you ingest on Quantopian’s Zipline backtesting engine. Be warned that this bundle doesn’t contain any fixed income or ETFs. All other bundles are priced for a one time fee, and Quandl even serves as a market-place for third parties to sell their bundles. If you’re newer to the field, Quandl’s not a bad place to start. 

(3) Alpha Vantage

AlphaVantage is a community-based data vendor. In the last few years, AlphaVantage has worked hard to foster a developer community, not unlike Quantopian's. The community has helped build the product, as well as discuss investment strategies and other apps built on AlphaVantage data.

AlphaVantage is a generally free resource with APIs available in multiple languages. They provide both real-time data streams and historical market data for backtesting. The API doesn't have a daily/weekly/monthly limit, but the free subscription does have limited latency. For the serious investor making a larger volume of API calls, a premium subscription is best.

(4) Intrinio

Intrinio was founded in 2012 with the idea of making financial data easy to access. Intrinio gears its product towards developers and is currently the data provider for the backtesting engine QuantRocket.

They currently offer two forms of data — streams and packets. Data streams offer a free trial, and then are subscription based depending on the stream. Packets are bundles of historical data. These bundles can be bought for a one-time payment and are priced based on the width and depth of data coverage. They also offer applications like the Intrinio screener for Excel. 

(5) Zacks

Zacks was founded in 1985 around a quantitative ranking of potential buys based on estimated earnings. That tool, the Zacks Rank, became the cornerstone of what Zacks offers today.

Zacks is more than just a data vendor. It offers both historical and real-time data, as well as a variety of third party applications to use that data. Zacks also features newsletters, advice from industry experts, and a community of developers. For traders who are looking to get their feet wet, Zacks may be the place to go.

(6) Polygon

Polygon is far newer on the data vendor scene, having been founded in 2015. They have a variety of pricing plans based on data coverage as well as latency of connection. 

Polygon provides real-time quotes and data streams, and has official clients in 10 different languages including Python, C, and Go. In keeping with other data vendors of late, Polygon has done its best to foster a community of developers, which has led to the creation of community clients including Perl, .NET, and Scala.

Disclaimer: Polygon is the data vendor of choice for Alpaca

(7) EOD Historical Data

The quirkily named EOD Historical Data was founded in May of last year, but has already established itself with a combination of accurate data and affordable prices. The Lyon, France-based company offers monthly rates ranging from as low as $9.99 to a high of $39.99.

Business-Facing Vendors

The vendors listed in the previous section are geared towards the individual investor, although they also offer premium subscriptions for businesses. The following vendors are specifically geared towards firms and are out of the price range of the average individual investor.

(8) Xignite

Xignite was initially founded as a wealth management platform. When the founder, Stephane Dubois, realized the challenge of obtaining accurate market data, he pivoted Xignite to become a data vendor.

Xignite offers real time data streams, daily quotes, and historical market data. For developers, they offer multiple APIs for historical data. Their products are expensive for the individual investor, and Xignite themselves identify as business facing. Some of their more famous clients include Robinhood, Wealthfront, and StockTwits.

(9) Thomson Reuters

Thomson Reuters is the product of a merger between the mass media company Thomson and the information group Reuters. Founded in London, Reuters has been providing financial information since 1951. Like Bloomberg, Reuters transitioned into information providing in the late 20th century.

Reuter’s provides pricing data for assets in over 200 exchanges for both equities and fixed income. Their software includes not just pricing data, but research and analytical tools, along with a dedicated news stream, and a mobile interface. The full Eikon software costs $22,000 a year, but individual investors can find a stripped down version with just pricing data for $3,600. 

(10) Bloomberg

Bloomberg was founded in 1981 by former New York City mayor Michael Bloomberg. Bloomberg may not have Reuters' pedigree, but it has quickly established itself as an industry standard.

The Bloomberg Terminal packs minute by minute data, trading tools and analytics for over 300 exchanges around the world. For algo traders, the terminal subscription even offers an API to access in multiple languages. But clocking in at $24,000 a year, it’s a hard sell for the individual investor. If you do have access, whether through your firm or institution, you can’t get better than Bloomberg. 

(11) YCharts

YCharts was founded in 2009 by Ara Anjargolian and Shawn Carpenter as an attempt to compete directly with the Bloomberg Terminal. The self-proclaimed "financial terminal of the web" balances a load of data with a variety of useful tools and integrations.

They offer Excel integrations and data visualizations of niche metrics, which is in line with their target audience: hedge fund managers and sales representatives. Both groups are accustomed to doing in-depth research on behalf of their clients.

YCharts Professional clocks in at $199/month which is expensive, but only about 10% of the cost of a Bloomberg Terminal. For those interested in a stripped down subscription with only access to the simplest data metrics, YCharts offers a membership at $49/month, which is affordable for most individual investors. 

Lastly...

I hope this data vendor list was useful. If you think there are services I missed, please let me know! I always appreciate any and all feedback.

So, good luck trading everyone!

by Rao Vinnakota

/

Algo Trading for Dummies  -  Implementing an Actual Trading Strategy (Part 4)

Strategy Development and Implementation

While most strategies that are successful long term are based on a mix of technical and fundamental factors, the fundamental behaviors which are exploited are often very nuanced and vary hugely, so it's hard to generalize in an article. As such, we'll be focusing more on the tools and methods for making strategies based on technical analysis.

Visual Strategy Creation and Refinement

There are many great financial charting tools available, with various different specialties; my personal favourite free option is tradingview.com.

One of the most useful features for strategy creation is its simple scripting language to create both trading indicators and back-testable strategies. While the back-testing tool is rather limited in its functionality, it serves as a good first step sanity check.

Simple creation of trading indicators, which are then overlaid directly onto the chart, allows for rapid testing and debugging of ideas, as it's much quicker to create a script and visually check it against the market than to fully implement and back-test it.

This rapid development process is a good first step to making certain types of strategies, particularly for active trading strategies that act on single symbols at a time. However, it won’t do you any good for portfolio strategies or those which incorporate advanced hedging.

For that, you’ll want to create your own tools for visualising full back-tests with multiple trading pairs. This is where the logging features of your back-tester will come in. With the end results being plotted in your graphing tool of choice, such as matplotlib (for Python).

Full Back-tester Framework:

(Simple example of a multi-symbol back-tester based on position handler from previous article — full script at end of this post)

Various plots, such as scatter graphs or hierarchical clustering, can be used to efficiently display and contrast different variations of the back-tested strategy and allow fine tuning of parameters.

Implementing and Back-testing

One of the easiest traps to fall into with the design of any predictive system is over-fitting to your data. It’s easy to see amazing results in back-tests if a strategy has been trained to completely fit the testing data. However, the strategy will almost certainly fall at the first hurdle when tested against anything out of sample, so is useless.

Meanwhile, at the other end of the spectrum, it is also possible to create a system which is overgeneralised. For example, a strategy which is supposed to actively trade the S&P 500 could easily turn a profit long term by always signaling long. But that completely defeats the purpose of trying to create the bot in the first place.

The best practices for back-testing a system:

  1. Verify against out of sample data. If the strategy has been tuned against one set of data, it is obviously going to perform well against it. All back-tests should be performed against a different set of data, whether that be a different symbol in the same asset class or the same symbol over a different time sample.
  2. Verify all strategies against some kind of benchmark. For a portfolio strategy you'd want to compare risk-adjusted return metrics, such as the Sharpe ratio (see the sketch after this list). For an active trading strategy you can look at risk:reward and win rate.
  3. Sanity check any strategies that pass the back-test. Where possible, look back over the specific set of steps it takes to make any trading signals. Do they make logical sense? If this isn’t possible (for example with Machine Learning), plot a set of its signals for out of sample data. Do they appear consistent and reasonable?
  4. If the strategy has gotten this far, run live tests. Many platforms offer paper-trading accounts for strategy testing. If not, you may be able to adapt your back-testing tool to accept live market data.
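
For point 2 above, one common risk-adjusted metric is the annualized Sharpe ratio; here is a minimal sketch for daily return series, assuming a zero risk-free rate.

import numpy as np
import pandas as pd

def annualized_sharpe(daily_returns, periods_per_year=252):
    """daily_returns: pandas Series of daily strategy or benchmark returns."""
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()

# Compare the strategy against a buy-and-hold benchmark over the same dates:
# annualized_sharpe(strategy_returns) vs. annualized_sharpe(benchmark_returns)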

Once you finally have a fully tested and working strategy which you are happy with, you can run it with small amounts of capital on a testing account. While the strategy may be perfect, there is always the possibility of bugs in the trading bot itself.

Final Thoughts

Creating any effective trading strategy is hard, especially so when you also have to deal with defining it in objective terms that can be converted into code. It can be especially frustrating when nothing seems to produce reliable results. However, sticking to good practices when it comes to the data science of back-testing and refining a strategy will pay off versus learning those same lessons when a strategy underperforms with real money.

By Matthew Tweed

Full back-tester framework:

/

Intro into Machine Learning for Finance (Part 1)

There has been increasing talk in recent years about the application of machine learning for financial modeling and prediction. But is the hype justified? Is machine learning worth investing time and resources into mastering?

Photo by Franck V. on Unsplash

This series will be covering some of the design decisions and challenges to creating and training neural networks for use in finance, from simple predictive models to the use of ML to create specialised trading indicators and statistics — with example code and models along the way.

If you are comfortable with machine learning in general, please feel free to skip ahead and read from the 3rd section, "Where can it be applied in finance?"

What is Machine Learning?

In simple terms, machine learning is about creating software which can be "trained" to automatically adapt its predictive model without the need for hard-coded changes. There is often debate about whether machine learning is a subset of Artificial Intelligence or AI is a subset of ML, but both work toward the same broad goal of pattern recognition and analysis.

While different forms of machine learning and expert systems have been around for decades, only relatively recently have we seen large advances in their learning capabilities as both training methods and computer hardware have advanced.

With the creation of easy to use open-source libraries, it has now become easier than ever to create, train and deploy models without the need for specialist education.

Neural Networks

Artificial Neural networks are, again, a subset of the broad field of machine learning. They are among the most commonly used and are easier to understand conceptually.

A network is made up of layers of "neurons", which each perform a very simple calculation based on their own trained weightings. Individually, they provide very little in terms of processing. However, when combined into a layer, and layers stacked into a full network, the complexity that the model can learn widens and deepens.

(Simple neural network structure)

Each neuron has a weighting value associated with each input it receives. Its final output, which it passes on, is:

the sum of each weighting multiplied by its respective input

This value is then put through an "activation function" (such as tanh or sigmoid).

The activation function can serve to normalize the output value of the neuron and add non-linearity, so that the network can learn functions more complex than simple linear relationships.

Once the input data has processed through the whole stack of neurons, you’ll be left with your simple prediction or statistics, such as a long/ short call for the next time period.
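
To make the arithmetic above concrete, here is a tiny numpy sketch of a single neuron's forward pass; the weights and inputs are arbitrary illustrative numbers.

import numpy as np

def neuron_output(inputs, weights):
    """Weighted sum of the inputs, passed through a tanh activation."""
    weighted_sum = np.dot(weights, inputs)
    return np.tanh(weighted_sum)

# Example: three inputs feeding a single neuron.
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w))  # a value squashed into (-1, 1) by tanh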

Training a network

It's all well and good looking at the flow of data for a model to make a prediction, but it wouldn't be complete without a brief overview of how the network is actually trained to make these predictions.

During the training process, you run a set of data through the network to compare its predictions against the desired results for each data point. The difference between the output and the target value(s) is then used to update the weightings within the network through “back-propagation”.

Back-propagation starts at the output neurons, looking at the component values they received from the previous layer and the associated weightings. The weightings are given a small adjustment to bring the updated prediction closer in line with the desired output.

This error between the output and target is then fed to the next layer down, where the same updating process is repeated until all the network weightings have been marginally adjusted.

This is repeated multiple times over every data-point in the training dataset, giving the network weightings a small adjustment each time until predictions converge towards the target outputs.

In theory, this learned model will then be able to make accurate predictions on out of sample data. However, it is very easy to over-fit to the training data if the model is too large and simply learns the inputs rather than a generalized representation.

(Example of over-fitting for simple classification — made with tensorflow playground)

Where can it be applied in finance?

Since neural networks can be used to learn complex patterns in a dataset, they can be used to automate some of the processes of technical analysis commonly used by traders.

A moving average cross strategy can be coded with ease, needing only a few lines for a simple trading bot. However, more complex patterns such as indicator divergence, flags and wedges, and support/ resistance levels can be harder to identify with simple rules. And, indeed, forming a set of chart patterns into an objective trading strategy is often hard to achieve.

Machine learning can be applied in several different cases for this one scenario.

  1. Pattern recognition from candle data to identify levels of significance
  2. Creating specialised indicators to add to a simple rule based strategy
  3. A final processing and aggregation layer to make a prediction from your set of indicators.

Machine learning can also be applied in slightly more exotic ways to help refine further information:

  • Denoising and auto-encoding — used to remove some of the random noise of a price feed to help distill the underlying trend or specifics of the market sentiment.
  • Clustering — group together different equities and financial instruments to streamline the value of a portfolio. Or it could be used to evaluate and reduce the risk of a portfolio.
  • Regression — often used to try to predict the price at the next time step, however it can be applied to a range of abstracted indicators to help predict trading signals earlier.

Machine Learning vs Traditional Methods

In many of the cases above, it is perfectly possible (and often advisable) to stick to more traditional algorithms. A well-made machine learning framework has the advantage when it comes to easy retraining, but at the cost of complexity, computational overhead, and interpretability.

While there have been advances in the use of relevance heat-maps to help explain the source of a prediction, neural nets still mostly remain black boxes — ruling out certain use cases, such as for fund managers, where decision justification and accountability is of importance to clients.

We may well see attitudes change over time as ML-assisted trading and investing becomes more widespread, but for now this remains a large obstacle to practical application in certain settings.

Furthermore, it is often a lot easier to make a simple rule-based strategy than a full ML model and training structure. But when done right, machine learning can bring cutting-edge accuracy to the adversarial world of financial trading.

Conclusion

Despite some of the added challenges and complexity brought by the addition of machine learning, it provides a new range of tools which can be applied to a range of problems in finance, allowing for greater automation and accuracy.

In the next post we’ll be looking deeper into some of the theory and decision making behind different training methods and tasks for a new model.

By Matthew Tweed

/

I Built a Go Plugin for Alpaca’s MarketStore as a College Intern

Hey all! I’m Ethan and recently started working for Alpaca as a Software Engineering Intern! For my first task, I created a Go plugin for Alpaca’s open source MarketStore server that fetches and writes Binance minute-level.

You might be wondering — What is MarketStore? MarketStore is a database server written in Go that helps users handle large amounts of financial data. Inside of MarketStore, there are Go plugins that allow users to gather important financial and crypto data from third party sources.

For this blog post, I'll be going over how I created the plugin from start to finish in three sections: installing MarketStore, understanding MarketStore's plugin structure, and creating and installing the Go plugin.

Experience Installing and Running MarketStore Locally

First, I set up MarketStore locally. I installed the latest version of Go and started going through the installation process outlined in MarketStore’s README. All the installation commands worked swimmingly, but when I tried to run marketstore using

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

I got this weird error:

/usr/local/go/src/fmt/print.go:597:CreateFile/go/src/github.com/alpacahq/marketstore/executor/wal.go:87open /project/data/mktsdb/WALFile.1529203211246361858.walfile: no such file or directory: Error Creating WAL File

I was super confused and couldn’t find any other examples of this error online. After checking and changing permissions in the directory, I realized that the root_directory setting in my mkts.yml configuration was incorrect. To resolve this, I changed mkts.yml from

root_directory: /project/data/mktsdb

To

root_directory: /home/ethanc/go/bin/src/github.com/alpacahq/marketstore/project/data/mktsdb

and reran

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

This time, everything worked fine and I got this output:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml
…
I0621 11:37:52.067803 27660 log.go:14] Launching heartbeat service…
I0621 11:37:52.067856 27660 log.go:14] Enabling Query Access…
I0621 11:37:52.067936 27660 log.go:14] Launching tcp listener for all services
…

To enable the gdaxfeeder plugin, which grabs data for specified cryptocurrencies, I uncommented these lines in the mkts.yml file:
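The exact lines aren’t reproduced here, but judging from the symbols that show up in the log output below, the uncommented section follows MarketStore’s bgworkers convention and looks roughly like this (the module name, key names, and query_start value are assumptions on my part):

# Hypothetical sketch of the uncommented gdaxfeeder section of mkts.yml.
bgworkers:
  - module: gdaxfeeder.so
    name: GdaxFetcher
    config:
      symbols:
        - BTC
        - ETH
        - LTC
        - BCH
      base_timeframe: "1Min"
      query_start: "2017-09-01 00:00"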

and reran

ethanc@ethanc-Inspiron-5559:~$ marketstore -config mkts.yml

which yielded:

…
I0621 11:44:27.248433 28089 log.go:14] Enabling Query Access…
I0621 11:44:27.248448 28089 log.go:14] Launching tcp listener for all services…
I0621 11:44:27.254118 28089 gdaxfeeder.go:123] lastTimestamp for BTC = 2017-09-01 04:59:00 +0000 UTC
I0621 11:44:27.254189 28089 gdaxfeeder.go:123] lastTimestamp for ETH = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254242 28089 gdaxfeeder.go:123] lastTimestamp for LTC = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254266 28089 gdaxfeeder.go:123] lastTimestamp for BCH = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254283 28089 gdaxfeeder.go:144] Requesting BTC 2017-09-01 04:59:00 +0000 UTC - 2017-09-01 09:59:00 +0000 UTC
…

Now that I had MarketStore running, I used a Jupyter notebook to test the commands listed in this Alpaca tutorial, and got the same results. You can read more about how to run MarketStore in MarketStore’s README, Alpaca’s tutorial, and this thread.

Understanding how MarketStore Plugins work

After installing, I wanted to understand how the MarketStore repository and its current Go plugins work. Before joining Alpaca, I didn’t have any experience with the Go programming language, so I completed the “A Tour of Go” tutorial to get a general feel for the language. Having some experience with C++ and Python, I saw a lot of similarities and found that it wasn’t as difficult as I thought it would be.

Creating a MarketStore Plugin

To get started, I read the MarketStore Plugin README. To summarize at a very high level, there are two critical Go features which power plugins: Triggers and BgWorkers. You use Triggers when you want your plugin to respond when certain types of data are written to your MarketStore database. You use BgWorkers when you want your plugin to run in the background.

I only needed to use the BgWorker feature because my plugin’s goal is to collect data outlined by the user in the mkts.yml configuration file.

Next, I read the code of the gdaxfeeder plugin, which is quite similar to what I wanted to do, except that I’m fetching and writing data from the Binance exchange instead of the GDAX exchange.

I noticed that the gdaxfeeder plugin used a GDAX Go wrapper, which fetched historical price data from GDAX’s public endpoint. Luckily, I found a Go wrapper for Binance created by adshao that has endpoints for retrieving the currently supported symbols as well as Open, High, Low, Close, Volume (OHLCV) data for any symbol(s), timespan, and duration set as parameters.

To get started, I created a folder called binancefeeder, then created a file called binancefeeder.go inside of it. I then tested the Go wrapper for Binance to see how to create a client and talk to the Binance API’s Kline endpoint to get data:
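The test file itself isn’t reproduced here, but a minimal sketch of that first experiment, assuming the adshao/go-binance wrapper’s Kline service (the import path below is the current v2 one; the version used at the time may differ), looks something like this:

package main

import (
    "context"
    "fmt"

    binance "github.com/adshao/go-binance/v2"
)

func main() {
    // Public market-data endpoints need no API key or secret.
    client := binance.NewClient("", "")

    // Ask the Kline endpoint for a few 1-minute OHLCV bars.
    klines, err := client.NewKlinesService().
        Symbol("BTCUSDT").
        Interval("1m").
        Limit(5).
        Do(context.Background())
    if err != nil {
        fmt.Println(err)
        return
    }
    for _, k := range klines {
        fmt.Println(k) // each bar prints as a struct, as shown below
    }
}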

I then ran this command in my root directory:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

and received the following response with Binance data:

&{1529553060000 6769.28000000 6773.91000000 6769.17000000 6771.34000000 32.95342700 1529553119999 223100.99470354 68 20.58056800 139345.00899491}
&{1529553120000 6771.33000000 6774.00000000 6769.66000000 6774.00000000 36.43794400 1529553179999 246732.39415947 93 20.42194600 138288.41850603}
…

So, it turns out that the Go Wrapper worked!

Next, I started brainstorming how I wanted to configure the Binance Go plugin. I ultimately chose symbols, queryStart, queryEnd, and baseTimeframe as my parameters, since I wanted the user to be able to query any specific symbol(s), start time, end time, and timespan (e.g., 1min). Then, right after my imports, I started creating the necessary configuration and structures for BinanceFetcher as a MarketStore plugin:
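The actual definitions aren’t shown here, but a rough sketch of the two structures, based on the parameters described above (the field names and the marketstore utils import path are assumptions on my part), might look like:

package binancefeeder

import (
    "time"

    "github.com/alpacahq/marketstore/utils"
)

// FetcherConfig mirrors the settings a user can set in mkts.yml to start the plugin.
type FetcherConfig struct {
    Symbols       []string `json:"symbols"`
    QueryStart    string   `json:"query_start"`
    QueryEnd      string   `json:"query_end"`
    BaseTimeframe string   `json:"base_timeframe"`
}

// BinanceFetcher holds the parsed settings plus the raw config map,
// which is used later in the Run function.
type BinanceFetcher struct {
    config        map[string]interface{}
    symbols       []string
    queryStart    time.Time
    queryEnd      time.Time
    baseTimeframe *utils.Timeframe
}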

The FetcherConfig’s members define which settings the user can configure in their configuration file (e.g., mkts.yml) to start the plugin. The BinanceFetcher’s members are similar to FetcherConfig’s, with the addition of the config member, which will be used in the Run function later.

After creating those structures, I started to write the background worker function. To set it up, I created the necessary variables inside the background worker function and copied the recast function from the gdaxfeeder plugin. The recast function uses Go’s json.Marshal function to encode the config data it receives, creates an empty FetcherConfig struct called ret, unmarshals the JSON config data into ret, and returns it:
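A sketch of that helper, building on the FetcherConfig sketch above and assuming the standard encoding/json package is imported, looks roughly like:

// Hypothetical sketch of the recast helper copied from gdaxfeeder:
// marshal the generic config map back to JSON, then unmarshal it into
// a typed FetcherConfig and return a pointer to it.
func recast(config map[string]interface{}) *FetcherConfig {
    data, _ := json.Marshal(config)
    ret := FetcherConfig{}
    json.Unmarshal(data, &ret)
    return &ret
}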

Then, inside the NewBgWorker function, I created a function to determine and return the correct time format, as well as to set up the symbols, start time, end time, and time duration. If no symbols are set, the background worker defaults to retrieving all valid cryptocurrencies and sets the symbols member to all of those currencies. It also checks the given times and duration and falls back to defaults if they are empty. At the end, it returns a pointer to the BinanceFetcher as a bgworker.BgWorker:

Then, I started creating the Run function, which the BgWorker interface requires (see bgworker.go for more details). To get a better sense of how to handle errors and write modular code in Go, I read the code of the gdaxfeeder and polygon plugins. The Run function receives the BinanceFetcher (which is dereferenced, since the bgworker.BgWorker returned earlier was a pointer to BinanceFetcher). The goal of the Run function is to call the Binance API endpoint with the given parameters, retrieve the OHLCV data, and write it to the MarketStore database.

I first created a new Binance client with no API key or secret since I’m using their API’s public endpoints.

Then, to make sure that the BinanceFetcher doesn’t make any incorrectly formatted API calls, I created a function that checks the timeframe format using a regex and converts it to the correct one. I had to convert the user’s given timeframe to stay consistent with Alpaca’s utils.Timeframe, which has a lot of helpful functions but uses different duration strings than the ones Binance uses (e.g., “1Min” vs. “1m”). If the user supplies an unrecognizable format, the baseTimeframe value falls back to one minute:
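The conversion function isn’t shown here; a standalone sketch of the idea (the function name, regex, and unit mapping are my own, following the “1Min” vs. “1m” example above, and it assumes the standard regexp package is imported) might look like:

// Hypothetical sketch: convert a MarketStore-style timeframe string such
// as "1Min" into the short form Binance expects ("1m"). Anything
// unrecognized falls back to one minute.
func toBinanceInterval(tf string) string {
    re := regexp.MustCompile(`^([0-9]+)(Min|H|D|W)$`)
    m := re.FindStringSubmatch(tf)
    if m == nil {
        return "1m" // unrecognized format: default to 1 minute
    }
    units := map[string]string{"Min": "m", "H": "h", "D": "d", "W": "w"}
    return m[1] + units[m[2]]
}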

The start and end time objects are already validated in the NewBgWorker function, which returns a zero time.Time object if they are invalid. So, I only have to check whether the start time is empty and, if it is, set it to the current time. The end time isn’t checked, since it will simply be ignored if it’s incorrect, as explained in a later section:
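A minimal sketch of that check, assuming the zero-value time.Time convention mentioned above (the helper name is my own):

// Hypothetical sketch: if the validated start time came back as the zero
// value, fall back to the current UTC time.
func defaultStart(start time.Time) time.Time {
    if start.IsZero() {
        return time.Now().UTC()
    }
    return start
}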

Now that the BinanceFetcher validates its parameters and falls back to defaults when they are invalid, I moved on to programming a way to call the Binance API.

To make sure we don’t call the Binance API too often and get IP banned, I used a for loop to fetch the data in intervals. I created a timeStart variable, initially set to the given start time, and a timeEnd variable set to timeStart plus 300 times the duration. At the start of each subsequent iteration, timeStart is set to timeEnd, and timeEnd is again set to timeStart plus 300 times the duration:
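The loop itself isn’t reproduced here; the sketch below shows the windowing logic under the assumptions that bars are fetched through the adshao/go-binance Kline service from the earlier snippet and that glog is used for logging (the function name, symbol handling, fixed “1m” interval, and the write step are placeholders):

package binancefeeder

import (
    "context"
    "time"

    binance "github.com/adshao/go-binance/v2"
    "github.com/golang/glog"
)

// Hypothetical sketch of the 300-bar windowed fetch loop described above.
func fetchLoop(client *binance.Client, symbol string, timeStart, queryEnd time.Time, duration time.Duration) {
    const barsPerRequest = 300
    for {
        timeEnd := timeStart.Add(barsPerRequest * duration)
        if timeStart.After(queryEnd) {
            // Past the user's requested end: note it and keep going, since
            // a background worker should keep running; wait for new bars
            // rather than hammering the API.
            glog.Infof("%s: passed the requested query end", symbol)
            time.Sleep(duration)
        }
        klines, err := client.NewKlinesService().
            Symbol(symbol).
            Interval("1m").
            StartTime(timeStart.UnixNano() / int64(time.Millisecond)).
            EndTime(timeEnd.UnixNano() / int64(time.Millisecond)).
            Do(context.Background())
        if err != nil {
            glog.Errorf("%s: kline request failed: %v", symbol, err)
            return
        }
        // ... validate the bars and write them to the MarketStore database ...
        _ = klines

        // Slide the window forward: the next request starts where this one ended.
        timeStart = timeEnd
    }
}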

When it reaches the end time given by the user, it simply alerts the user through glog and continues onward, since a background worker needs to keep working in the background. It then writes the retrieved data to the MarketStore database. If the data is invalid, the plugin stops, because I don’t want to write garbage values to the database:

Installing Go Plugin

To install, I simply changed back to the root directory and ran:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ make plugins

Then, to configure MarketStore to use my plugin, I changed my config file, mkts.yml, to the following:
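The exact file isn’t reproduced here, but the bgWorkerSetting printed in the log output below implies a bgworkers entry along these lines (the YAML layout is an assumption following MarketStore’s convention; the values come straight from that log line):

bgworkers:
  - module: binancefeeder.so
    name: BinanceFetcher
    config:
      symbols:
        - ETH
      base_timeframe: "1Min"
      query_start: "2018-01-01 00:00"
      query_end: "2018-01-02 00:00"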

Then, I ran MarketStore:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

And got the following:

…
I0621 14:48:46.944709 6391 plugins.go:42] InitializeBgWorkers
I0621 14:48:46.944801 6391 plugins.go:45] bgWorkerSetting = &{binancefeeder.so BinanceFetcher map[base_timeframe:1Min query_start:2018-01-01 00:00 query_end:2018-01-02 00:00 symbols:[ETH]]}
I0621 14:48:46.952424 6391 log.go:14] Trying to load module from path: /home/ethanc/go/bin/bin/binancefeeder.so…
I0621 14:48:47.650619 6391 log.go:14] Success loading module /home/ethanc/go/bin/bin/binancefeeder.so.
I0621 14:48:47.651571 6391 plugins.go:51] Start running BgWorker BinanceFetcher…
I0621 14:48:47.651633 6391 log.go:14] Launching heartbeat service…
I0621 14:48:47.651679 6391 log.go:14] Enabling Query Access…
I0621 14:48:47.651749 6391 log.go:14] Launching tcp listener for all services…
I0621 14:48:47.654961 6391 binancefeeder.go:198] Requesting ETH 2018-01-01 00:00:00 +0000 UTC - 2018-01-01 05:00:00 +0000 UTC
…

Testing

When I was editing my plugin and debugging, I often ran the binancefeeder.go file:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

If I ran into an issue I couldn’t resolve, I used Go’s equivalent of a print function (the fmt package). If there was an issue while running the plugin as part of MarketStore via the marketstore -config mkts.yml command, I used glog.Infof() or glog.Errorf() to output the corresponding error or incorrect data value.

Lastly, I copied the gdaxfeeder’s test program and modified it for my binancefeeder plugin.

You’ve made it to the end of the blog post! Here’s the link to the Binance plugin if you want to see the complete code. If you want to see all of MarketStore’s plugins, check out this folder.

To summarize: if you want to create a Go extension for any open source repository, I would first read the existing documentation, whether it’s a README.md or a dedicated documentation website. Then, I would experiment with the repository’s code, changing certain parts to see which functions correspond to which actions. Lastly, I would look over previous extensions and refactor an existing one that is close to your plugin idea.

Thanks for reading! I hope you take a look at the MarketStore repository and test it out. If you have any questions, feel free to comment below and I’ll try to answer!

Special thanks to Hitoshi, Sho, Chris, and the rest of Alpaca’s engineering team for their code reviews and help, as well as Yoshi and Rao for providing feedback on this post.

By: Ethan Chiu

/

Algo Trading for Dummies - 3 Useful Tips When Storing Trade Signals (Part 2)

Handling & Storing Trading Signals Are Hard

The calculation of simple trading indicators is made easy by any one of the Technical Analysis libraries available; however, the efficient handling and storage of trading signals can be one of the most complex aspects of a live trading system.

Photo by Jeremy Thomas on Unsplash

Calculating Basic Indicators? No Problem

While it’s often necessary to create custom indicators and trading signals, there is still significant benefit to using a standard library such as TA-Lib for the basics. This saves a lot of time compared with reimplementing a set of common indicators in your language of choice. It also has the added bonus of increased processing speed, as opposed to calculations done in native Python, for example.

When it comes to moving averages and other simple time-series indicators, the process is fairly self explanatory — at every time step you calculate the next numerical value which is then used as the most up-to-date signal to trade against.

(Code Snippet to read data CSV files and process into trading indicators) https://gist.github.com/yoshyoshi/73f130026c25a7dcdb9d6909b1990277

The signals themselves will be stateless in that respect — you aren’t concerned with previous signals that have been made, only the combination of indicators present at that moment. However, you may still wish to store some of the information from the indicators, if only for external analysis at a later point.

Different Story For Advanced Pattern Recognition

Meanwhile, more advanced pattern recognition cannot be handled in such a simple manner. If, for example, your strategy relies on finding divergence between indicators, it’s possible to get a significant performance boost by storing some past data points from which to construct the signal at each new step, rather than having to reprocess the full set of data in the look-back period every time.

This is the trade-off between storage/RAM efficiency and processing efficiency, with the latter also requiring greater software complexity to achieve.
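As a hypothetical illustration of that trade-off (sketched in Go purely for the example; the type and method names are invented), an incremental indicator can keep just the small window of state it needs and update in constant time, instead of reprocessing the whole look-back period at each step:

package indicators

// RollingSMA keeps a fixed window of past prices plus a running sum, so each
// new price is an O(1) update rather than a full recalculation.
type RollingSMA struct {
    window []float64
    sum    float64
    size   int
    idx    int
    filled bool
}

func NewRollingSMA(size int) *RollingSMA {
    return &RollingSMA{window: make([]float64, size), size: size}
}

// Update folds in the latest price and returns the current moving average.
func (s *RollingSMA) Update(price float64) float64 {
    // Drop the oldest value from the running sum and add the newest.
    s.sum += price - s.window[s.idx]
    s.window[s.idx] = price
    s.idx = (s.idx + 1) % s.size
    if s.idx == 0 {
        s.filled = true
    }
    if !s.filled {
        // Not enough data yet: average over what we have so far.
        return s.sum / float64(s.idx)
    }
    return s.sum / float64(s.size)
}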

How You Should Store Signals Depends On How Fast You Need It To Be

For optimal processing efficiency, you would not only store all the previously calculated signals from past time-stamps, but also the relevant information needed to calculate the next step in as few steps as possible.

While this would be completely unnecessary for any system with a polling interval above a second, it is exactly the kind of consideration you would have for a higher frequency strategy.

Meanwhile, a portfolio re-balancing system, or even most day-trading strategies, have all the time in the world (relatively). You could easily recalculate all the relevant signals at each time-step, which would cut down on the need for the handling of historical indicator sets.

Depending on the trading period of the system, it may also be worth using a hybrid approach to indicator and signal storage. Rather than permanently saving the data, you could calculate the full set of indicators at start-up and periodically dump and refresh the data to keep only what’s going to be used in RAM.

The precise design trade-offs should be considered on an individual basis, as holding more data in RAM may not be an option when running the software on lower-powered cloud computing instances; nor, at the other end of the spectrum, would you be able to spare the seconds to recalculate everything for a market-making bot.

3 Useful Tips When Storing Trade Signals

As mentioned in part 1 of this series, there are a range of different storage solutions that can be used for trading data. However, there are several best practices which apply across all of them:

  1. Keep indicators in a numeric or boolean format for storage where possible. For example, split a more complex signal set into boolean components (see the sketch after this list). This particular problem caused me several issues in projects I’ve worked on in the past.
  2. Only store what is complex or time-consuming to recalculate. If a set of signals can be calculated quickly enough in a stateless manner, it’s probably easier to do that than to add the design complexity of storing extra information.
  3. Plan out the flow of data through your system before you start programming anything. What market data is going to be pulled for each time-step? What will then be calculated from this and what is necessary to store? A well thought-out design will reduce complexity and hassle down the line.
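As a hypothetical illustration of the first tip (sketched in Go; all names are invented for the example, and time.Time comes from the standard library), a composite signal can be stored as plain numeric and boolean columns keyed by timestamp, rather than as an opaque object:

// A stored signal row broken into simple numeric and boolean components.
type SignalRow struct {
    Timestamp   time.Time
    FastSMA     float64 // latest short-period moving average
    SlowSMA     float64 // latest long-period moving average
    SMACrossUp  bool    // fast average crossed above slow on this bar
    RSIOversold bool    // RSI dropped below the oversold threshold
}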

Past this, common sense applies. It’s probably best to store the indicators and signals in the same time-series format as, and alongside, the underlying symbols they’re derived from. More complex signals, or indicators derived from multiple symbols, may even warrant their own calculation and storage process.

You could even go as far as to create a separate indicator feed script which calculates and stores everything separately from the trading bot software itself. The database could then be read by each bot as just another data feed. This not only keeps the system more modular, but also allows you to create a highly optimized calculation function without the complexity of direct integration into a live system.

Whatever flavour of system you end up using, make sure to plan out the data storage and access first and foremost, before starting the rest of the design and implementation process.

By Matthew Tweed

/