Algo Trading for Dummies - Implementing an Actual Trading Strategy (Part 4)

Strategy Development and Implementation

While most strategies that are successful long term are based on a mix of technical and fundamental factors, the fundamental behaviors they exploit are often very nuanced and vary hugely, so it's hard to generalize about them in an article. As such, we'll be focusing more on the tools and methods for making strategies based on technical analysis.


Visual Strategy Creation and Refinement

There are many great financial charting tools available, each with different specialties; my personal favourite free option is tradingview.com.


One of its most useful features for strategy creation is a simple scripting language for building both trading indicators and back-testable strategies. While the back-testing tool is rather limited in its functionality, it serves as a good first-pass sanity check.

Simple creation of trading indicators, which are then overlaid directly onto the chart, allows for rapid testing and debugging of ideas, as it's much quicker to create a script and visually check it against the market than to fully implement and back-test it.

This rapid development process is a good first step to making certain types of strategies, particularly for active trading strategies that act on single symbols at a time. However, it won’t do you any good for portfolio strategies or those which incorporate advanced hedging.

For that, you’ll want to create your own tools for visualising full back-tests with multiple trading pairs. This is where the logging features of your back-tester come in, with the end results plotted in your graphing tool of choice, such as matplotlib (for Python).
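
As a minimal sketch (assuming the back-tester logs one equity value per time-stamp per symbol into a CSV; the file name and column names here are hypothetical), plotting those logs with matplotlib might look like:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log format: timestamp, symbol, equity
log = pd.read_csv("backtest_log.csv", parse_dates=["timestamp"])

fig, ax = plt.subplots()
for symbol, rows in log.groupby("symbol"):
    # One equity curve per traded symbol, on a shared time axis
    ax.plot(rows["timestamp"], rows["equity"], label=symbol)

ax.set_xlabel("Time")
ax.set_ylabel("Equity")
ax.legend()
plt.show()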


Full Back-tester Framework:

(Simple example of a multi-symbol back-tester based on the position handler from the previous article — full script at end of this post)

Various plots, such as scatter graphs or hierarchical clustering, can be used to efficiently display and contrast different variations of the back-tested strategy, allowing fine-tuning of parameters.
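
For instance, a scatter plot of back-test runs over a two-parameter grid (the parameter names and Sharpe values below are purely illustrative) makes clusters of good settings easy to spot:

import matplotlib.pyplot as plt

# Illustrative results: one dict per back-tested parameter combination
runs = [
    {"fast_ma": 10, "slow_ma": 50, "sharpe": 1.1},
    {"fast_ma": 20, "slow_ma": 100, "sharpe": 0.7},
    {"fast_ma": 10, "slow_ma": 200, "sharpe": 1.4},
]

xs = [r["fast_ma"] for r in runs]
ys = [r["slow_ma"] for r in runs]
scores = [r["sharpe"] for r in runs]

# Colour each parameter combination by its risk-adjusted return
sc = plt.scatter(xs, ys, c=scores, cmap="viridis")
plt.colorbar(sc, label="Sharpe ratio")
plt.xlabel("Fast MA period")
plt.ylabel("Slow MA period")
plt.show()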

Implementing and Back-testing

One of the easiest traps to fall into when designing any predictive system is over-fitting to your data. It’s easy to see amazing results in back-tests if a strategy has been trained to completely fit the testing data. However, the strategy will almost certainly fall at the first hurdle when tested against anything out of sample, rendering it useless.

Meanwhile, at the other end of the spectrum, it is also possible to create a system which is over-generalised. For example, a strategy which is supposed to actively trade the S&P 500 could easily turn a profit long term by always signaling long, but that completely defeats the purpose of trying to create the bot in the first place.

The best practices for back-testing a system:

  1. Verify against out-of-sample data. If the strategy has been tuned against one set of data, it is obviously going to perform well against it. All back-tests should be performed against a different set of data, whether that be a different symbol in the same asset class or the same symbol over a different time sample (see the sketch after this list).
  2. Verify all strategies against some kind of benchmark. For a portfolio strategy you’d want to compare risk-adjusted returns metrics. For an active trading strategy you can look at risk:reward and win rate.
  3. Sanity check any strategies that pass the back-test. Where possible, look back over the specific set of steps it takes to make any trading signals. Do they make logical sense? If this isn’t possible (for example with Machine Learning), plot a set of its signals for out of sample data. Do they appear consistent and reasonable?
  4. If the strategy has gotten this far, run live tests. Many platforms offer paper-trading accounts for strategy testing. If not, you may be able to adapt your back-testing tool to accept live market data.
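
As a minimal sketch of points 1 and 2 (the file name, columns and both helper functions are hypothetical stand-ins for your own tuning and back-testing code, with a trivial moving-average rule as filler):

import pandas as pd

def tune_strategy(df):
    # Placeholder tuning step: would normally search the parameter space
    return {"ma_period": 20}

def run_backtest(df, params):
    # Placeholder strategy: long whenever close is above its moving average
    ma = df["close"].rolling(params["ma_period"]).mean()
    returns = df["close"].pct_change().where(df["close"].shift() > ma.shift(), 0)
    return (1 + returns.fillna(0)).prod() - 1

data = pd.read_csv("ohlcv.csv", parse_dates=["timestamp"])

# Tune on the first 70% of history, verify on the held-out remainder
split = int(len(data) * 0.7)
in_sample, out_of_sample = data.iloc[:split], data.iloc[split:]

params = tune_strategy(in_sample)
strategy_return = run_backtest(out_of_sample, params)

# Benchmark: buy-and-hold over the same out-of-sample period
benchmark_return = out_of_sample["close"].iloc[-1] / out_of_sample["close"].iloc[0] - 1
print(strategy_return, benchmark_return)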

Once you finally have a fully tested and working strategy which you are happy with, you can run it with small amounts of capital on a testing account. While the strategy may be perfect, there is always the possibility of bugs in the trading bot itself.

Final Thoughts

Creating any effective trading strategy is hard, more so when you also have to define it in objective terms that can be converted into code. It can be frustrating when nothing seems to produce reliable results. However, sticking to good practices in the data science of back-testing and refining a strategy will pay off compared to learning those same lessons when a strategy under-performs with real money.

By Matthew Tweed

Full back-tester framework:


Algo Trading for Dummies - Building a Custom Back-tester (Part 3)

While there are many back-testing libraries available, they can be quite complex to use effectively, often requiring a lot of extra processing of data sets. It is sometimes worth coding a custom back-tester to suit your needs.


Building a back-tester is a fantastic conceptual exercise. Not only does this give you a deeper insight into orders and their interaction with the market, but it can also provide the framework for the order handling module of your trading bot.

Order Handling

One of the key pieces of an active trading strategy is the handling of more advanced order types, such as trailing stops, automatically hedged positions or conditional orders.

For this you’ll want a separate module to manage the order logic before submitting to an exchange. You may even need a dedicated thread to actively manage orders once submitted, in case the platform itself doesn’t offer the necessary types.

It’s best for the module to keep an internal representation of each position and its associated orders, which is then verified and amended as the orders are filled. This means you can run calculations against your positions without constantly querying the broker. It also allows you to easily convert the code for use in your back-tester, by simply altering the order fill checks to reference the historical data at each time step.

(Code Snippet of an order handling function as part of a position handler — full script at end of article)
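
As a minimal sketch of that internal representation (the class and method names are hypothetical, not taken from the full script), for a long position with a trailing stop:

class Position:
    def __init__(self, symbol, size, entry_price, stop_price):
        self.symbol = symbol
        self.size = size
        self.entry_price = entry_price
        self.stop_price = stop_price  # trailing stop level, updated each tick

    def update_trailing_stop(self, price, trail_pct):
        # Ratchet the stop upward as the price makes new highs
        self.stop_price = max(self.stop_price, price * (1 - trail_pct))

    def stop_hit(self, price):
        # In live trading `price` comes from the data feed; in a back-test
        # it comes from the historical bar at the current time step
        return price <= self.stop_price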

It may also be worth implementing order aggregation and splitting algorithms. For example, you may want a function to split a larger limit order across multiple price levels to hedge your bets on the optimal fill. Or, indeed, you might need a system to net together the orders of multiple simultaneous strategies.
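
A sketch of the splitting case might look like the following (the sizes, levels and step are illustrative parameters, here for a buy order placed below the current best price):

def split_limit_order(symbol, total_size, best_price, levels=4, step=0.001):
    # Spread one large limit order across several price levels,
    # hedging the bet on where the optimal fill sits
    size_per_level = total_size / levels
    orders = []
    for i in range(levels):
        price = best_price * (1 - step * i)
        orders.append({"symbol": symbol, "size": size_per_level, "price": price})
    return orders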

Assumptions and Issues of Back-testing

Unless you’re using tick data and bid/ask snapshots to back-test against, there will always be a level of uncertainty in a simulated trade as to whether it would fill fully, at what price, and at what time. The period of each data point can also cause issues if it’s above the desired polling rate of the trading bot.

These uncertainties lessen as the average holding period of each trade increases relative to the resolution of your data, but they are never fully eliminated. It is advised to always assume the worst-case scenario in your simulation, as it’s better for a strategy to be over-prepared than under-prepared.

(Back-testing order processing logic implemented into position handler — full script at end of article)

For example, if a stop-loss order would have been triggered during the span of a bar, you’d want to add some slippage to its trigger price and/or use the bar’s closing price. In reality, you are unlikely to get filled so unfavorably, but it’s impossible to tell without higher granularity data.
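
A worst-case fill rule for a long position’s stop-loss might be sketched as follows (the slippage figure is an arbitrary illustration):

def simulate_stop_fill(stop_price, bar_low, bar_close, slippage_pct=0.001):
    # Stop only triggers if the bar actually traded through its level
    if bar_low > stop_price:
        return None
    # Worst case: filled at the stop less slippage, or at the bar close
    # if that is even more unfavorable for the long position
    return min(stop_price * (1 - slippage_pct), bar_close)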

On top of this, it is impossible to simulate the effect of your order on the market movement itself. While this would be unlikely to have a noticeable effect on most strategies, if you’re using extremely short holding times on each trade or larger amounts of capital, it could certainly be a factor.

Designing an Efficient Back-tester

When calculating the next time step for an indicator, unless you’ve stored all the relevant variables, you will be recalculating a lot of information from the look-back period. This is unavoidable in a live system and, indeed, less of an issue there, as you won’t be able to process data faster than it arrives. But you really don’t want to wait around longer than you have to for a simulation to complete.

The easiest and most efficient workaround is to calculate the full set of indicators over the whole dataset at start-up. These can then be indexed against their respective symbols and time stamps and saved for later. Even better, you could run a batch of back-tests in the same session without needing to recalculate the basic indicators between runs.
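
A minimal sketch of that pre-computation step (the pandas rolling means stand in for whatever indicators your strategy actually uses):

import pandas as pd

def precompute_indicators(data_by_symbol):
    # data_by_symbol: {symbol: DataFrame indexed by time-stamp, with a 'close' column}
    indicators = {}
    for symbol, df in data_by_symbol.items():
        out = pd.DataFrame(index=df.index)
        out["sma_fast"] = df["close"].rolling(10).mean()
        out["sma_slow"] = df["close"].rolling(50).mean()
        indicators[symbol] = out
    return indicators

# Each simulated time step then just indexes the precomputed table,
# e.g. indicators["AAPL"].loc[current_timestamp], with no recalculation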

At each time step you will then simply query the set of indexed indicators, construct the trading signals and push the orders to the order handling module, where the simulated positions are calculated along with their profit/loss. You’ll also want to store the position and order fill information, either as a sub-component of the back-tester or integrated directly into the position handling module.

Improving Your Back-tester

Back-testing is only as useful as the insight its statistics provide. Common review metrics include win/loss ratio, average profit/loss, average trade time, etc. However, you may want to generate more insightful reports, such as position risk:reward ratios or an aggregate of price movement before and after each traded signal, which allow you to fine-tune the algorithm.
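
For the common metrics, a simple report function over the back-tester’s trade log might look like this (assuming the log is reduced to a list of per-trade profit/loss values):

def review_metrics(trades):
    # trades: list of per-trade profit/loss values from the back-test log
    if not trades:
        return {}
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t <= 0]
    win_rate = len(wins) / len(trades)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    # Reward:risk expressed as average win over average loss magnitude
    reward_risk = avg_win / abs(avg_loss) if avg_loss else float("inf")
    return {"win_rate": win_rate, "avg_win": avg_win,
            "avg_loss": avg_loss, "reward_risk": reward_risk}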

Only once the full framework has been designed, implemented and debugged should you start looking for ways to speed up and upgrade the inner loop of the back-tester (the order handling module). It is a lot easier to take a working program and make it faster than it is to take an overly optimized program and make it work.

By Matthew Tweed

Full position handling class framework:


Algo Trading for Dummies - 3 Useful Tips When Storing Trade Signals (Part 2)

Handling & Storing Trading Signals Are Hard

The calculation of simple trading indicators is made easy by any one of the many Technical Analysis libraries available. However, the efficient handling and storage of trading signals can be one of the most complex aspects of a live trading system.

Photo by Jeremy Thomas on Unsplash

Calculating Basic Indicators? No Problem

While it’s often necessary to create custom indicators and trading signals, there is still significant benefit to using a standard library such as Ta-Lib for the basics. This saves a lot of time compared to reimplementing a set of common indicators in your language of choice. It also has the added bonus of increased processing speed over calculations done in native Python, for example.

When it comes to moving averages and other simple time-series indicators, the process is fairly self-explanatory: at every time step you calculate the next numerical value, which is then used as the most up-to-date signal to trade against.

(Code Snippet to read data CSV files and process into trading indicators) https://gist.github.com/yoshyoshi/73f130026c25a7dcdb9d6909b1990277
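
With Ta-Lib, the equivalent calculation is a couple of calls over the price array (the CSV file name below is hypothetical; SMA and RSI are standard Ta-Lib functions):

import numpy as np
import pandas as pd
import talib

df = pd.read_csv("ohlcv.csv")  # hypothetical cached OHLCV file
close = np.asarray(df["close"], dtype=np.float64)

# Ta-Lib computes over the full array in optimized C code; the last
# element of each result is the most up-to-date value to trade against
sma = talib.SMA(close, timeperiod=20)
rsi = talib.RSI(close, timeperiod=14)
print(sma[-1], rsi[-1])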

The signals themselves will be stateless in that respect — you aren’t concerned with previous signals that have been made, only the combination of indicators present at that moment. However, you may still wish to store some of the information from the indicators, if only for external analysis at a later point.

Different Story For Advanced Pattern Recognition

Meanwhile, more advanced pattern recognition cannot be handled in such a simple manner. If, for example, your strategy relies on finding divergence between indicators, it’s possible to get a significant performance boost by storing some past data-points from which to construct the signal at each new step, rather than reprocessing the full set of data in the look-back period every time.

This is the trade-off between storage/RAM efficiency and processing efficiency, with the latter also requiring greater software complexity to achieve.
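
As a heavily simplified sketch of the stored-state approach (a real divergence detector would track swing pivots, not just the window’s endpoints):

from collections import deque

class DivergenceTracker:
    def __init__(self, lookback=50):
        # Keep only the recent points needed to construct the signal,
        # rather than reprocessing the full look-back period every step
        self.points = deque(maxlen=lookback)

    def update(self, timestamp, price, rsi):
        self.points.append((timestamp, price, rsi))

    def bearish_divergence(self):
        if len(self.points) < 2:
            return False
        first, last = self.points[0], self.points[-1]
        # Price making a higher high while the oscillator makes a lower high
        return last[1] > first[1] and last[2] < first[2]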

How You Should Store Signals Depends On How Fast You Need It To Be

For optimal processing efficiency, you would not only store all the previously calculated signals from past time-stamps, but also the relevant information to calculate the next step in as few steps as possible.

While this would be completely unnecessary for any system with a polling interval above a second, it is exactly the kind of consideration you would have for a higher frequency strategy.

Meanwhile, a portfolio re-balancing system, or even most day-trading strategies, have all the time in the world (relatively). You could easily recalculate all the relevant signals at each time-step, which would cut down on the need for the handling of historical indicator sets.

Depending on the trading period of the system, it may also be worth using a hybrid approach to indicator and signal storage. Rather than permanently saving the data, you could calculate the full set of indicators at start-up and periodically dump and refresh the data, keeping only what’s going to be used in RAM.

The precise design trade-offs should be considered on an individual basis, as holding more data in RAM may not be an option when running the software from lower-power cloud computing instances; nor, at the other end of the spectrum, could a market-making bot spare the seconds to recalculate everything.

3 Useful Tips When Storing Trade Signals

As mentioned in part 1 of this series, there is a range of different storage solutions that can be used for trading data. However, several best practices apply across all of them:

  1. Keep indicators in a numeric or boolean format where possible for storage. For example, split a more complex signal set into boolean components (see the sketch after this list). This particular problem caused me several issues in past projects.
  2. Only store what is complex or time-consuming to recalculate. If a set of signals can be recalculated quickly in a stateless manner, it’s probably easier to do so than to add the design complexity of storing extra information.
  3. Plan out the flow of data through your system before you start programming anything. What market data is going to be pulled for each time-step? What will then be calculated from this and what is necessary to store? A well thought-out design will reduce complexity and hassle down the line.
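
A sketch of the boolean decomposition from point 1 (the file and column names are hypothetical):

import pandas as pd

df = pd.read_csv("signals.csv")  # hypothetical columns: close, sma_20, rsi_14, volume

# Split a composite signal into boolean components, each of which
# stores cleanly in any backend
df["above_sma"] = df["close"] > df["sma_20"]
df["rsi_oversold"] = df["rsi_14"] < 30
df["volume_spike"] = df["volume"] > df["volume"].rolling(20).mean() * 2

# The composite entry signal is then just a conjunction of the parts
df["long_entry"] = df["above_sma"] & df["rsi_oversold"] & df["volume_spike"]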

Past this, common sense applies. It’s probably best to store the indicators and signals in the same time-series format as, and alongside, the underlying symbols they’re derived from. More complex signals, or indicators derived from multiple symbols, may even warrant their own calculation and storage process.

You could even go as far as creating a separate indicator feed script which calculates and stores everything separately from the trading bot software itself. The database could then be read by each bot as just another data feed. This not only keeps the system more modular, but also allows you to create a highly optimized calculation function without the complexity of direct integration into a live system.
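
A minimal sketch of such a feed, using SQLite purely as an example backend (the table layout is hypothetical):

import sqlite3

def publish_indicators(db_path, rows):
    # rows: iterable of (timestamp, symbol, name, value) tuples
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS indicators
                   (timestamp REAL, symbol TEXT, name TEXT, value REAL)""")
    con.executemany("INSERT INTO indicators VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()

# Each bot then reads the same table as just another data feed, e.g.:
# SELECT value FROM indicators WHERE symbol=? AND name=?
#   ORDER BY timestamp DESC LIMIT 1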

Whatever flavour of system you end up using, make sure to plan out the data storage and access first and foremost, before starting the rest of the design and implementation process.

By Matthew Tweed


Algo Trading for Dummies - Collecting & Storing The Market Data (Part 1)

The lifeblood of any algorithmic trading system is, of course, its data — so that’s what we’ll cover in the first two posts of the mini-series.

Photo by Farzad Nazifi on Unsplash

Always Always Collect Any Live Data

For the retail trader, most platforms and brokers are broadly the same: you’ll be provided with a simple wrapper for a relatively simple REST or WebSocket API. It’s usually worth modifying the provided wrapper to suit your purposes, and potentially creating your own custom wrapper — however, that can be done later, once you have a better understanding of the structure and requirements of your trading system.

Depending on the nature of the trading strategy, there are various types of data you may need to access and work with: OHLCV data (candlesticks), bid/ask quotes, and fundamental or exotic data. OHLCV is usually the easiest to get historical data for, which will be important later for back-testing strategies. While there are some sources for tick data and historical bid/ask or order book snapshots, they generally come at a high cost.

With this last point in mind, it’s always good to collect any live data which will be difficult or expensive to access at a later date. This can be done by setting up simple polling scripts to periodically pull and save any data that might be relevant for back-testing in the future, such as bid/ask spread. This data can provide helpful insight into the market structure, which you wouldn’t be able to track otherwise.
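
A polling script along those lines might be sketched as follows (client.get_quote is a stand-in for whatever quote call your broker’s wrapper actually provides):

import csv
import time

def poll_spread(client, symbol, path, interval=60):
    # Periodically pull the current bid/ask and append it to a CSV,
    # building up a history that would be expensive to buy later
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            quote = client.get_quote(symbol)  # hypothetical wrapper method
            writer.writerow([time.time(), quote.bid, quote.ask])
            f.flush()
            time.sleep(interval)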

Alpaca Python Wrapper Lets You Start Off Quickly

The Alpaca Python Wrapper provides a simple API wrapper to begin working with when creating the initial proof-of-concept scripts. It serves well both for downloading bulk historical data and for pulling live data for quick calculations, so it will need little modification to get going.

It’s also worth noting that the Alpaca Wrapper returns market data in the form of pandas DataFrames, which have slightly different syntax compared to a standard Python array or dictionary — although this is covered thoroughly in the documentation, so it shouldn’t be an issue.
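
Getting started looks roughly like this (the credentials are placeholders; get_barset was the historical-data call in the wrapper at the time of writing, since replaced by get_bars in newer versions):

import alpaca_trade_api as tradeapi

api = tradeapi.REST("KEY_ID", "SECRET_KEY")

# Pull the last 100 one-minute bars for a symbol as a pandas DataFrame
barset = api.get_barset("AAPL", "1Min", limit=100)
df = barset["AAPL"].df  # columns: open, high, low, close, volume
print(df.tail())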

Keeping A Local Cache Of Data

While data may be relatively quick and easy to access on the fly via the market API for live trading, even small delays become a serious slowdown when running batches of back-tests across large time periods or multiple trading symbols. As such, it’s best to keep a local cache of data to work with. This also allows you to create consistent data samples to design and verify your algorithms against.

There are many different storage solutions available, and in most cases it will come down to what you’re most familiar with. But, we’ll explore some of the options anyway.

No Traditional RDB For Financial Data Please

Financial data is time-series data, meaning that each attribute is indexed by its associated time-stamp. Depending on the volume of data points, traditional relational databases can quickly become impractical, as in many cases it is best to treat each data column as a list rather than treating the database as a collection of separate records.

On top of this, a database manager can add a lot of unnecessary overhead and complexity to a simple project with limited scaling requirements. Sure, if you’re planning to build a backend storage solution that will be constantly queried by dozens of trading bots for large sets of data, you’ll probably want a fully specialised time-series database.

However, in most cases you’ll be able to get away with simply storing the data in CSV files — at least initially.

Cutting Down Dev Time By Using CSVs

(Code Snippet to download and store OHLCV data into a CSV) https://gist.github.com/yoshyoshi/5a35a23ac263747eabc70906fd037ff3

The use of CSVs, or another simple format, significantly cuts down on usage of a key resource — development time. Unless you know that you will absolutely need a higher speed storage solution in the future, it’s better to keep the project as simple as possible. You’re unlikely to be using enough data to make local storage speed much of an issue.

Even an SQL database can easily handle the storage and querying of hundreds of thousands of lines of data. To put that in perspective, 500k lines is roughly the 1-minute bars for a symbol between June 2013 and June 2018 (depending on trading hours). A well optimized system which only pulls and processes the necessary data will have no problem with overheads, meaning that any storage solution should be fine, whether that be an SQL database, NoSQL or a collection of CSV files in a folder.
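
Reading the cache back for a run is then a one-liner per sample (the file name and date range are hypothetical):

import pandas as pd

# Load the cached bars once, then slice consistent samples for each back-test
bars = pd.read_csv("AAPL_1min.csv", index_col="timestamp", parse_dates=True)
sample = bars.loc["2017-01-01":"2017-06-30"]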

Additionally, it isn’t infeasible to store the full working dataset in RAM while in use. The 500k lines of OHLCV data used just over 700MB of RAM when serialized into lists (Tested in Python with data from the Alpaca client mentioned earlier).

When it comes to the building blocks of a piece of software, it’s best to keep everything as simple and efficient as possible, while keeping the components suitably modular so they may be adjusted in the future if the design specification of the project changes.

By Matthew Tweed
