Best Starting Kits for Algo Trading with C#

Today, the world is moving toward automation in many areas, including manufacturing, cars, marketing, and logistics. Personal investment is no exception. At Alpaca, we are pushing this boundary forward so everyone can enjoy the automated investment world.

Photo by Nikhil Mitra on Unsplash

List of .NET/C# Algo Trading Systems

When it comes to algo trading and automated investment, Python is one of the biggest players in the space, but many experts also use .NET/C# for its high performance and robustness. We did some research on the toolsets you might look at to start your algo trading, and we wanted to share the list with you.

Overall, the ecosystem has grown a lot lately, and many open-source projects and tools are available to you at low cost, without much equipment.

  • QuantConnect

QuantConnect is one of the most popular online backtesting and live trading services, where you can learn and experiment with trading strategies and run them against the real-time market. The platform is engineered mainly in C#, with additional language coverage such as Python.

  • WealthLab

WealthLab is another C# platform where you can get real-time prices and run your algorithms, provided you have a Fidelity account.

  • NinjaTrader and MultiCharts

NinjaTrader and MultiCharts are also popular choices for different kinds of assets, with various broker options.

  • OpenSource Projects

In addition to these, StockSharp is an interesting open-source project tailored for .NET algo traders and broker integrations.

You should also check out Lean, an open-source library developed by QuantConnect, which also uses it for their flagship service; it supports multiple asset classes such as stocks and cryptocurrencies.

List of Data Libraries

  • Deedle

Deedle is probably one of the most useful libraries when it comes to algo trading. You can run calculations on its Frame structure and compare series of data to generate signals.

  • TALibraryInCSharp

TALibraryInCSharp is a great open-source library that bridges TA-Lib and the .NET world, so you can calculate common indicators such as moving averages and RSI. Combining these libraries gives you a powerful set of trading tools.

  • IEX

The next question is where to get the data to calculate those signals on. If you are talking about US equities, you can leverage IEX’s free data API, and libraries like IEXTradingApi make it easy to get the data instantly.

  • Others

There are also quite a few .NET libraries out there for proprietary data sources (e.g., Quandl), so you should check them out.

Announcing Alpaca’s Official .NET Client SDK

Don’t forget about Alpaca! We are committed to providing the best experiences for many algo traders, and today we are happy to announce that our official .NET client SDK for Alpaca Trade API has been released.

Following our Python SDK, the .NET SDK takes advantage of the platform’s robustness and high performance, as well as its wide coverage of platforms. It is an open-source project hosted on GitHub, and the prebuilt package is up on NuGet. All the classes and methods are documented for IntelliSense, so you can get the references right in your IDE.

Here is a snippet of how easily you can place a buy order of a share of Apple.

Alpaca Trade API covers not only retrieving account information and submitting orders, but also lets you retrieve price and fundamentals information easily. For more details on the API, please read our online documentation.

Happy algo trading!

/

Data, Data, Data! 11 Great Financial Data Vendors

Hey everyone, Intern Rao here. A few weeks ago, I worked with Hitoshi and Yoshi to put together “9 Great Tools for Algo Trading”. 

Looking at the response to that article, we decided to write a follow up. Whether you’re a financial firm or an individual trader, financial data is key for putting together any good strategy. With so many vendors on the market today, many good options get lost in the noise. Here are 11 great financial data vendors. 

By Chris Lee on Unsplash

If your service is featured and is being misrepresented in some form, please reach out to me ASAP (through my medium page, linked at the bottom), and an immediate correction will be made.

Vendors for the Individual Investor

The following data vendors are those that see their target audience as the individual investor. While they do have larger premium plans for high volume usage, the plan options and community are structured around the individual investor. 

(1) IEX


IEX, which stands for the Investor’s Exchange, was founded by four former employees of the Royal Bank of Canada. Upset by the unfair conditions of today’s stock market, they looked to create an exchange that leveled the playing field for long-term investors. Their story was chronicled in the 2014 Michael Lewis Book Flash Boys: A Wall Street Revolt.


IEX has a significant amount of free data, as well as multiple APIs to access it. The catch, though, is that all pricing data is the price of that security on IEX. That said, their coverage of securities is second to none, as they have information on US equities, fixed income, and ETFs. Find out more on the IEX website here.

(2) Quandl


If you’ve used Zipline for backtesting at some point, you’ve probably used free data from Quandl. Based out of Toronto, Quandl first started as a search engine for financial data sets. As algo trading grew as a field, Quandl started repackaging the data sets from its search engine as individual bundles.

Quandl offers a bundle of US equities for free, which is actually the default bundle you ingest into Quantopian’s Zipline backtesting engine. Be warned that this bundle doesn’t contain any fixed income or ETFs. All other bundles are priced with a one-time fee, and Quandl even serves as a marketplace for third parties to sell their bundles. If you’re newer to the field, Quandl is not a bad place to start.

(3) Alpha Vantage


AlphaVantage is a community-based data vendor. In the last few years, AlphaVantage has worked hard to foster a developer community, not unlike Quantopian. The community helped build their product, as well as discuss investment strategies and other apps built on AlphaVantage data.

AlphaVantage is a generally free resource, with APIs available in multiple languages. They provide both real-time data streams and historical market data for backtesting. The API doesn’t have a daily/weekly/monthly limit, but the free subscription does have limited latency. For the serious investor making larger volumes of API calls, a premium subscription is best.

(4) Intrinio


Intrinio was founded in 2012 with the idea of making financial data easy to access. Intrinio gears its product toward developers, and is currently the data provider for the backtesting engine QuantRocket.

They currently offer two forms of data — streams and packets. Data streams offer a free trial, and then are subscription based depending on the stream. Packets are bundles of historical data. These bundles can be bought for a one-time payment and are priced based on the width and depth of data coverage. They also offer applications like the Intrinio screener for Excel. 

(5) Zacks


Zacks was founded in 1985 as a quantitative ranking of potential buys based on estimated earnings. That tool, Zacks Rank, became the cornerstone of what Zacks offers today.

Zacks is more than just a data vendor. It offers both historical and real-time data, as well as a variety of third-party applications to use that data. Zacks also features newsletters, advice from industry experts, and a community of developers. For traders looking to get their feet wet, Zacks may be the place to go.

(6) Polygon


Polygon is far newer on the data vendor scene, having been founded in 2015. They have a variety of pricing plans based on data coverage as well as latency of connection. 

Polygon provides real-time quotes and data streams, and has official clients in 10 different languages, including Python, C, and Go. In a trend in keeping with other data vendors of late, Polygon has done its best to foster a community of developers, which has led to the creation of community clients in Perl, .NET, and Scala.

Disclaimer: Polygon is the data vendor of choice for Alpaca.

(7) EOD Historical Data

The quirkily named EOD Historical Data was founded in May of last year, but has established itself with a combination of accurate data and affordable prices. The Lyon, France-based company offers monthly rates that range from as low as $9.99 to a high of $39.99.

Business-Facing Vendors

The vendors listed in the previous section are geared toward the individual investor, though they also have premium subscriptions for businesses. The following vendors are specifically geared toward firms, and are out of the price range of the average individual investor.

(8) Xignite


Xignite was initially founded as a wealth management platform. When founder Stephane Dubois realized the challenge of obtaining accurate market data, he pivoted Xignite to become a data vendor.

Xignite offers real time data streams, daily quotes, and historical market data. For developers, they offer multiple APIs for historical data. Their products are expensive for the individual investor, and Xignite themselves identify as business facing. Some of their more famous clients include Robinhood, Wealthfront, and StockTwits.

(9) Thomson Reuters


Thomson Reuters is the result of the merger between the mass media company Thomson Corporation and the information group Reuters. Founded in London, Reuters has been providing financial information since 1951. Like Bloomberg, Reuters transitioned into information services in the late 20th century.

Thomson Reuters provides pricing data for assets on over 200 exchanges, for both equities and fixed income. Their software includes not just pricing data, but research and analytical tools, along with a dedicated news stream and a mobile interface. The full Eikon software costs $22,000 a year, but individual investors can find a stripped-down version with just pricing data for $3,600.

(10) Bloomberg


Bloomberg was founded in 1981 by former New York City mayor Michael Bloomberg. Bloomberg may not have Reuters’ pedigree, but it has quickly established itself as an industry standard.

The Bloomberg Terminal packs minute-by-minute data, trading tools, and analytics for over 300 exchanges around the world. For algo traders, the terminal subscription even offers an API accessible in multiple languages. But clocking in at $24,000 a year, it’s a hard sell for the individual investor. If you do have access, whether through your firm or institution, you can’t get better than Bloomberg.

(11) YCharts


YCharts was founded in 2009 by Ara Anjargolian and Shawn Carpenter as an attempt to compete directly with the Bloomberg Terminal. The self-proclaimed financial terminal of the web balances a load of data with a variety of useful tools and integrations.

They offer Excel integrations and data visualizations of niche metrics, which is in line with their target audience: hedge fund managers and sales representatives. Both groups are accustomed to doing in-depth research on behalf of their clients.

YCharts Professional clocks in at $199/month, which is expensive, but only about 10% of the cost of a Bloomberg Terminal. For those interested in a stripped-down subscription with access to only the simplest data metrics, YCharts offers a membership at $49/month, which is affordable for most individual investors.

Lastly...

I hope this data vendor list was useful. If you think there are services that I missed, please let me know! I always appreciate any and all feedback.

So, good luck trading everyone!

by Rao Vinnakota


/

9 Great Tools for Algo Trading

In the last 5–10 years algorithmic trading, or algo trading, has gained popularity with the individual investor. The rise in popularity has been accompanied by a proliferation of tools and services, to both test and trade with algorithms. I’ve put together a list of 9 tools you should consider using for your algo trading process.

Photo by Adrian Curiel on Unsplash

Web Services:

The following are managed services that you can use through a web browser, and they don’t require much setup from the user. As someone who recently started in this field, I found them easy for new algo traders to try out.

(1) Quantopian:


A Boston-based crowd-sourced hedge fund, Quantopian provides an online IDE to backtest algorithms. Their platform is built with Python, and all algorithms are implemented in Python. When testing algorithms, users have the option of a quick backtest or a larger full backtest, and are provided a visualization of portfolio performance.

Live trading was discontinued in September 2017, but the platform still provides a large range of historical data. They also have a serious community of developers, and now hold an ongoing daily contest with 10 winners awarded each day, for a total of $5,000 per month in prize money (*updated from the previously written "periodically hold contests"). Quantopian provides capital to the winning algorithms.

(2) QuantConnect:


QuantConnect is another platform that provides an IDE to both backtest and live-trade algorithmically. Their platform was built using C#, and users have the option to test algorithms in multiple languages, including both C# and Python.

QuantConnect also embraces a great community from all over the world, and provides access to equities, futures, forex and crypto trading. They offer live-trading integration with various names such as InteractiveBrokers, OANDA, and GDAX.

(3) QuantRocket:

QuantRocket is a platform that offers both backtesting and live trading with InteractiveBrokers, with live trading capabilities on forex as well as US equities. It’s specifically designed for trading with InteractiveBrokers, and sets itself apart with its flexibility.

QuantRocket supports multiple engines — its own Moonshot, as well as third-party engines chosen by the user. While QuantRocket doesn’t have a traditional IDE, it integrates well with Jupyter to produce something similar. One thing to keep in mind is that QuantRocket is not free. Pricing plans start at $19.99/month, with annual options.

Local Backtesting/Live Trading Engines:

In today’s software world, you have a lot more freedom if you make some effort outside of those managed services. If you are comfortable going this route, I recommend backtesting locally with these tools:

(4) Zipline/Zipline-Live:

quantopian/zipline - Zipline, a Pythonic Algorithmic Trading Library (github.com)

Quantopian’s IDE is built on the back of Zipline, an open source backtesting engine for trading algorithms. Zipline runs locally, and can be configured to run in virtual environments and Docker containers as well. Zipline comes with all of Quantopian’s functions, but not all of its data. To balance that, users can write custom data to backtest on. Zipline also provides raw data from backtests, allowing for versatile uses of visualization.

Zipline discontinued live trading in 2017, but there is an open source project Zipline-live that works with Interactive Brokers. It has many of the same features Zipline does, and provides live trading.

(5) BackTrader:

backtrader - Backtesting / Trading (www.backtrader.com)

Backtrader is currently one of the most popular backtesting engines available. It was built using Python, and has a clean, simple, and efficient interface that runs locally (no web interface). One thing to keep in mind: Backtrader doesn’t come with any data, but you can hook up your own market data in CSV and other formats pretty easily.

Starting with release 1.5.0, BackTrader has live-trading capabilities. It’s been a popular choice with algo traders, especially after Zipline discontinued live trading.

(6) IBPy:

blampe/IbPy - Python API for the Interactive Brokers on-line trading system (github.com)

IBPy is an unaffiliated third-party Python wrapper for Interactive Brokers’ Trader Workstation API. Before IB started providing an official API library for Python, this was the only way to connect to TWS for algorithms written in Python.

IB has since released an official Python SDK, and this library is heading toward being obsolete (while still being relevant for Python 2 users). But a significant number of live trading engines/tools still use this library, and it’s good learning material for anyone who wants to learn about implementing APIs.

While it’s good to learn about this library since it’s ubiquitous, if you are starting fresh, we recommend IB’s official Python SDK.

The Alpaca Trade API Python SDK is even simpler to use!

Analytical Tools:

Backtesting will output a significant amount of raw data. Some IDEs will provide basic visualization and analysis, usually of algorithm performance. If you’re looking for deeper evaluation, I recommend these tools:

(7) Pyfolio:

quantopian/pyfolio - Portfolio and risk analytics in Python (github.com)

Pyfolio is another open-source tool developed by Quantopian that focuses on evaluating a portfolio. What sets Pyfolio apart is its ability to introduce degrees of uncertainty to a static set of data points, and to evaluate Bayesian metrics from the user’s portfolio. The Pyfolio API offers a number of visualizations, which can be found on its GitHub repository.

(8) Alphalens:

quantopian/alphalens - Performance analysis of predictive (alpha) stock factors (github.com)

Alphalens is also an analysis tool from Quantopian. Unlike Pyfolio, Alphalens works well with the raw data output from Zipline, and rather than evaluating the portfolio, it performs analysis of predictive (alpha) stock factors. Alphalens has its own range of visualizations, found on its GitHub repository.


Median Daily Returns by Factor Quantile — one of the visualizations that alphalens offers

(9) TradingView:


TradingView is a visualization tool with a vibrant open-source community. It’s entirely web-based, and allows users to visualize data, whether the data is the result of paper trading or algorithmic back-testing. Like Quantopian, TradingView allows users to share their results and visualizations with others in the community, and receive feedback.

(Bonus) Execution Platforms aka Broker-Dealers:

(10) InteractiveBrokers:


InteractiveBrokers is an online broker-dealer for active traders in general. They have been in the market since 1978. Algo trading isn’t IB’s focus, but multiple engines offer live trading through integration with their Trader Workstation. We’ve mentioned IB several times in this article — they’re just that good!

(11) Alpaca:


Finally, Alpaca! Alpaca was founded in 2015, and is an up-and-coming commission-free broker-dealer designed specifically for algo trading. Alpaca also has a trade API, along with multiple open-source tools, including a database optimized for time-series financial data known as MarketStore.

The brokerage is scheduled to be publicly available this September (you can play around with the MarketStore right now), but if you can’t wait, head over to our website and jump on the waitlist for a chance at early access!

Alpaca | Algo Trading Commission Free with REST API (alpaca.markets)

Miscellaneous Tools to Take a Look At:

  • qtpylib — another simple Python backtesting engine
  • MultiCharts — proprietary trading platform for forex and equities
  • WealthLab — desktop tool which allows C# backtesting, with live trading exclusive to Fidelity
  • Enigma Catalyst — for crypto trading
  • MetaTrader — backtesting/live-trading desktop app, the de facto standard in forex

I hope this quick primer on the tools available right now was useful. If you liked it, please leave a clap (or two, I don’t mind). If you think there are tools that I missed, leave a comment below! I always appreciate any and all feedback.

by Rao Vinnakota

/

Algo Trading for Dummies - 3 Useful Tips When Storing Trade Signals (Part 2)

Handling & Storing Trading Signals Are Hard

The calculation of simple trading indicators is made easy by any one of the technical analysis libraries available; however, the efficient handling and storage of trading signals can be one of the most complex aspects of a live trading system.

Photo by Jeremy Thomas on Unsplash

Calculating Basic Indicators? No Problem

While it’s often necessary to create custom indicators and trading signals, there is still significant benefit to using a standard library such as Ta-Lib for the basics. This saves a lot of time rather than having to reimplement a set of common indicators in your language of choice. It also has the added bonus of increased processing speed as opposed to calculation done in native Python, for example.

When it comes to moving averages and other simple time-series indicators, the process is fairly self explanatory — at every time step you calculate the next numerical value which is then used as the most up-to-date signal to trade against.

(Code Snippet to read data CSV files and process into trading indicators) https://gist.github.com/yoshyoshi/73f130026c25a7dcdb9d6909b1990277
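This isn't the linked gist itself, just a minimal sketch of the same idea, assuming the TA-Lib Python wrapper and pandas are installed and that a CSV with a close column (the file name here is hypothetical) already exists:

import pandas as pd
import talib  # TA-Lib Python wrapper

# Load previously saved OHLCV data (hypothetical file name)
data = pd.read_csv("AAPL_1min.csv", index_col=0, parse_dates=True)
close = data["close"].values

# Common indicators calculated by TA-Lib; the last element of each
# array is the most up-to-date value to trade against
sma_fast = talib.SMA(close, timeperiod=20)
sma_slow = talib.SMA(close, timeperiod=50)
rsi = talib.RSI(close, timeperiod=14)

# A simple stateless signal: only the combination of indicators
# present at this moment matters
long_signal = sma_fast[-1] > sma_slow[-1] and rsi[-1] < 70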

The signals themselves will be stateless in that respect — you aren’t concerned with previous signals that have been made, only the combination of indicators present at that moment. However, you may still wish to store some of the information from the indicators, if only for external analysis at a later point.

Different Story For Advanced Pattern Recognition

Meanwhile, more advanced pattern recognition cannot be handled in such a simple manner. If, for example, your strategy relies on finding divergence between indicators, it’s possible to get a significant performance boost by storing some past data points from which to construct the signal at each new step, rather than having to reprocess the full set of data in the look-back period every time.

This is the trade-off between storage/RAM efficiency and processing efficiency, with the latter also requiring greater software complexity to achieve.

How You Should Store Signals Depends On How Fast You Need It To Be

For optimal processing efficiency, you would not only store all the previously calculated signals from past time stamps, but also the relevant information to calculate the next step in as few steps as possible.

While this would be completely unnecessary for any system with a polling rate above a second, it is exactly the kind of consideration you would have for a higher-frequency strategy.

Meanwhile, a portfolio re-balancing system, or even most day-trading strategies, have all the time in the world (relatively). You could easily recalculate all the relevant signals at each time-step, which would cut down on the need for the handling of historical indicator sets.

Depending on the trading period of the system, it may also be worth using a hybrid approach to indicator and signal storage. Rather than permanently saving the data, you could calculate the full set of indicators at start-up and periodically dump and refresh the data to keep only what’s going to be used in RAM.

The precise design trade-offs should be considered on an individual basis, as holding more data in RAM may not be an option when running the software from lower-power cloud computing instances, nor, at the other end of the spectrum, would you be able to spare the seconds to recalculate everything for a market-making bot.

3 Useful Tips When Storing Trade Signals

As mentioned in part 1 of this series, there is a range of different storage solutions that can be used for trading data. However, there are several best practices which apply across all of them:

  1. Keep indicators in a numeric or boolean format where possible for storage, for example by splitting a more complex signal set into boolean components (see the sketch after this list). This particular problem caused me several issues in projects I’ve had to work on in the past.
  2. Only store what is complex or time-consuming to recalculate. If a set of signals can be calculated in time in a stateless manner, it’s probably easier to do so than to add the design complexity of storing extra information.
  3. Plan out the flow of data through your system before you start programming anything. What market data is going to be pulled for each time step? What will then be calculated from this, and what is necessary to store? A well thought-out design will reduce complexity and hassle down the line.
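To illustrate the first tip, here is a minimal sketch (the column names and thresholds are illustrative assumptions) of splitting a composite signal into boolean components stored against the same timestamps as the underlying data:

import pandas as pd

def build_signal_frame(indicators: pd.DataFrame) -> pd.DataFrame:
    """Split a composite signal into boolean components, indexed by timestamp.

    `indicators` is assumed to hold numeric columns such as
    'sma_fast', 'sma_slow' and 'rsi' calculated earlier.
    """
    signals = pd.DataFrame(index=indicators.index)
    signals["trend_up"] = indicators["sma_fast"] > indicators["sma_slow"]
    signals["overbought"] = indicators["rsi"] > 70.0
    signals["oversold"] = indicators["rsi"] < 30.0
    return signals

# Boolean columns are cheap to store next to the underlying price data,
# e.g. build_signal_frame(indicators).to_csv("AAPL_signals.csv")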

Past this, common sense applies. It’s probably best to store the indicators and signals in the same time-series format as, and alongside, the underlying symbols they’re derived from. More complex signals, or indicators derived from multiple symbols, may even warrant their own calculation and storage process.

You could even go as far as to create a separate indicator feed script which calculates and stores everything separately from the trading bot software itself. The database could then be read by each bot as just another data feed. This not only has the benefit of keeping the system more modular, but also allowing you to create a highly optimized calculation function without the complexity of direct integration into a live system.

Whatever flavour of system you end up using, make sure to plan out the data storage and access first and foremost, before starting the rest of the design and implementation process.

By Matthew Tweed

/

Algo Trading for Dummies - Collecting & Storing The Market Data (Part 1)

The lifeblood of any algorithmic trading system is, of course, its data — so that’s what we’ll cover in the first two posts of the mini-series.

Photo by Farzad Nazifi on Unsplash

Always Always Collect Any Live Data

For the retail trader, most platforms and brokers are broadly the same: you’ll be provided with a simple wrapper for a relatively simple REST or WebSocket API. It’s usually worth modifying the provided wrapper to suit your purposes, and potentially creating your own custom wrapper — however, that can be done later once you have a better understanding of the structure and requirements of your trading system.

Depending on the nature of the trading strategy, there are various types of data you may need to access and work with — OHLCV data (candlesticks), bid/asks, and fundamental or exotic data. OHLCV is usually the easiest to get historical data for, which will be important later for backtesting of strategies. While there are some sources for tick data and historic bid/ask or order book snapshots, they generally come at high costs.

With this last point in mind, it’s always good to collect any live data which will be difficult or expensive to access at a later date. This can be done by setting up simple polling scripts to periodically pull and save any data that might be relevant for back-testing in the future, such as bid/ask spread. This data can provide helpful insight into the market structure, which you wouldn’t be able to track otherwise.
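A minimal polling sketch along those lines, with a hypothetical fetch_quote() standing in for whatever your broker or data API client actually provides:

import csv
import time
from datetime import datetime, timezone

def fetch_quote(symbol):
    """Hypothetical placeholder: replace with your broker/data API call,
    which is assumed to return a dict with 'bid' and 'ask' keys."""
    raise NotImplementedError

def poll_spread(symbol, path, interval_sec=60):
    # Append one bid/ask snapshot per interval so the spread history
    # is available for backtesting later
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            quote = fetch_quote(symbol)
            writer.writerow([
                datetime.now(timezone.utc).isoformat(),
                quote["bid"],
                quote["ask"],
                quote["ask"] - quote["bid"],
            ])
            f.flush()
            time.sleep(interval_sec)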

Alpaca Python Wrapper Lets You Start Off Quickly

The Alpaca Python Wrapper provides a simple API wrapper to begin working with to create initial proof-of-concept scripts. It serves well for both downloading bulk historical data and pulling live data for quick calculations, so it will need little modification to get going.

It’s also worth noting that the Alpaca Wrapper returns market data in the form of pandas DataFrames, which have slightly different syntax compared to a standard Python array or dictionary — although this is covered thoroughly in the documentation, so it shouldn’t be an issue.
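For example, a returned DataFrame is sliced with pandas indexing rather than dictionary lookups. A tiny self-contained sketch, with column names mirroring a typical OHLCV frame:

import pandas as pd

# A small stand-in for the DataFrame an API wrapper might return
bars = pd.DataFrame(
    {"open": [100.0, 101.0], "close": [101.0, 102.0]},
    index=pd.to_datetime(["2018-06-01 09:30", "2018-06-01 09:31"]),
)

latest_close = bars["close"].iloc[-1]               # most recent close
morning = bars.between_time("09:30", "10:00")       # slice by time of day
daily_mean = bars["close"].resample("1D").mean()    # aggregate to daily averages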

Keeping A Local Cache Of Data

While data may be relatively quick and easy to access on the fly via the market API for live trading, even small delays become a serious slowdown when running batches of backtests across large time periods or multiple trading symbols. As such, it’s best to keep a local cache of data to work with. This also allows you to create consistent data samples to design and verify your algorithms against.

There are many different storage solutions available, and in most cases it will come down to what you’re most familiar with. But, we’ll explore some of the options anyway.

No Traditional RDB For Financial Data Please

Financial data is time-series, meaning that each attribute is indexed by its associated time-stamp. Depending on the volume of data-points, traditional relational databases can quickly become impractical, as in many cases it is best to treat each data column as a list rather than the database as a collection of separate records.

On top of this, a database manager can add a lot of unnecessary overhead and complexity for a simple project that will have limited scaling requirements. Sure, if you’re planning to make a backend data storage solution which will be constantly queried by dozens of trading bots for large sets of data, you’ll probably want a full specialised time-series database.

However, in most cases you’ll be able to get away with simply storing the data in CSV files — at least initially.

Cutting Down Dev Time By Using CSVs

(Code Snippet to download and store OHLCV data into a CSV) https://gist.github.com/yoshyoshi/5a35a23ac263747eabc70906fd037ff3
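The gist above is the canonical version; below is a rough sketch of the same idea, assuming the Alpaca Python client (the exact method names, such as get_barset, may differ between SDK versions):

import alpaca_trade_api as tradeapi

# Placeholder credentials; supply your own key ID and secret
api = tradeapi.REST("KEY_ID", "SECRET_KEY")

def cache_ohlcv(symbol, timeframe="1Min", limit=1000):
    # Pull recent bars and keep a local CSV copy so that backtests
    # don't have to hit the network on every run
    bars = api.get_barset(symbol, timeframe, limit=limit)[symbol].df
    bars.to_csv(f"{symbol}_{timeframe}.csv")
    return bars

df = cache_ohlcv("AAPL")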

The use of CSVs, or another simple format, significantly cuts down on the usage of a key resource — development time. Unless you know that you will absolutely need a higher-speed storage solution in the future, it’s better to keep the project as simple as possible. You’re unlikely to be using enough data to make local storage speed much of an issue.

Even an SQL database can easily handle the storage and querying of hundreds of thousands of lines of data. To put that in perspective, 500k lines is equivalent to the 1-minute bars for a symbol between June 2013 and June 2018 (depending on trading hours). A well-optimized system which only pulls and processes the necessary data will have no problem with overheads, meaning that any storage solution should be fine, whether that be an SQL database, NoSQL, or a collection of CSV files in a folder.

Additionally, it isn’t infeasible to store the full working dataset in RAM while in use. The 500k lines of OHLCV data used just over 700MB of RAM when serialized into lists (Tested in Python with data from the Alpaca client mentioned earlier).

When it comes to the building blocks of a piece of software, it’s best to keep everything as simple and efficient as possible, while keeping the components suitably modular so they may be adjusted in the future if the design specification of the project changes.

By Matthew Tweed

/

I am a College Student and I Built My Own Robo Advisor

I’m Rao, and I’m an intern at Alpaca working on building an open-source robo advisor. I don’t have much experience in the space, and I had to find a lot of answers. While there’s a wealth of material available on the web, very little is organized. This post is the combination of various threads and forums that I read through looking for answers.

“Yes, you can do it too!”

What is a Robo Advisor anyway?!

Robo advisors are automated advising services that require little to no user interaction. They specialize in maintaining portfolios based on the investor’s chosen risk level. Incidentally, robo advisors were first launched at the start of the financial crisis in 2008.

The actual logic for a robo advisor is straightforward.

  • “Allocation” — Given a risk level, portions of capital are allocated to different positions.
  • “Distance” — Over a regular time interval, the advisor scans to see if there’s a significant change in the portfolio balance, and
  • “Rebalancing” — If necessary, rebalances.

Allocation:

The process of allocation is where different weights are assigned to asset classes based on the developer’s algorithm. These asset classes cover the various aspects of the market, and keep a diversified portfolio. Wealthfront sees these asset classes as:

  • US Stocks
  • Foreign Stocks
  • US Bonds
  • Foreign Bonds
  • Inflation Protection

Each of these asset classes is represented with ETFs, which capture a broad look at the asset class.

Distance:

Rebalancing a portfolio comes with a certain amount of overhead and cost. To avoid this, the robo advisor computes a “distance” vector. To trigger a rebalance, this distance will have to be greater than a given threshold. Most robo advisors usually have this threshold set to 5 percent.

The distance, in this case, is the individual change in weight of each position. This is computed by first calculating the current allocation: the value of a position against the total value of the portfolio. If any one position has a weight that’s 5 percent greater than its target (current_weight/target_weight > 1.05), then the robo advisor will trigger a rebalance.

Rebalancing:

Rebalancing is the act of buying and selling to return the portfolio to its target state. It’s important that all selling is done first. Once all cash available is gathered, necessary purchases are made.

To calculate how much to sell, use the target allocation of cash (weight * portfolio value), and see the number of shares that result from that cash. The difference between the number of shares currently held and the target level of shares is the amount to buy/sell.
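As a quick illustration of that arithmetic with made-up numbers:

portfolio_value = 10000.0   # total portfolio value
target_weight = 0.294       # e.g. VTI at risk level 5
price = 140.0               # hypothetical current share price
current_shares = 25

target_cash = target_weight * portfolio_value   # 2940.0
target_shares = int(target_cash / price)        # 21 whole shares
delta = target_shares - current_shares          # -4, i.e. sell 4 shares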

Implementing a Robo Advisor with the Vanguard Core Series:

Now that we’ve established the various parts of what makes a robo advisor, we’ll start executing the steps to build one. We’ll build our robo advisor using Quantopian’s IDE.

When implementing our advisor, I find it easiest to do it in steps. Quantopian’s IDE can only run backtests on a full algorithm. What that means is that there’s no "halfway" development. So, each step is a self-contained algorithm by itself.

Note: The implementation of Vanguard’s Core Series is using the information found here. The portfolio includes a 2% investment in a money market fund, but that isn’t available on Quantopian, so for the purposes of this tutorial, we’ll ignore it.

Risk-Based Allocation:

The first step, is to determine our universe, and assign weights using risk level. Our universe will be the Vanguard Core series:

  • VTI — Domestic Equity
  • VXUS — International Equity
  • BND — Domestic Fixed Income (Bonds)
  • BNDX — International Fixed Income (Bonds)

The risk level is actually the ratio of fixed income to equity, with equity being more volatile. So, a risk level 0 would have all capital allocated to fixed income (BND, BNDX), while a risk level of 5 would be 50/50. All 11 possible risk levels are catalogued in a dictionary, with the keys being the risk level, and values being tuples containing weight allocation:

risk_level = 5
risk_based_allocation = {0: (0,0,0.686,0.294),
                         1: (0.059,0.039,0.617,0.265),
                         2: (0.118,0.078,0.549,0.235),
                         3: (0.176,0.118,0.480,0.206),
                         4: (0.235,0.157,0.412,0.176),
                         5: (0.294,0.196,0.343,0.147), 
                         6: (0.353,0.235,0.274,0.118),
                         7: (0.412,0.274,0.206,0.088),
                         8: (0.470,0.314,0.137,0.059),
                         9: (0.529,0.353,0.069,0.029),
                         10: (0.588,0.392,0,0)}

The next step will be to implement our allocation in Quantopian. For this step, the only methods we’ll need are initialize and handle_data. The rest are superfluous.

The initialize method is the main entry point. When a backtest is started, it automatically calls initialize. So, initialize will be the function that contains the dictionary of risk levels, as well as the chosen risk level. The handle_data method is called automatically after initialize. It’s here that we’ll actually purchase the positions outlined in initialize.

Both initialize and handle_data have the variables context and data. Context allows the user to store global variables that get passed from method to method. Data helps the algorithm fetch different sorts of data. (Note: This is a change from Quantopian 1 where data was the object where global variables were stored).

Let’s get started. In initialize, copy and paste the following code:

context.stocks = symbols('VTI', 'VXUS', 'BND', 'BNDX')
context.bought = False

risk_level = 5
risk_based_allocation = {0: (0,0,0.686,0.294),
                         1: (0.059,0.039,0.617,0.265),
                         2: (0.118,0.078,0.549,0.235),
                         3: (0.176,0.118,0.480,0.206),
                         4: (0.235,0.157,0.412,0.176),
                         5: (0.294,0.196,0.343,0.147),
                         6: (0.353,0.235,0.274,0.118),
                         7: (0.412,0.274,0.206,0.088),
                         8: (0.470,0.314,0.137,0.059),
                         9: (0.529,0.353,0.069,0.029),
                         10: (0.588,0.392,0,0)}
    #Saves the weights to easily access during rebalance
context.target_allocation = dict(zip(context.stocks,
                            risk_based_allocation[risk_level]))

    #To make initial purchase
context.bought = False

The variable context.stocks is a list of stock objects. The symbols function turns the strings into objects. The objects have attributes like current price, closing price, etc. Using that list, as well as the dictionary for allocation, we’ll create a second dictionary context.target_allocation. The keys are each ticker (VTI, VXUS, etc.) and the values are the ticker’s weights. This dictionary will be useful for both allocation and rebalancing.

Copy and paste the following code to create handle_data:

def handle_data(context, data):
    if not context.bought:
        for stock in context.stocks:
            amount = (context.target_allocation[stock] *
                      context.portfolio.cash) / data.current(stock, 'price')
            if (amount != 0):
                order(stock, int(amount))
                #log purchase
                log.info("buying " + str(int(amount)) + " shares of " +
                         str(stock))

        #now won't purchase again and again
        context.bought = True

The variable context.bought refers to the value originally set to False in initialize. Since handle_data is called every time a market event occurs, using context.bought ensures the shares are only bought once.

To buy shares for each stock, the list of stock objects is iterated through. For each stock, the total number of shares are calculated by allocating the proper amount of capital (weight * capital), and then dividing that capital by the current price. Since we can only buy whole shares, leftover capital is added back to the cash on hand.

Lastly, for a smoother backtesting experience, all transactions are logged. The log is in the lower right hand corner. Build the current algorithm. Set the start date to June 2013 or later as the ETF BNDX was first tradeable on June 7, 2013.

Calculating Distance:

Before rebalancing a portfolio, it’s best to first calculate if the current portfolio is worth rebalancing. In more sophisticated robo advisors, this step would have a target portfolio calculated with an algorithm. For this example, our target portfolio is going to be the initial allocation defined by the risk level in initialize.

As the value of each position fluctuates, their portfolio weight may change too. The distance vector will be the change in weight of each individual position. Calculating the distance takes place in three steps:

  1. Calculating the current weights of each position
  2. Calculating the target weights (already done)
  3. Comparing the two weights

We want to check the balance daily, so we’ll go ahead and add the following function, before_trading_starts, to our code:

def before_trading_starts(context, data):
    #total value
    value = (context.portfolio.portfolio_value +
             context.portfolio.cash)
    #calculating current weights for each position
    for stock in context.stocks:
        if (context.target_allocation[stock] == 0):
            continue
        current_holdings = (data.current(stock, 'close') *
                            context.portfolio.positions[stock].amount)
        weight = current_holdings / value
        growth = (float(weight) /
                  float(context.target_allocation[stock]))
        if (growth >= 1.025 or growth <= 0.975):
            log.info("Need to rebalance portfolio")
            break

We first calculate the total value. Since that value doesn’t change, it can be done outside the loop. Then, each position’s individual weight is calculated and compared to the target weight (we’re using division since we’re interested in the relative growth of the position, not the absolute). If that growth exceeds the threshold (currently 2.5%) or is under the threshold (de-valued), then a rebalance is triggered. We don’t have an actual rebalance function written, so for now, we’ll simply log that a rebalance is necessary. It’s important to note the break that was added. Once the algorithm realizes that it’s time to rebalance, there’s no need to continue checking the rest of the stocks. It creates a better best-case scenario.

However, we’re not actually done. We have to call before_trading_starts, but calling it in a loop is inefficient. That’s why we’ll use the schedule_function command in initialize. Add this line of code to the end of the function block:

    schedule_function(
    func=before_trading_starts,
    date_rule=date_rules.every_day(),
    time_rule=time_rules.market_open(hours=1))

This schedules a distance calculation as the market opens every day. By computing the distance in the morning, we have the time and space necessary to execute rebalancing actions.

Rebalancing

The act of rebalancing a portfolio occurs in two steps. First, all assets that need to be sold are sold, and then all assets that need to bought are bought. Selling first makes sure that we don’t run out of cash.

The following is the code for a rebalance function:

def rebalance(context, data):
    for stock in context.stocks:
        current_weight = (data.current(stock, 'close') *
                          context.portfolio.positions[stock].amount /
                          context.portfolio.portfolio_value)
        target_weight = context.target_allocation[stock]
        distance = current_weight - target_weight
        if (distance > 0):
            amount = (-1 * (distance * context.portfolio.portfolio_value) /
                      data.current(stock, 'close'))
            if (int(amount) == 0):
                continue
            log.info("Selling " + str(int(amount)) + " shares of " +
                     str(stock))
            order(stock, int(amount))
    for stock in context.stocks:
        current_weight = (data.current(stock, 'close') *
                          context.portfolio.positions[stock].amount /
                          context.portfolio.portfolio_value)
        target_weight = context.target_allocation[stock]
        distance = current_weight - target_weight
        if (distance < 0):
            amount = (-1 * (distance * context.portfolio.portfolio_value) /
                      data.current(stock, 'close'))
            if (int(amount) == 0):
                continue
            log.info("Buying " + str(int(amount)) + " shares of " +
                     str(stock))
            order(stock, int(amount))

The selling and buying mechanics are the same. The only difference is that when selling stock, you pass a negative value when calling the order function. In both cases, we take the difference in weight (current - target), and use that value to find the number of shares to buy or sell.

Multiple Universes

Vanguard provides several more universes beyond the core-series. Currently, we’re able to manipulate the risk level, and observe the outcome. Let’s add the functionality of also being able to select a universe.

The first and straightforward method, is to implement multiple dictionaries. Here’s how something like that would look. The CRSP series was added to our algorithm. The user now chooses both the universe and the risk level.

    core_series = symbols('VTI', 'VXUS', 'BND', 'BNDX')
    crsp_series = symbols('VUG', 'VTV', 'VB', 'VEA', 'VWO', 'BSV', 'BIV', 'BLV', 'VMBS', 'BNDX')
    #universe risk based allocation
    core_series_weights = {0: (0,0,0.686,0.294),
                           1: (0.059,0.039,0.617,0.265),
                           2: (0.118,0.078,0.549,0.235),
                           3: (0.176,0.118,0.480,0.206),
                           4: (0.235,0.157,0.412,0.176),
                           5: (0.294,0.196,0.343,0.147),
                           6: (0.353,0.235,0.274,0.118),
                           7: (0.412,0.274,0.206,0.088),
                           8: (0.470,0.314,0.137,0.059),
                           9: (0.529,0.353,0.069,0.029),
                           10: (0.588,0.392,0,0)}
    crsp_series_weights = {0: (0,0,0,0,0,0.273,0.14,0.123,0.15,0.294),
                           1: (0.024,0.027,0.008,0.03,0.009,0.245,0.126,0.111,0.135,0.265),
                           2: (0.048,0.054,0.016,0.061,0.017,0.218,0.112,0.099,0.12,0.235),
                           3: (0.072,0.082,0.022,0.091,0.027,0.191,0.098,0.086,0.105,0.206),
                           4: (0.096,0.109,0.03,0.122,0.035,0.164,0.084,0.074,0.09,0.176),
                           5: (0.120,0.136,0.038,0.152,0.044,0.126,0.07,0.062,0.075,0.147),
                           6: (0.143,0.163,0.047,0.182,0.053,0.109,0.056,0.049,0.06,0.118),
                           7: (0.167,0.190,0.055,0.213,0.061,0.082,0.042,0.037,0.045,0.088),
                           8: (0.191,0.217,0.062,0.243,0.071,0.055,0.028,0.024,0.030,0.059),
                           9: (0.215,0.245,0.069,0.274,0.079,0.027,0.014,0.013,0.015,0.029),
                           10: (0.239,0.272,0.077,0.304,0.088,0,0,0,0,0)}

    #set universe and risk level
    context.stocks = crsp_series
    risk_based_allocation = crsp_series_weights
    risk_level = 1

The user can use the three variables, context.stocks, risk_based_allocation, and risk_level, to set both the universe and the risk level.

What’s Next

Developing in Quantopian is a great experience — they provide lots of useful tools. But, it’s also limiting, being forced to only work in one file, etc. To build further on this robo advisor, I plan to migrate my work so far to the backend, and use Quantopian’s open source python library Zipline to run backtests. The next installment will cover all this and more!

You can find the algorithm here. Try it yourself!

by Rao Vinnakota

/

9 Most Commonly Asked Questions About MarketStore And Answers To Them

Photo by William Stitt on Unsplash

Each of these articles seeks to explain the technology we build alongside our Alpaca algo trading brokerage. The articles led to active discussions on Reddit and Medium, and it became clear to us that there is a lot of interest, and a pretty large need in the community, for a timeseries database dedicated to financial market data. The database world and software engineering in general have changed so much over the last decade, as we’ve seen an explosion in open-source programming and databases. We are now seeing some people actively using the open source code and contributing to the GitHub repository.

In social media and offline, we’ve been answering questions and responding to comments, but today we wanted to take the opportunity to put all the queries and responses together in one post and share it with the entire community so everyone can get a look at the responses on a single post.

Q: Does MarketStore store data in memory?

A: No. MarketStore is designed to run on a reasonably sized host without a huge hardware investment. If you have lots of cash, software technology is irrelevant, but what software engineering can bring is the ability to do a much better job with cheaper hardware. MarketStore’s primary use case is to store and distribute years of data at second-level granularity for more than tens of thousands of series (US equities and crypto coins across exchanges can easily reach this size). The data size can be a few terabytes, and it is still not very common to have that much RAM in commodity hardware. MarketStore instead stores everything on disk, but the on-disk format is nearly identical to the layout in memory, and thanks to SSD evolution, MarketStore can load the data at speeds competitive with in-memory storage.

Q: How does it make sense to compare with PostgreSQL, and why does the comparison include DataFrame loading?

A: Even if you can store the data, offloading it from application processes, it is not useful if you cannot use it. MarketStore is mainly used in the context of AI machine learning and backtesting, and the application typically loads the data into some tabular structure such as a pandas DataFrame. That is why MarketStore’s network protocol is a byte sequence in MessagePack, so inefficient JSON deserialization can be avoided. The client can load the delivered byte data into memory as a C array, which is what is used behind a DataFrame.

Q: How is it better compared to InfluxDB?

A: We have not compared the performance with InfluxDB, but InfluxDB and other general-purpose timeseries databases target use cases such as system metrics or activity log analysis. Those require a more flexible data structure and don’t necessarily need specific functions such as timezone-aware aggregates. The flexibility comes with overhead as a trade-off, as always, and MarketStore should be much faster and more cost-effective if the use case is financial market data.

Q: Why are you comparing with PostgreSQL when Timescale should be faster?

A: You can send us the benchmark results if you have them, but in our internal experiments, Timescale is even slower than PostgreSQL when compared with MarketStore. The loading time at the database server level for Timescale is 2-3x slower than PostgreSQL, since Timescale makes use of table partitioning (aka table constraint exclusion) that needs to open lots of files from disk. That gives an advantage when filtering a small slice out of a large amount of data, but it does not work better if you scan most of it. MarketStore stores the data in an optimal way on disk and reads sequentially, direct to memory, compared to those relational databases, so it is much faster.

Q: MarketStore can be used only for historical data but not for real-time data, right?

A: There is a new feature coming soon to MarketStore that will allow streaming and real-time push on every new data write. MarketStore was originally designed to help our algo trading platform, which builds trading algorithms using deep learning and runs them in the real market, and it had JSON WebSocket streaming. The feature had been set aside for the time being so that MarketStore could find a way to fit larger use cases, but thankfully it is now back in as a plugin. We have been testing this with thousands of updates every few seconds, and so far it is working perfectly.

Q: Why do I need this for machine learning? I can load the data from disk without a problem

A: If your training process doesn’t use much data (e.g., just daily bars from one stock), then you probably don’t need MarketStore for performance reasons. What we needed to do on the Alpaca trading platform requires a server that is large enough to store intraday data across the entire market (which can reach the terabyte range), and to load the necessary series data back and forth. If you are familiar with a typical machine learning training process, you can tell how the training iterations load random data from the pool. That said, MarketStore is not just about performance; it is also about the convenience of providing a uniform way to access historical and real-time timeseries data without worrying about how to manage local files, etc. And the built-in data ingestor can load the data without your writing any code.

Q: Where is the installer?

A: Sorry, at the moment we are not providing a one-click installer! Instead, we package the server process into a Docker container image, so if you have Docker, you can just start it in a second.

Q: Why is it open sourced?

A: Because there is a problem to be solved! MarketStore was implemented as proprietary software for our internal use and has been used in our production, but we have also seen the common problems affecting many people in the space. Our mission at Alpaca is to help individual investors with technology and improve the algo trading environment, regardless of whether we give that information away to users or offer it in a premium package. This kind of product has only been accessible to financial institutions with large capital resources, but now we are making it available to anyone who is eager to try it out! That’s awesome, isn’t it!?

Q: I found a bug

A: Please report it in the GitHub issue!

/

50x Faster Bitcoin Price Data Powered by MarketStore for AI Trading

In our last post “How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes”, we introduced how to set up bitcoin price data in five minutes and we got a lot of good feedback and contributions to the open source MarketStore.

Photo by chuttersnap on Unsplash

The data speed is really important

Today, I wanted to tell you how fast MarketStore is using the same data so that you can see the performance benefit of using the awesome open source financial timeseries database.

Faster data means more backtesting and more training in machine learning

Faster data means more backtesting and more training in machine learning for our trading algorithms. We are seeing a number of successful machine learning-based trading algos in the space, but one of the key lessons we learned is that data speed is really important. It’s important not just for backtesting, but also for training AI-style algorithms, since that by nature requires an iterative process.

This is another post to walk you through step by step. But TL;DR, it is really fast.

Setup


Last time, we showed how to set up historical daily bitcoin price data with MarketStore.

This time, we store all the minute-level historical prices using the same mechanism called background worker, but with a slightly different configuration.

 

root_directory: /project/data/mktsdb
listen_port: 5993
# timezone: "America/New_York"
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
triggers:
 - module: ondiskagg.so
   on: "*/1Min/OHLCV"
   config:
     destinations:
       - 5Min
       - 15Min
       - 1H
       - 1D
bgworkers:
 - module: gdaxfeeder.so
   name: GdaxFetcher
   config:
     query_start: "2016-01-01 00:00"
      base_timeframe: "1Min"
     symbols:
       - BTC

Almost 2.5 years with more than 1 million bars

The difference from last time is that the background worker is configured to fetch 1-minute bar data instead of 1-day bar data, starting from 2016-01-01. That is almost 2.5 years of data with more than 1 million bars. You will need to keep the server up and running for a day or so to backfill all the data, since GDAX’s historical price API does not allow you to fetch that many data points quickly.

Again, the data fetch worker carefully controls the data fetch speed in case the API returns a “Rate Limit” error. So you just need to sleep on it.

An additional configuration here is something called an “on-disk aggregate” trigger. What it does is aggregate the 1-minute bar data into lower resolutions (here 5 minutes, 15 minutes, 1 hour, and 1 day).

Check the longer time horizon to verify the entry/exit signals

In a typical trading strategy, you will need to check a longer time horizon to verify the entry/exit signals even if you are working at the minute level, so it is a pretty important feature. You would need a pretty complicated LEFT JOIN query to achieve the same time-windowed aggregates in SQL, but with MarketStore, all you need is this small section in the configuration file.
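Once the trigger has filled those aggregates, querying them from Python looks the same as querying the raw bars, just with a different timeframe string. A sketch using the pymarketstore client that also appears in the benchmark below, with the port following the listen_port in the config above:

import pymarketstore as pymkts

# Query the on-disk aggregated 1-hour bars instead of the raw 1Min bars
params = pymkts.Params('BTC', '1H', 'OHLCV')
df_hourly = pymkts.Client('http://localhost:5993/rpc').query(params).first().df()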

The machine we are using for this test is a typical Ubuntu virtual machine with 8 cores of Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 32GB RAM, and an SSD.

The Benchmark

Unfortunately lots of people in this space are using some sort of SQL database

We are going to build a DataFrame object in Python which holds all the minute-level historical price data of bitcoin since January 2016 from the server. We compare MarketStore and PostgreSQL.

PostgreSQL is not really meant to be the data store for this type of data, but unfortunately lots of people in this space use some sort of SQL database for this purpose since there has been no good alternative. That’s why we built MarketStore.

The table definition for the bitcoin data on the PostgreSQL side looks like this.

btc=# \d prices
              Table "public.prices"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 t      | timestamp without time zone |
 open   | double precision            |
 high   | double precision            |
 low    | double precision            |
 close  | double precision            |
 volume | double precision            |
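The post does not show how this table was created or populated; a minimal sketch of an equivalent setup using psycopg2 (an assumption, since the actual loader is not shown) could look like this:

import psycopg2

# Hypothetical loader; column names and types follow the \d output above.
conn = psycopg2.connect("dbname=btc user=postgres")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            t      timestamp without time zone,
            open   double precision,
            high   double precision,
            low    double precision,
            close  double precision,
            volume double precision
        )
    """)
    # one row per 1-minute bar; the values here are the first bar shown later in this post
    cur.execute(
        "INSERT INTO prices (t, open, high, low, close, volume) VALUES (%s, %s, %s, %s, %s, %s)",
        ("2016-01-01 00:00:00", 430.35, 430.39, 430.35, 430.39, 0.0727),
    )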

The client code for the benchmark looks like this.

import pandas as pd
import pymarketstore as pymkts

# For postgres ("conn" is assumed to be an open psycopg2 connection)
def get_df_from_pg_one(conn, symbol):
    tbl = f'"{symbol}"'
    cur = conn.cursor()
    # order by timestamp, so the client doesn’t have to do it
    cur.execute(f"SELECT t, open, high, low, close, volume FROM {tbl} ORDER BY t")
    times = []
    opens = []
    highs = []
    lows = []
    closes = []
    volumes = []
    for t, open, high, low, close, volume in cur.fetchall():
        times.append(t)
        opens.append(open)
        highs.append(high)
        lows.append(low)
        closes.append(close)
        volumes.append(volume)

    return pd.DataFrame(dict(
        open=opens,
        high=highs,
        low=lows,
        close=closes,
        volume=volumes,
    ), index=times)

# For MarketStore
def get_df_from_mkts_one(symbol):
    params = pymkts.Params(symbol, '1Min', 'OHLCV')
    # the server listens on port 5993 per the configuration above
    return pymkts.Client('http://localhost:5993/rpc'
                         ).query(params).first().df()

You don’t need much client code to get the DataFrame object

The input and output are basically the same: given one symbol name, each function queries the remote server over the network and returns one DataFrame. One strong benefit of MarketStore is that you don’t need much client code to get the DataFrame object, since the wire protocol is designed to deliver arrays of numbers efficiently.

The Result

First, PostgreSQL

%time df = example.get_df_from_pg_one(conn, 'prices')
CPU times: user 8.11 s, sys: 414 ms, total: 8.53 s
Wall time: 15.3 s

And MarketStore

%time df = example.get_df_from_mkts_one('BTC')
CPU times: user 109 ms, sys: 69.5 ms, total: 192 ms
Wall time: 291 ms

Of course, both results look the same, as shown below.

In [21]: df.head()
Out[21]:
                       open    high     low   close   volume
2016-01-01 00:00:00  430.35  430.39  430.35  430.39   0.0727
2016-01-01 00:01:00  430.38  430.40  430.38  430.40   0.9478
2016-01-01 00:02:00  430.40  430.40  430.40  430.40   1.6334
2016-01-01 00:03:00  430.39  430.39  430.36  430.36  12.5663
2016-01-01 00:04:00  430.39  430.39  430.39  430.39   1.9530


50 times difference

A bitcoin was about $430 back then… Anyway, you can see the difference: about 0.3 seconds vs. 15 seconds, roughly a 50x difference. Remember, you may need to fetch the same data again and again for different kinds of backtesting and optimization, as well as for ML training.

You may also want to query not just bitcoin but other coins, stocks, and fiat currencies, and the entire database usually will not fit into your main memory.

Scalability advantage in MarketStore

MarketStore can serve multiple symbols and timeframes in one query pretty efficiently, whereas with PostgreSQL and other relational databases you would need to query one table at a time. So there is also a scalability advantage in MarketStore when you need multiple instruments, as the sketch below suggests.
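The multi-instrument helpers used in the benchmark below (get_dfs_from_pg and get_dfs_from_mkts) are not shown in the post, so here is only a naive per-symbol sketch against MarketStore, reusing the single-symbol function from above. It does not batch symbols into one request the way the server supports, so treat it as a baseline rather than the actual benchmark code:

from typing import Dict, Iterable
import pandas as pd

def get_dfs_from_mkts_naive(symbols: Iterable[str]) -> Dict[str, pd.DataFrame]:
    # Naive baseline: one network round-trip per symbol.
    # The real helper presumably issues a single batched query for all symbols.
    return {symbol: get_df_from_mkts_one(symbol) for symbol in symbols}

dfs = get_dfs_from_mkts_naive(['BTC'])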

Querying 7.7K symbols for US stocks

To give some sense of this power, here is the result of querying 7.7K symbols of US stocks, run as an internal test.

%time dfs = example.get_dfs_from_pg(symbols)
CPU times: user 52.9 s, sys: 2.33 s, total: 55.3 s
Wall time: 1min 26s

%time dfs = example.get_dfs_from_mkts(symbols)
CPU times: user 814 ms, sys: 313 ms, total: 1.13 s
Wall time: 6.24 s

Again, the amount of data is the same, and in this case each DataFrame is not as large as in the bitcoin case, yet the gap when expanding to a large number of instruments is significant (more than 10x). You can imagine that in real life these two factors (per instrument and multi-instrument) multiply the data cost.

Alpaca has been using MarketStore in production

Alpaca has been using MarketStore in production for algo trading use cases, both for our proprietary customers and for our own purposes. It is actually amazing that this software is available to everyone for free, and we leverage this technology to help our algo trading customers (early access signup is here).

Thanks for reading and enjoy algo trading!


 

/

How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes

The Alpaca platform is now accepting signups for our algo trading waitlist, and we do believe trading automation is the future.

Photo by  Chris Liverani  on  Unsplash

One of the great benefits of algorithmic trading is that you can test your trading strategy against historical data.

Especially for new strategies you developed on your own, you don’t really know how they will perform without testing them against reliable data. Algo trading is not that different from software development. Today, most good software code is continuously tested. As the code evolves, developers continually test it against real use cases to make sure that alterations won’t result in future failures. You want to test your algo when you drop the very first version, but you also want to test it when you make changes or adjustments while it is running.

Here is one problem you face: data.

While it is always good to test from as many angles as possible, running a number of backtests is a very data-intensive workload that requires access to enough data to cover a very long history. This is why you cannot run this iteration against the GDAX API directly; you need to store the data somewhere for your own purposes.

The Tutorial

So the first question that comes to mind is always how to get the data and prepare it for successful backtesting. Well, today I am going to show you how to use MarketStore to acquire a long history of Bitcoin price data so you can run the most accurate backtest possible. And this setup tutorial is going to be quick. Don’t waste your time setting up this and that. All you need is:

  • docker (either on Windows, Mac or Linux),

  • a console terminal,

  • and a big cup of strong coffee! ☕

Got’em? Alright, let’s get started.

 

Architecture


Here is the high-level picture of today’s system. We will start a MarketStore instance in a docker container and run a background worker that calls the GDAX price API, so that we can pull historical bitcoin prices from their endpoint quickly and make them available for backtest clients to query over HTTP.

We will start another container for the client using an Anaconda Python 3 image. We use the official client package named pymarketstore, and you will get a DataFrame back from MarketStore.

Set Up the MarketStore Server

There is an official build of the MarketStore docker image publicly available on DockerHub, but first, let’s write a config file for the server.

In the GitHub repository you can find an example config file in YAML format (https://github.com/alpacahq/marketstore/blob/master/mkts.yml), but I’m putting our example here.

root_directory: /project/data/mktsdb
listen_port: 5993
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
bgworkers:
  - module: gdaxfeeder.so
    name: GdaxFetcher
    config:
      symbols:
        - BTC
      base_timeframe: "1D"
      query_start: "2018-01-01 00:00"

This configures the server to fetch 1-day bars from the GDAX historical price API starting from 2018-01-01. Save this config as the file $PWD/mkts.yml. The server listens on port 5993 by default. Now let’s bring up the server.

$ docker run -v $PWD/mktsdb:/project/data/mktsdb -v $PWD/mkts.yml:/tmp/mkts.yml --net host alpacamarkets/marketstore:v2.1.1 marketstore -config /tmp/mkts.yml

Docker should automatically download the image from DockerHub if you don’t have it yet, and start the server process with this config. Hopefully, you will see something like this.

I0430 05:54:56.091770       1 log.go:14] Disabling "enable_last_known" feature until it is fixed...
I0430 05:54:56.092200       1 log.go:14] Initializing MarketStore...
I0430 05:54:56.092236       1 log.go:14] WAL Setup: initCatalog true, initWALCache true, backgroundSync true, WALBypass false:
I0430 05:54:56.092340       1 log.go:14] Root Directory: /project/data/mktsdb
I0430 05:54:56.097066       1 log.go:14] My WALFILE: WALFile.1525067696092950500.walfile
I0430 05:54:56.097104       1 log.go:14] Found a WALFILE: WALFile.1525067686432055600.walfile, entering replay...
I0430 05:54:56.100352       1 log.go:14] Beginning WAL Replay
I0430 05:54:56.100725       1 log.go:14] Partial Read
I0430 05:54:56.100746       1 log.go:14] Entering replay of TGData
I0430 05:54:56.100762       1 log.go:14] Replay of WAL file /project/data/mktsdb/WALFile.1525067686432055600.walfile finished
I0430 05:54:56.101506       1 log.go:14] Finished replay of TGData
I0430 05:54:56.109380       1 plugins.go:14] InitializeTriggers
I0430 05:54:56.110664       1 plugins.go:42] InitializeBgWorkers
I0430 05:54:56.110742       1 log.go:14] Launching rpc data server...
I0430 05:54:56.110800       1 log.go:14] Launching heartbeat service...
I0430 05:54:56.110822       1 log.go:14] Enabling Query Access...
I0430 05:54:56.110844       1 log.go:14] Launching tcp listener for all services...

If you see something like “Response error: Rate limit exceeded”, that’s a good sign, not a bad one, since it means the background worker successfully fetched the price data and reached the rate limit. The fetch worker will pause for a while and then restart automatically to catch up to the current price. You just need to keep it running.

Client Side

MarketStore implements JSON-RPC and MessagePack-RPC for queries. MessagePack-RPC is particularly important for the performance of queries on large datasets. Thankfully, there are already Python and Go client libraries, so you don’t have to implement the protocol yourself. In this article, we use Python, starting from the miniconda3 image in another terminal.

$ docker run -it --rm -v $PWD/client.py:/tmp/client.py --net host continuumio/miniconda3 bash
# pip install ipython pymarketstore

We have installed ipython and pymarketstore, including their dependencies. From this terminal, let’s start an ipython shell and query MarketStore data.

(base) root@hq-dev-01:/# ipython
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pymarketstore as pymkts
In [2]: param = pymkts.Params('BTC', '1D', 'OHLCV', limit=100)
In [3]: df = pymkts.Client('http://localhost:5993/rpc').query(param).first().df()
In [4]: df[-10:]
Out[4]:
                              Open     High      Low    Close        Volume
Epoch
2018-04-14 00:00:00+00:00  7893.19  8150.00  7830.00  8003.11   9209.196953
2018-04-15 00:00:00+00:00  8003.12  8392.56  8003.11  8355.25   9739.103514
2018-04-16 00:00:00+00:00  8355.24  8398.98  7905.99  8048.93  13137.432715
2018-04-17 00:00:00+00:00  8048.92  8162.50  7822.00  7892.10  10537.460361
2018-04-18 00:00:00+00:00  7892.11  8243.99  7879.80  8152.05  10673.642535
2018-04-19 00:00:00+00:00  8152.05  8300.00  8101.47  8274.00  11788.032811
2018-04-20 00:00:00+00:00  8274.00  8932.57  8216.21  8866.27  16076.648797
2018-04-21 00:00:00+00:00  8866.27  9038.87  8610.70  8915.42  11944.464063
2018-04-22 00:00:00+00:00  8915.42  9015.00  8754.01  8795.01   7684.827002
2018-04-23 00:00:00+00:00  8795.00  8991.00  8775.10  8940.00   3685.109169

Voila! You now have the daily bitcoin prices in a DataFrame. Note that the second line (param = …) determines which symbol and timeframe to query, along with query predicates such as the number of rows or the date range. From here, you can do a number of things, including calculating indicators such as a moving average or Bollinger Bands, or finding statistical volume anomalies with a scipy package; see the sketch below.
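For example, here is a minimal sketch of a 20-day moving average, Bollinger Bands, and a simple volume anomaly check on the DataFrame above (assuming the Close and Volume column names returned by the query, plain pandas, and scipy):

import numpy as np
from scipy import stats

window = 20

# 20-day simple moving average and Bollinger Bands (±2 standard deviations)
sma = df['Close'].rolling(window).mean()
std = df['Close'].rolling(window).std()
upper_band = sma + 2 * std
lower_band = sma - 2 * std

# flag days whose volume z-score is unusually large
volume_z = stats.zscore(df['Volume'].values)
anomalies = df[np.abs(volume_z) > 3]

print(upper_band.tail())
print(lower_band.tail())
print(anomalies)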

Conclusion

I want to emphasize that it is very important to build a performant historical dataset for studying and developing a trading algorithm, and you can do it quickly with MarketStore, as we have just walked through. This article demonstrated how to work with bitcoin prices from GDAX, but you can hook up other data sources pretty easily using pymarketstore’s write method (sketched below), or write your own custom background data fetcher.
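As a rough illustration of the write path (a sketch only: the record layout with an Epoch column in Unix seconds follows pymarketstore’s conventions, and the TEST/1Min/OHLCV bucket name is just a made-up example):

import numpy as np
import pandas as pd
import pymarketstore as pymkts

client = pymkts.Client('http://localhost:5993/rpc')

# one OHLCV bar as a numpy record array; Epoch is Unix time in seconds
bar = np.array(
    [(int(pd.Timestamp('2018-01-01 00:00', tz='UTC').value / 10**9),
      13000.0, 13100.0, 12900.0, 13050.0, 12.3)],
    dtype=[('Epoch', 'i8'), ('Open', 'f4'), ('High', 'f4'),
           ('Low', 'f4'), ('Close', 'f4'), ('Volume', 'f4')],
)

# write the bar into a hypothetical TEST/1Min/OHLCV bucket
client.write(bar, 'TEST/1Min/OHLCV')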

Again, query performance is going to be critical when it comes to backtesting, since you want to iterate quickly to get results. Now you may wonder how fast MarketStore can be. I will show its lightning-fast query speed on a huge dataset in the next post.

In the meantime, please leave any questions in the comments or ask @AlpacaHQ regarding this tutorial. Leave your email below so we can notify you when we can grant access to the full trading platform! You can also check us out at https://alpaca.markets.

Happy algo trading!

/

MarketStore, the financial time series database, is now open source

We are happy to announce MarketStore is now open source! MarketStore is a database server optimized for financial timeseries data written in pure Go, designed and developed by Alpaca. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, at higher scalability.

Read More
/