I Built a Go Plugin for Alpaca’s MarketStore as a College Intern

Hey all! I’m Ethan and recently started working for Alpaca as a Software Engineering Intern! For my first task, I created a Go plugin for Alpaca’s open source MarketStore server that fetches and writes Binance minute-level.

Image from iOS (2).jpg

You might be wondering — What is MarketStore? MarketStore is a database server written in Go that helps users handle large amounts of financial data. Inside of MarketStore, there are Go plugins that allow users to gather important financial and crypto data from third party sources.

For this blog post, I’ll be going over how I created the plugin from start to finish in three sections: Installing MarketStore, understanding MarketStore’s plugin structure, creating the Go plugin., and installing the Go plugin.

Experience Installing and Running MarketStore Locally

First, I set up MarketStore locally. I installed the latest version of Go and started going through the installation process outlined in MarketStore’s README. All the installation commands worked swimmingly, but when I tried to run marketstore using

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

I got this weird error:

/usr/local/go/src/fmt/print.go:597:CreateFile/go/src/github.com/alpacahq/marketstore/executor/wal.go:87open /project/data/mktsdb/WALFile.1529203211246361858.walfile: no such file or directory: Error Creating WAL File

I was super confused and couldn’t find any other examples of this error online. After checking and changing permissions in the directory, I realized my mkts.yml file configuration root_directory was incorrect. To resolve this, I changed mkts.yml from

root_directory: /project/data/mktsdb


root_directory: /home/ethanc/go/bin/src/github.com/alpacahq/marketstore/project/data/mktsdb

and reran

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

This time, everything worked fine and I got this output:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml
I0621 11:37:52.067803 27660 log.go:14] Launching heartbeat service…
I0621 11:37:52.067856 27660 log.go:14] Enabling Query Access…
I0621 11:37:52.067936 27660 log.go:14] Launching tcp listener for all services

To enable the gdaxfeeder plugin which grabs data from a specified cryptocurrency, I uncommented these lines in the mkts.yml file:

and reran

ethanc@ethanc-Inspiron-5559:~$ marketstore -config mkts.yml

which yielded:

I0621 11:44:27.248433 28089 log.go:14] Enabling Query Access…
I0621 11:44:27.248448 28089 log.go:14] Launching tcp listener for all services…
I0621 11:44:27.254118 28089 gdaxfeeder.go:123] lastTimestamp for BTC = 2017–09–01 04:59:00 +0000 UTC
I0621 11:44:27.254189 28089 gdaxfeeder.go:123] lastTimestamp for ETH = 0001–01–01 00:00:00 +0000 UTC
I0621 11:44:27.254242 28089 gdaxfeeder.go:123] lastTimestamp for LTC = 0001–01–01 00:00:00 +0000 UTC
I0621 11:44:27.254266 28089 gdaxfeeder.go:123] lastTimestamp for BCH = 0001–01–01 00:00:00 +0000 UTC
I0621 11:44:27.254283 28089 gdaxfeeder.go:144] Requesting BTC 2017–09–01 04:59:00 +0000 UTC — 2017–09–01 09:59:00 +0000 UTC

Now that I got MarketStore running, I used Jupyter notebooks and tested out the commands listed in this Alpaca tutorial and got the same results. You can read more about how to run MarketStore in MarketStore’s README, Alpaca’s tutorial, and this thread.

Understanding how MarketStore Plugins work

After installing, I wanted to understand how their MarketStore repository works and how their current Go plugins work. Before working in Alpaca, I didn’t have any experience with the Go programming language. So, I completed the Go’s “A Tour of Go” tutorial to get a general feel of the language. Having some experience with C++ and Python, I saw a lot of similarities and found that it wasn’t as difficult as I thought it would be.

Creating a MarketStore Plugin

To get started, I read the MarketStore Plugin README. To summarize at a very high level, there are two critical Go features which power plugins: Triggers and BgWorkers. You use triggers when you want your plugin to respond when certain types data are written to your MarketStore’s database. You would use BgWorkers if you want your plugin to run in the background.

I only needed to use the BgWorker feature because my plugin’s goal is to collect data outlined by the user in the mkts.yml configuration file.

To get started, I read the code from the gdaxfeeder plugin which is quite similar to what I wanted to do except that I’m trying to get and write data from the Binance exchange instead of the GDAX exchange.

I noticed that the gdaxfeeder used a GDAX Go Wrapper, which got its historical price data public endpoint. Luckily, I found a Go Wrapper for Binance created by adshao that has the endpoints which retrieves the current supported symbols as well as retrieves Open, High, Low, Close, Volume data for any timespan, duration, or symbol(s) set as the parameters.

To get started, I first created a folder called binancefeeder then created a file called binancefeeder.go inside of that. I then first tested the Go Wrapper for Binanceto see how to create a client and talk to the Binance API’s Kline endpoint to get data:

I then ran this command in my root directory:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

and received the following response with Binance data:

&{1529553060000 6769.28000000 6773.91000000 6769.17000000 6771.34000000 32.95342700 1529553119999 223100.99470354 68 20.58056800 139345.00899491}
&{1529553120000 6771.33000000 6774.00000000 6769.66000000 6774.00000000 36.43794400 1529553179999 246732.39415947 93 20.42194600 138288.41850603}

So, it turns out that the Go Wrapper worked!

Next, I started brainstorming how I wanted to configure the Binance Go plugin. I ultimately chose symbols, queryStart, queryEnd, and baseTimeframe as my parameters since I wanted the user to query any specific symbol(s), start time, end time, and timespan (ex: 1min). Then, right after my imports, I started creating the necessary configurations and structure for BinanceFetcher for a MarketStore plugin:

The FetcherConfig’s members are what types of settings the user can configure in their configuration file (ex: mkts.yml) to start the plugin. The BinanceFetcher’’s members are similar to the FetcherConfig with the addition of the config member. This will be used in the Run function later.

After creating those structures, I started to write the background worker function. To set it up, I created the necessary variables inside the backgroundworker function and copied the recast function from the gdaxfeeder. The recast function uses Go’s Marshal function to encode the config JSON data received, then sets a variable ret to an empty interface called FetcherConfig. Then it stores the parsed JSON config data in the ret variable and returns it:

Then inside the NewBgWorker function, I started to create a function to determine and return the correct time format as well as set up the symbols, end time, start time, and time duration. If there are no symbols set, by default, the background worker retrieves all the valid cryptocurrencies and sets the symbol member to all those currencies. It also checks the given times and duration and sets them to defaults if empty. At the end, it returns the pointer to BinanceFetcher as the bgworker.BgWorker:

Then, I started creating the Run function which is implemented by BgWorker (see bgworker.go for more details). To get a better sense of how to handle errors and write modular code in Go, I read the code for plugins gdaxfeeder and polygon plugins. The Run function receives the BinanceFetcher (which is dereferenced since bgworker.BgWorker was the pointer to BinanceFetcher). Our goal for the Run function is to call the Binance API’s endpoint with the given parameters for OHLCV and retrieve the data and writes it to your MarketStore’s database.

I first created a new Binance client with no API key or secret since I’m using their API’s public endpoints.

Then, to make sure that the BinanceFetcher doesn’t make any incorrectly formatted API calls, I created a function to check the timestamp format using regex and change it to the correct one. I had to convert the user’s given timestamp to maintain consistency in the Alpaca’s utils.Timeframe which has a lot of helpful functions but has different structure members than the one’s Binance uses (ex: “1min” vs. “1m”). If the user uses an unrecognizable timestamp format, it sets the baseTimeframe value to 1 minute:

The start and end time objects are already checked in the NewBgWorker function and returns a null time.Time object if invalid. So, I only have to check if the start time is empty and set it to the default string of the current Time. The end time isn’t checked since it will be ignored if incorrect which will be explained in the later section:

Now that the BinanceFetcher checks for the validity of its parameters and sets it to defaults if its not valid, I moved onto programming a way to call the Binance API. 

To make sure we don’t overcall the Binance API and get IP banned, I used a for loop to get the data in intervals. I created a timeStart variable which is first set to the given time start and then created a timeEnd variable which is 300 times the duration plus the timeStart's time. At the beginning of each loop after the first one, the timeStart variable is set to timeEnd and the timeEnd variable is set to 300 times the duration plus the timeStart’s time:

When it reaches the end time given by the user, it simply alerts the user through glog and continues onward. Since this is a background worker, it needs to continue to work in the background. Then it writes the data retrieved to the MarketStore database. If invalid, the plugin will stop because I don’t want to write garbage values to the database:

Installing Go Plugin

To install, I simply changed back to the root directory and ran:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ make plugins

Then, to configure MarketStore to use my file, I changed my config file, mkts.yml, to the following:

Then, I ran MarketStore:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

And got the following:

I0621 14:48:46.944709 6391 plugins.go:42] InitializeBgWorkers
I0621 14:48:46.944801 6391 plugins.go:45] bgWorkerSetting = &{binancefeeder.so BinanceFetcher map[base_timeframe:1Min query_start:2018–01–01 00:00 query_end:2018–01–02 00:00 symbols:[ETH]]}
I0621 14:48:46.952424 6391 log.go:14] Trying to load module from path: /home/ethanc/go/bin/bin/binancefeeder.so…
I0621 14:48:47.650619 6391 log.go:14] Success loading module /home/ethanc/go/bin/bin/binancefeeder.so.
I0621 14:48:47.651571 6391 plugins.go:51] Start running BgWorker BinanceFetcher…
I0621 14:48:47.651633 6391 log.go:14] Launching heartbeat service…
I0621 14:48:47.651679 6391 log.go:14] Enabling Query Access…
I0621 14:48:47.651749 6391 log.go:14] Launching tcp listener for all services…
I0621 14:48:47.654961 6391 binancefeeder.go:198] Requesting ETH 2018–01–01 00:00:00 +0000 UTC — 2018–01–01 05:00:00 +0000 UTC


When I was editing my plugin and debugging, I often ran the binancefeeder.go file:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

If I ran into an issue I couldn’t resolve, I used the equivalent print function for Go (fmt). If there is an issue while running the plugin as part of MarketStore via the marketstore -config mkts.yml command, I used the glog.Infof() or glog.Errorf() function to output the corresponding error or incorrect data value.

Lastly, I copied the gdaxfeeder test go program and simply modified it for my binancefeeder test go program.

You’ve made it to the end of the blog post! Here’s the link to the Binance plugin if you want to see the complete code. If you want to see all of MarketStore’s plugins, check out this folder.

To summarize, if you want to create a Go extension for any open source repository, I would first read the existing documentation whether it is a README.md or a dedicated documentation website. Then, I would experiment around the repositories code by changing certain parts of the code and see which functions correspond with what action. Lastly, I would look over previous extensions and refactor an existing one that seems close to your plugin idea.

Thanks for reading! I hope you take a look at the MarketStore repository and test it out. If you have any questions, few free to comment below and I’ll try to answer!

Special thanks to Hitoshi, Sho, Chris, and the rest of the Alpaca’s Engineering team for their code reviews and help as well as Yoshi and Rao for providing feedback for this post.

By: Ethan Chiu


9 Most Commonly Asked Questions About MarketStore And Answers To Them

Photo by  William Stitt  on  Unsplash

Each of these articles seeks to explain the technology we build along with our Alpaca algo trading brokerage. These articles led active discussions on Reddit and Medium, and it became clear to us that there is a lot of interest and a pretty large need in the community for a timeseries database dedicated for financial market data. The database world and software engineering in general have changed so much over the last decade as we’ve seen an explosion in open source programming and databases. We are seeing some people now actively using open source to and contributing some code in the GitHub repository.

In social media and offline, we’ve been answering questions and responding to comments, but today we wanted to take the opportunity to put all the queries and responses together in one post and share it with the entire community so everyone can get a look at the responses on a single post.

Q: Does MarketStore store data in memory?

A: No. MarketStore is designed to run in a reasonable size of host without huge hardware investment. If you have lots of cash, software technology is irrelevant, but what software engineering can bring is that you can do a lot better job with cheaper hardware. MarketStore’s primary use case is to be able to store and distribute years of data at second level granularity for more than tens of thousands of series (US equities and crypto coins across exchanges can easily become this size). The data size can be a few terabytes, and it is not still very common to have this big size RAM in a commodity hardware. MarketStore instead stores everything in disk, but the on-disk format is nearly identical to the layout in the memory, and thanks to SSD evolution, MarketStore can load the data at the speed competitive to in-memory storage.

Q: How does it make sense to compare with PostgreSQL and includes DataFrame loading?

A: Even if you can store the data, offloading it from application processes, it is not useful if you cannot use it. MarketStore is mainly used in the context of AI machine learning and backtesting, and the application typically loads it into some tabular structure such as Pandas DataFrame. That is why MarketStore’s network protocol is byte sequence in MessagePack so the inefficient JSON deserialization can be avoided. The client can load the delivered byte data into memory as C array, which is what is used behind DataFrame.

Q: How is it better compared to InfluxDB?

A: We have not compared the performance with InfluxDB, but InfluxDB and other general-purpose timeseries databases use-case is as system metrics or activity log analysis. Those require more flexible data structure and don’t necessarily need specific functions such as timezone-aware aggregate. The flexibility comes with necessary overhead as tradeoffs as always, and MarketStore should be much faster and cost effective if the use case is the financial market data.

Q: Why are you comparing with PostgreSQL when Timescale should be faster?

A: You can send us the benchmark results if you have them, but in our internal experiments, Timescale is even slower than PostgreSQL compared to MarketStore. The loading time at the database server level for Timescale is 2–3x slower than PostgreSQL, since Timescale makes use of table partitioning (aka table constraints exclusion) that needs to open lots of files from disk. It will give advantage to filter a small slice of the data out of large amount of data, but it will not work better if you scan most of it. MarketStore stores the data in an optimal way on disk and reads sequentially direct to memory compared to those relational databases, so it is way faster.

Q: MarketStore can be used only for historical data but not for real-time data right?

A: There is a new feature coming soon to MarketStore that will allow streaming and realtime push on every new data write. MarketStore was originally designed to help our algo trading platform that builds trading algorithms using deep learning, and run them in the real market, and had JSON websocket streaming. The feature has been for the time being so that Marketstore can find a way to fit in larger use cases. But thankfully it is now back in as a plugin. We have been testing this with thousands of updates every few seconds and so far it is working perfectly.

Q: Why do I need this for machine learning? I can load the data from disk without a problem

A: If your training process doesn’t use much data (e.g. just daily bars from one stock), then yes probably you don’t need MarketStore for performance reasons. What we needed to do on Alpaca trading platform requires a server that is large enough to store an amount of intraday data across the entire market (can be up to terabyte range), and load the necessary series data back and forth. If you are familiar with typical machine learning training process, you can tell how the training iteration can load random data from the pool. That said, MarketStore is not just for performance, but also for the convenience to prove the uniformed way to access historical and real-time timeseries data the same way without worrying about how to manage local files etc. And the built-in data ingestor can load the data without even writing any code.

Q: Where is the installer?

A: Sorry, at the moment, we are not providing the one-click installer! But instead, we package the server process into a docker container image, so if you have docker, you can just start it in a second.

Q: Why is it open sourced?

A: Because there is a problem to be solved! MarketStore was implemented proprietary for our internal use and has been used in our production, but we have also seen the common problems affecting many people in the space. Our mission at Alpaca is to help individual investors with technology, and improve the algo trading environment, regardless of whether we give that information away to users or offer it in a premium package. This kind of product has only been accessible by financial institutions with large capital resources. But now we are making it available to anyone who is eager to try out! That’s awesome, isn’t it!?

Q: I found a bug

A: Please report it in the GitHub issue!


50x Faster Bitcoin Price Data Powered by MarketStore for AI Trading

In our last post “How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes”, we introduced how to set up bitcoin price data in five minutes and we got a lot of good feedback and contributions to the open source MarketStore.

Photo by  chuttersnap  on  Unsplash

Photo by chuttersnap on Unsplash

The data speed is really important

Today, I wanted to tell you how fast MarketStore is using the same data so that you can see the performance benefit of using the awesome open source financial timeseries database.

Faster data means more backtesting and more training in machine learning

Faster data means more backtesting and more training in machine learning for our trading algorithm.  We are seeing a number of successful machine learning-based trading algos in the space, but one of the key points we learned is the data speed is really importantIt's important not just for backtesting, but also for training AI-style algorithms since it by nature requires an iterative process.

This is another post to walk you through step by step. But TL;DR, it is really fast.



Last time, we showed how to setup the historical daily bitcoin price data with MarketStore.

This time, we store all the minute-level historical prices using the same mechanism called background worker, but with a slightly different configuration.


root_directory: /project/data/mktsdb
listen_port: 5993
# timezone: "America/New_York"
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
 - module: ondiskagg.so
   on: "*/1Min/OHLCV"
       - 5Min
       - 15Min
       - 1H
       - 1D
 - module: gdaxfeeder.so
   name: GdaxFetcher
     query_start: "2016-01-01 00:00"
     base_timefame: “1Min”
       - BTC

Almost 2.5 years with more than 1 million bars

The difference from last time is that background worker is configured to fetch 1-minute bar data instead of 1-day bar data, starting from 2016-01-01.  That is almost 2.5 years with more than 1 million bars. You will need to keep the server up and running for a day or so to fill all the data, since GDAX’s historical price API does not allow you to fetch that many data points quickly.

Again, the data fetch worker carefully controls the data fetch speed in case the API returns “Rate Limit” error. So you just need to sleep on it.

Additional configuration here is something called “on-disk aggregate” trigger.  What it does is to aggregate 1-minute bar data for lower resolutions (here 5 minutes, 15 minutes, 1 hour, and 1 day).

Check the longer time horizon to verify the entry/exit signals

In a typical trading strategy, you will need to check the longer time horizon to verify the entry/exit signals even if you are working on the minute level. So it is a pretty important feature. You would need pretty complicated LEFT JOIN query to achieve the same time-windowed aggregate in SQL. But with MarketStore, all you need is this small section in the configuration file.

The machine we are using for this test is a typical Ubuntu virtual machine with 8 of Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz, 32GB RAM and SSD.

The Benchmark

Unfortunately lots of people in this space are using some sort of SQL database

We are going to have a DataFrame object in python which holds all the minute level historical price data of bitcoin since January of 2016 from the server.  We compare MarketStore and PostgreSQL.

PostgreSQL is not really meant to be the data store for this type of data, but unfortunately lots of people in this space are using some sort of SQL database for this purpose since there is no other alternative.  That’s why we built MarketStore.

The table definition of the bitcoin data in PostgreSQL side looks like this.

btc=# \d prices
              Table "public.prices"
 Column |            Type             | Modifiers
 t      | timestamp without time zone |
 open   | double precision            |
 high   | double precision            |
 low    | double precision            |
 close  | double precision            |
 volume | double precision            |

The code looks like this.

# For postgres
def get_df_from_pg_one(conn, symbol):
    tbl = f'"{symbol}"'
    cur = conn.cursor()
    # order by timestamp, so the client doesn’t have to do it
    cur.execute(f"SELECT t, open, high, low, close, volume FROM {tbl} ORDER BY t")
    times = []
    opens = []
    highs = []
    lows = []
    closes = []
    volumes = []
    for t, open, high, low, close, volume in cur.fetchall():

    return pd.DataFrame(dict(
    ), index=times)

# For MarketStore
def get_df_from_mkts_one(symbol):
    params = pymkts.Params(symbol, '1Min', 'OHLCV')
    return pymkts.Client('http://localhost:6000/rpc'

You don’t need much client code to get the DataFrame object

The input and output is basically the same, in that one symbol name is given, query the remote server over the network, and get one DataFrame.  One strong benefit of MarketStore is you don’t need much client code to get the DataFrame object since the wire protocol is designed to give an array of numbers efficiently.

The Result

First, PostgreSQL

%time df = example.get_df_from_pg_one(conn, 'prices')
CPU times: user 8.11 s, sys: 414 ms, total: 8.53 s
Wall time: 15.3 s

And MarketStore

%time df = example.get_df_from_mkts_one('BTC') 
CPU times: user 109 ms, sys: 69.5 ms, total: 192 ms Wall time: 291 ms 

Both results of course look the same like below.

In [21]: df.head()
                       open    high     low   close   volume
2016-01-01 00:00:00  430.35  430.39  430.35  430.39   0.0727
2016-01-01 00:01:00  430.38  430.40  430.38  430.40   0.9478
2016-01-01 00:02:00  430.40  430.40  430.40  430.40   1.6334
2016-01-01 00:03:00  430.39  430.39  430.36  430.36  12.5663
2016-01-01 00:04:00  430.39  430.39  430.39  430.39   1.9530


50 times difference

A bitcoin was about $430 back then… Anyway, you can see the difference between 0.3 vs 15 seconds which is about 50 times difference. Remember, you may need to get the same data again and again for different kinds of backtesting and optimization as well as ML training.

Also you may want to query not just bitcoins but also other coins, stocks and fiat currencies, since the entire database wouldn’t fit into your main memory usually.

Scalability advantage in MarketStore

MarketStore can serve multiple symbol/timeframe in one query pretty efficiently, whereas with PostgreSQL and other relational databases you will need to query one table at a time, so there is also scalability advantage in MarketStore when you need multiple instruments.

Querying 7.7K symbols for US stocks

To give some sense of this power, here is the result of querying 7.7K symbols for US stocks done as an internal testing.

%time dfs = example.get_dfs_from_pg(symbols) 
CPU times: user 52.9 s, sys: 2.33 s, total: 55.3 s Wall time: 1min 26s 
%time dfs = example.get_dfs_from_mkts(symbols) 
CPU times: user 814 ms, sys: 313 ms, total: 1.13 s Wall time: 6.24 s

Again, the amount of data is the same, and in this case each DataFrame is not as large as the bitcoin case, yet the difference to expand to large number of instruments is significant (more than 10 times).  You can imagine in real life these two (per instrument and multi-instruments) factors multiply the data cost.

Alpaca has been using MarkStore in our production

Alpaca has been using MarkStore in our production for algo trading use cases both in our proprietary customers and our own purposes.  It is actually amazing that this software is available to everyone for free, and we leverage this technology to help our algo trading customers (early access signup is here).

Thanks for reading and enjoy algo trading!



How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes

Alpaca platform is now accepting signups for our algo trading waitlist and we do believe trading automation is the future. 

Photo by  Chris Liverani  on  Unsplash

One of the great benefits of algorithmic trading is that you can test your trading strategy against historical data.

Especially for new strategies you developed on your own, you don’t really know how it will perform without testing it against reliable data. Algo trading is not that different from software development. Today, most good software code is continuously tested. As the code evolves, developers continually test it against real use cases to make sure that alterations won’t result in future failures. You want to test your algo when you drop the very first version. But you also want to test the algo when you make changes or adjustments while the algo is running.

Here is one problem you face; data. 

While it is always good to test as many angles as possible, running a number of backtests is going to be a very data-intensive workload that requires access to enough data to have visibility into a very long history. This is why you cannot run this iteration using GDAX API directly. You need to store the data somewhere for your own purposes.

The Tutorial

So the first question to come to mind is always how to get the data and prepare it for successful backtesting. Well, today I am going to tell you how to use MarketStore to acquire a long history of Bitcoin price data for this purpose of running the most accurate backtest possible. And this setup tutorial is going to be quick. Don’t waste your time setting up this and that. All you need is:

  • docker (either on Windows, Mac or Linux),

  • a console terminal,

  • and a big cup of strong coffee! ☕

Got’em? Alright, let’s get started.



Medium - bitcoin data1.png

Here is the high level picture of today’s system. We will start a MarketStore instance using docker container, and run a background worker that calls GDAX price API so that we can pull the bitcoin historical price from their endpoint quickly and make it available for backtest clients to query over HTTP.

We will start another container for the client using python anaconda with python3 image. We use the official client package named pymarkestore. You will get a DataFrame from MarketStore.

Setup MarkeStore Server

There is the official build of MarketStore docker image today publicly available in DockerHub, but first, let’s write a config file for the server.

In the github repository you can find an example config file in YAML format: https://github.com/alpacahq/marketstore/blob/master/mkts.yml but I’m putting our example here.

root_directory: /project/data/mktsdb   listen_port: 5993   log_level: info   queryable: true   stop_grace_period: 0   wal_rotate_interval: 5   enable_add: true   enable_remove: false   enable_last_known: false   bgworkers:     - module: gdaxfeeder.so       name: GdaxFetcher       config:         symbols:          - BTC          base_timeframe: "1D"          query_start: "2018-01-01 00:00"

This configures the server so that it fetches the GDAX historical price API for 1-day bars since 2018–01–01. Save this config as $PWD/mkts.yml file. The server listens on the port 5993 as default. Now let’s bring up the server.

$ docker run -v $PWD/mktsdb:/project/data/mktsdb -v $PWD/mkts.yml:/tmp/mkts.yml --net host alpacamarkets/marketstore:v2.1.1 marketstore -config /tmp/mkts.yml

The server should automatically download the docker images from DockerHub if you haven’t, and start the server process with the config. Hopefully, you will see something like this.

I0430 05:54:56.091770       1 log.go:14] Disabling "enable_last_known" feature until it is fixed... I0430 05:54:56.092200       1 log.go:14] Initializing MarketStore... I0430 05:54:56.092236       1 log.go:14] WAL Setup: initCatalog true, initWALCache true, backgroundSync true, WALBypass false: I0430 05:54:56.092340       1 log.go:14] Root Directory: /project/data/mktsdb I0430 05:54:56.097066       1 log.go:14] My WALFILE: WALFile.1525067696092950500.walfile I0430 05:54:56.097104       1 log.go:14] Found a WALFILE: WALFile.1525067686432055600.walfile, entering replay... I0430 05:54:56.100352       1 log.go:14] Beginning WAL Replay I0430 05:54:56.100725       1 log.go:14] Partial Read I0430 05:54:56.100746       1 log.go:14] Entering replay of TGData I0430 05:54:56.100762       1 log.go:14] Replay of WAL file /project/data/mktsdb/WALFile.1525067686432055600.walfile finished I0430 05:54:56.101506       1 log.go:14] Finished replay of TGData I0430 05:54:56.109380       1 plugins.go:14] InitializeTriggers I0430 05:54:56.110664       1 plugins.go:42] InitializeBgWorkers I0430 05:54:56.110742       1 log.go:14] Launching rpc data server... I0430 05:54:56.110800       1 log.go:14] Launching heartbeat service... I0430 05:54:56.110822       1 log.go:14] Enabling Query Access... I0430 05:54:56.110844       1 log.go:14] Launching tcp listener for all services...

If you see something like “Response error: Rate limit exceeded”, that’s a good sign, not a bad one, since it means the background worker successfully fetched the price data and reached to rate limit. The fetch worker will suspend for a while and restart to catch up to the current price automatically. You just need to keep it running.

Client Side

MarketStore implements JSON-RPC and MessagePack-RPC for query. MessagePack-RPC is particularly important for performance of a query on a large dataset. Thankfully, there is already python and go client library so you don’t have to implement the protocol. In this article, we use python. We start from miniconda3 image from another terminal.

$ docker run -it --rm -v $PWD/client.py:/tmp/client.py --net host continuumio/miniconda3 bash # pip install ipython pymarketstore

We have installed ipython and pymarketstore, including their dependencies. From this terminal, let’s start an ipython shell and query MarketStore data.

# ipython (base) root@hq-dev-01:/# ipython Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) Type 'copyright', 'credits' or 'license' for more information IPython 6.3.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pymarketstore as pymkts
In [2]: param = pymkts.Params('BTC', '1D', 'OHLCV', limit=100)
In [3]: df = pymkts.Client('http://localhost:5993/rpc').query(param).first().df()
In [4]: df[-10:] Out[4]:                               Open     High      Low    Close Volume Epoch 2018-04-14 00:00:00+00:00  7893.19  8150.00  7830.00  8003.11   9209.196953 2018-04-15 00:00:00+00:00  8003.12  8392.56  8003.11  8355.25   9739.103514 2018-04-16 00:00:00+00:00  8355.24  8398.98  7905.99  8048.93  13137.432715 2018-04-17 00:00:00+00:00  8048.92  8162.50  7822.00  7892.10  10537.460361 2018-04-18 00:00:00+00:00  7892.11  8243.99  7879.80  8152.05  10673.642535 2018-04-19 00:00:00+00:00  8152.05  8300.00  8101.47  8274.00  11788.032811 2018-04-20 00:00:00+00:00  8274.00  8932.57  8216.21  8866.27  16076.648797 2018-04-21 00:00:00+00:00  8866.27  9038.87  8610.70  8915.42  11944.464063 2018-04-22 00:00:00+00:00  8915.42  9015.00  8754.01  8795.01   7684.827002 2018-04-23 00:00:00+00:00  8795.00  8991.00  8775.10  8940.00   3685.109169

Voila! You just got the daily bitcoin price in hand in the DataFrame format. Note the second line (param = …) determines which symbol and timeframe to query, with some query predicates such as the number of rows or date range to query. From here, you can do a number of things including calculating indicators such as moving average and bollinger band, or find the statistical volume anomaly using some scipy package.


I want to emphasize that it is very important to build a performant historical dataset to study and develop a trading algorithm, and you can do it quickly with MarketStore as we have just walked through. This article demonstrated how to work with the bitcoin prices from GDAX, but you can hook up other data sources as well pretty easily using pymarketstore’s write method. You can also write your own custom background data fetcher.

Again, the query performance is going to be critical when in comes to backtesting, since you want to iterate quickly to get the results. now You may wonder how fast MarketStore can be. I will show the lightning fast query speed with huge data set in the next post.

In the meantime, please leave any questions in the comments or ask @AlpacaHQ regarding this tutorial. Leave your email below so we can notify you when we can grant access to the full trading platform! You can also check us out at https://alpaca.markets.

Happy algo trading!


MarketStore, the financial time series database, is now open source

We are happy to announce MarketStore is now open source! MarketStore is a database server optimized for financial timeseries data written in pure Go, designed and developed by Alpaca. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, at higher scalability.

Read More