I Built a Go Plugin for Alpaca’s MarketStore as a College Intern

Hey all! I'm Ethan, and I recently started working at Alpaca as a Software Engineering Intern! For my first task, I created a Go plugin for Alpaca's open source MarketStore server that fetches and writes Binance minute-level price data.


You might be wondering — What is MarketStore? MarketStore is a database server written in Go that helps users handle large amounts of financial data. Inside of MarketStore, there are Go plugins that allow users to gather important financial and crypto data from third party sources.

For this blog post, I'll go over how I created the plugin from start to finish in four sections: installing MarketStore, understanding MarketStore's plugin structure, creating the Go plugin, and installing the Go plugin.

Experience Installing and Running MarketStore Locally

First, I set up MarketStore locally. I installed the latest version of Go and started going through the installation process outlined in MarketStore’s README. All the installation commands worked swimmingly, but when I tried to run marketstore using

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

I got this weird error:

/usr/local/go/src/fmt/print.go:597:CreateFile/go/src/github.com/alpacahq/marketstore/executor/wal.go:87open /project/data/mktsdb/WALFile.1529203211246361858.walfile: no such file or directory: Error Creating WAL File

I was super confused and couldn't find any other examples of this error online. After checking and changing permissions in the directory, I realized the root_directory setting in my mkts.yml configuration was incorrect. To resolve this, I changed mkts.yml from

root_directory: /project/data/mktsdb

To

root_directory: /home/ethanc/go/bin/src/github.com/alpacahq/marketstore/project/data/mktsdb

and reran

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

This time, everything worked fine and I got this output:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml
…
I0621 11:37:52.067803 27660 log.go:14] Launching heartbeat service…
I0621 11:37:52.067856 27660 log.go:14] Enabling Query Access…
I0621 11:37:52.067936 27660 log.go:14] Launching tcp listener for all services
…

To enable the gdaxfeeder plugin, which fetches data for specified cryptocurrencies from GDAX, I uncommented these lines in the mkts.yml file:
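That section looks roughly like this (the symbols and start time here are inferred from the log output below, so treat this as a sketch rather than the exact file):

bgworkers:
  - module: gdaxfeeder.so
    name: GdaxFetcher
    config:
      query_start: "2017-09-01 00:00"
      symbols:
        - BTC
        - ETH
        - LTC
        - BCH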

and reran

ethanc@ethanc-Inspiron-5559:~$ marketstore -config mkts.yml

which yielded:

…
I0621 11:44:27.248433 28089 log.go:14] Enabling Query Access…
I0621 11:44:27.248448 28089 log.go:14] Launching tcp listener for all services…
I0621 11:44:27.254118 28089 gdaxfeeder.go:123] lastTimestamp for BTC = 2017-09-01 04:59:00 +0000 UTC
I0621 11:44:27.254189 28089 gdaxfeeder.go:123] lastTimestamp for ETH = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254242 28089 gdaxfeeder.go:123] lastTimestamp for LTC = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254266 28089 gdaxfeeder.go:123] lastTimestamp for BCH = 0001-01-01 00:00:00 +0000 UTC
I0621 11:44:27.254283 28089 gdaxfeeder.go:144] Requesting BTC 2017-09-01 04:59:00 +0000 UTC - 2017-09-01 09:59:00 +0000 UTC
…

Now that I had MarketStore running, I tested the commands listed in this Alpaca tutorial in a Jupyter notebook and got the same results. You can read more about how to run MarketStore in MarketStore's README, Alpaca's tutorial, and this thread.

Understanding How MarketStore Plugins Work

After installing, I wanted to understand how the MarketStore repository and its existing Go plugins work. Before working at Alpaca, I didn't have any experience with the Go programming language, so I completed Go's "A Tour of Go" tutorial to get a general feel for the language. Having some experience with C++ and Python, I saw a lot of similarities and found that it wasn't as difficult as I thought it would be.

Creating a MarketStore Plugin

To get started, I read the MarketStore Plugin README. To summarize at a very high level, there are two critical features that power plugins: Triggers and BgWorkers. You use a Trigger when you want your plugin to respond when certain types of data are written to your MarketStore database. You use a BgWorker when you want your plugin to run in the background.

I only needed the BgWorker feature, because my plugin's goal is to collect data specified by the user in the mkts.yml configuration file.

Next, I read the code of the gdaxfeeder plugin, which is quite similar to what I wanted to build, except that I would be fetching and writing data from the Binance exchange instead of GDAX.

I noticed that gdaxfeeder used a GDAX Go wrapper to call GDAX's public historical price endpoint. Luckily, I found a Go wrapper for Binance created by adshao that has endpoints for retrieving the currently supported symbols as well as Open, High, Low, Close, Volume (OHLCV) data for any symbol, timespan, and duration passed as parameters.

To get started, I created a folder called binancefeeder, then created a file called binancefeeder.go inside of it. I then tested the Go wrapper for Binance to see how to create a client and call the Binance API's Kline endpoint for data:
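The test looked something like this (a minimal sketch assuming the adshao/go-binance wrapper; the symbol and interval are just example values):

package main

import (
	"context"
	"fmt"

	binance "github.com/adshao/go-binance"
)

func main() {
	// The public endpoints don't require an API key or secret.
	client := binance.NewClient("", "")

	// Request minute-level klines (candlesticks) for a trading pair.
	klines, err := client.NewKlinesService().
		Symbol("BTCUSDT").
		Interval("1m").
		Do(context.Background())
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, k := range klines {
		fmt.Println(k)
	}
}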

I then ran this command in my root directory:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

and received the following response with Binance data:

&{1529553060000 6769.28000000 6773.91000000 6769.17000000 6771.34000000 32.95342700 1529553119999 223100.99470354 68 20.58056800 139345.00899491}
&{1529553120000 6771.33000000 6774.00000000 6769.66000000 6774.00000000 36.43794400 1529553179999 246732.39415947 93 20.42194600 138288.41850603}
…

So, it turns out that the Go Wrapper worked!

Next, I started brainstorming how I wanted to configure the Binance Go plugin. I ultimately chose symbols, queryStart, queryEnd, and baseTimeframe as my parameters, since I wanted the user to be able to query any specific symbol(s), start time, end time, and timespan (e.g., 1Min). Then, right after my imports, I created the necessary configuration structures for a MarketStore plugin:
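Here's a sketch of those two structures (field names follow the gdaxfeeder pattern; see the finished plugin linked at the end for the exact definitions):

// FetcherConfig holds the settings a user can set in mkts.yml.
type FetcherConfig struct {
	Symbols       []string `json:"symbols"`
	QueryStart    string   `json:"query_start"`
	QueryEnd      string   `json:"query_end"`
	BaseTimeframe string   `json:"base_timeframe"`
}

// BinanceFetcher is the BgWorker that does the actual fetching.
type BinanceFetcher struct {
	config        map[string]interface{}
	symbols       []string
	queryStart    time.Time
	queryEnd      time.Time
	baseTimeframe *utils.Timeframe
}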

The FetcherConfig's members are the settings the user can configure in their configuration file (e.g., mkts.yml) to start the plugin. The BinanceFetcher's members are similar to FetcherConfig's, with the addition of the config member, which will be used in the Run function later.

After creating those structures, I started to write the background worker function. To set it up, I created the necessary variables inside the background worker function and copied the recast function from the gdaxfeeder. The recast function uses Go's json.Marshal function to re-encode the config map it receives, creates an empty FetcherConfig called ret, stores the parsed JSON config data in ret, and returns it:
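Mirroring the gdaxfeeder version, it looks like this:

func recast(config map[string]interface{}) *FetcherConfig {
	// Re-encode the generic config map as JSON...
	data, _ := json.Marshal(config)
	// ...then parse it back into the typed FetcherConfig.
	ret := FetcherConfig{}
	json.Unmarshal(data, &ret)
	return &ret
}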

Then, inside the NewBgWorker function, I created a function to determine and return the correct time format, and set up the symbols, end time, start time, and time duration. If no symbols are set, the background worker by default retrieves all valid cryptocurrencies and sets the symbols member to all of them. It also checks the given times and duration and sets them to defaults if empty. At the end, it returns the pointer to BinanceFetcher as a bgworker.BgWorker:
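Condensed, it looks something like this (a sketch with error handling trimmed; the exchange-info call and the time layout are my assumptions based on the wrapper and the config format, not necessarily the exact plugin code):

func NewBgWorker(conf map[string]interface{}) (bgworker.BgWorker, error) {
	config := recast(conf)

	symbols := config.Symbols
	if len(symbols) == 0 {
		// No symbols configured: default to everything Binance supports.
		client := binance.NewClient("", "")
		info, err := client.NewExchangeInfoService().Do(context.Background())
		if err != nil {
			return nil, err
		}
		for _, s := range info.Symbols {
			symbols = append(symbols, s.BaseAsset)
		}
	}

	timeframeStr := "1Min"
	if config.BaseTimeframe != "" {
		timeframeStr = config.BaseTimeframe
	}

	// Invalid or empty times parse to the zero time.Time, which the
	// Run function later replaces with defaults.
	queryStart, _ := time.Parse("2006-01-02 15:04", config.QueryStart)
	queryEnd, _ := time.Parse("2006-01-02 15:04", config.QueryEnd)

	return &BinanceFetcher{
		config:        conf,
		symbols:       symbols,
		queryStart:    queryStart,
		queryEnd:      queryEnd,
		baseTimeframe: utils.NewTimeframe(timeframeStr),
	}, nil
}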

Then, I started creating the Run function, which is implemented by BgWorker (see bgworker.go for more details). To get a better sense of how to handle errors and write modular code in Go, I read the code of the gdaxfeeder and polygon plugins. The Run function receives the BinanceFetcher (dereferenced, since bgworker.BgWorker holds the pointer to BinanceFetcher). The goal of the Run function is to call the Binance API endpoint with the given parameters, retrieve the OHLCV data, and write it to your MarketStore database.

I first created a new Binance client with no API key or secret since I’m using their API’s public endpoints.

Then, to make sure that the BinanceFetcher doesn't make any incorrectly formatted API calls, I created a function that checks the timeframe format using a regex and converts it to the correct one. I had to convert the user's given timeframe to stay consistent with Alpaca's utils.Timeframe, which has a lot of helpful functions but uses different notation than Binance (e.g., "1Min" vs. "1m"). If the user supplies an unrecognizable format, it sets the baseTimeframe value to 1 minute:
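A sketch of the idea (imports regexp and strings; this is a simplified stand-in for the plugin's exact function):

// convertToBinanceInterval maps a utils.Timeframe-style string such as
// "1Min" or "4H" to Binance's notation ("1m", "4h").
func convertToBinanceInterval(tf string) string {
	num := regexp.MustCompile(`^[0-9]+`).FindString(tf)
	unit := strings.ToLower(regexp.MustCompile(`[A-Za-z]+$`).FindString(tf))
	switch {
	case strings.HasPrefix(unit, "min"):
		return num + "m"
	case strings.HasPrefix(unit, "h"):
		return num + "h"
	case strings.HasPrefix(unit, "d"):
		return num + "d"
	default:
		return "1m" // unrecognized format: default to 1 minute
	}
}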

The start and end times are already checked in the NewBgWorker function, which returns a zero time.Time object if they are invalid. So, I only have to check whether the start time is empty and, if so, default it to the current time. The end time isn't checked here since, if it's incorrect, it will simply be ignored, as explained in the later section:
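In essence:

if timeStart.IsZero() {
	// No valid start time given: default to the current time.
	timeStart = time.Now().UTC()
}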

Now that the BinanceFetcher checks the validity of its parameters and falls back to defaults when they're invalid, I moved on to programming a way to call the Binance API.

To make sure we don't overcall the Binance API and get IP banned, I used a for loop to fetch the data in intervals. I created a timeStart variable, initially set to the given start time, and a timeEnd variable set to timeStart plus 300 times the bar duration. At the beginning of each later iteration, timeStart is set to the previous timeEnd, and timeEnd advances by another 300 bar durations:
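A sketch of that loop (symbol and interval come from the conversion above; appending "USDT" to form a trading pair is my assumption here, and kline start/end times are passed as millisecond timestamps):

timeStart := bn.queryStart
var timeEnd time.Time
for {
	timeEnd = timeStart.Add(bn.baseTimeframe.Duration * 300)
	klines, err := client.NewKlinesService().
		Symbol(symbol + "USDT"). // Binance expects a pair, e.g. "BTCUSDT"
		Interval(interval).
		StartTime(timeStart.UnixNano() / int64(time.Millisecond)).
		EndTime(timeEnd.UnixNano() / int64(time.Millisecond)).
		Do(context.Background())
	if err != nil {
		glog.Errorf("%v", err)
		return
	}
	// ... convert the klines to column slices and write them
	// (see the next snippet), then advance the window ...
	timeStart = timeEnd
}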

When it reaches the end time given by the user, it simply alerts the user through glog and continues onward, since a background worker needs to keep working in the background. It then writes the retrieved data to the MarketStore database. If the data is invalid, the plugin stops, because I don't want to write garbage values to the database:
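The write goes through MarketStore's ColumnSeries API, roughly like so (a sketch; epochs, opens, and the other slices are assumed to have been accumulated from the klines above):

cs := io.NewColumnSeries()
cs.AddColumn("Epoch", epochs) // []int64, Unix seconds
cs.AddColumn("Open", opens)   // []float64, and likewise below
cs.AddColumn("High", highs)
cs.AddColumn("Low", lows)
cs.AddColumn("Close", closes)
cs.AddColumn("Volume", volumes)

csm := io.NewColumnSeriesMap()
tbk := io.NewTimeBucketKey(symbol + "/" + bn.baseTimeframe.String + "/OHLCV")
csm.AddColumnSeries(*tbk, cs)

// Stop rather than persist garbage values if the write fails.
if err := executor.WriteCSM(csm, false); err != nil {
	glog.Errorf("%v", err)
	return
}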

Installing the Go Plugin

To install, I simply changed back to the root directory and ran:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ make plugins

Then, to configure MarketStore to use my plugin, I changed my config file, mkts.yml, to the following:
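The bgworkers section (matching the settings echoed in the log output below; the rest of the file stays as before) became:

bgworkers:
  - module: binancefeeder.so
    name: BinanceFetcher
    config:
      symbols:
        - ETH
      base_timeframe: "1Min"
      query_start: "2018-01-01 00:00"
      query_end: "2018-01-02 00:00"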

Then, I ran MarketStore:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ marketstore -config mkts.yml

And got the following:

…
I0621 14:48:46.944709 6391 plugins.go:42] InitializeBgWorkers
I0621 14:48:46.944801 6391 plugins.go:45] bgWorkerSetting = &{binancefeeder.so BinanceFetcher map[base_timeframe:1Min query_start:2018-01-01 00:00 query_end:2018-01-02 00:00 symbols:[ETH]]}
I0621 14:48:46.952424 6391 log.go:14] Trying to load module from path: /home/ethanc/go/bin/bin/binancefeeder.so…
I0621 14:48:47.650619 6391 log.go:14] Success loading module /home/ethanc/go/bin/bin/binancefeeder.so.
I0621 14:48:47.651571 6391 plugins.go:51] Start running BgWorker BinanceFetcher…
I0621 14:48:47.651633 6391 log.go:14] Launching heartbeat service…
I0621 14:48:47.651679 6391 log.go:14] Enabling Query Access…
I0621 14:48:47.651749 6391 log.go:14] Launching tcp listener for all services…
I0621 14:48:47.654961 6391 binancefeeder.go:198] Requesting ETH 2018-01-01 00:00:00 +0000 UTC - 2018-01-01 05:00:00 +0000 UTC
…

Testing

When I was editing my plugin and debugging, I often ran the binancefeeder.go file:

ethanc@ethanc-Inspiron-5559:~/go/bin/src/github.com/alpacahq/marketstore$ go run binancefeeder.go

If I ran into an issue I couldn't resolve, I fell back to print debugging with Go's fmt package. If an issue occurred while running the plugin as part of MarketStore via the marketstore -config mkts.yml command, I used the glog.Infof() or glog.Errorf() functions to output the corresponding error or incorrect data value.

Lastly, I copied the gdaxfeeder's Go test program and modified it into a test program for binancefeeder.

You’ve made it to the end of the blog post! Here’s the link to the Binance plugin if you want to see the complete code. If you want to see all of MarketStore’s plugins, check out this folder.

To summarize: if you want to create a Go extension for any open source repository, I would first read the existing documentation, whether it is a README.md or a dedicated documentation website. Then, I would experiment with the repository's code by changing certain parts and seeing which functions correspond to which actions. Lastly, I would look over previous extensions and refactor an existing one that is close to your plugin idea.

Thanks for reading! I hope you take a look at the MarketStore repository and test it out. If you have any questions, feel free to comment below and I'll try to answer!

Special thanks to Hitoshi, Sho, Chris, and the rest of Alpaca's engineering team for their code reviews and help, as well as Yoshi and Rao for providing feedback on this post.

By: Ethan Chiu

/

Algo Trading for Dummies - Collecting & Storing The Market Data (Part 1)

The lifeblood of any algorithmic trading system is, of course, its data — so that’s what we’ll cover in the first two posts of the mini-series.

Photo by Farzad Nazifi on Unsplash

Always Collect Any Live Data

For the retail trader, most platforms and brokers are broadly the same: you'll be provided with a simple wrapper for a relatively simple REST or WebSocket API. It's usually worth modifying the provided wrapper to suit your purposes, and potentially creating your own custom wrapper; however, that can be done later, once you have a better understanding of the structure and requirements of your trading system.

Depending on the nature of the trading strategy, there are various types of data you may need to access and work with: OHLCV data (candlesticks), bid/ask quotes, and fundamental or exotic data. OHLCV is usually the easiest historical data to get, which will be important later for back-testing strategies. While there are some sources for tick data and historical bid/ask or order-book snapshots, they generally come at a high cost.

With this last point in mind, it’s always good to collect any live data which will be difficult or expensive to access at a later date. This can be done by setting up simple polling scripts to periodically pull and save any data that might be relevant for back-testing in the future, such as bid/ask spread. This data can provide helpful insight into the market structure, which you wouldn’t be able to track otherwise.

Alpaca Python Wrapper Lets You Start Off Quickly

The Alpaca Python Wrapper provides a simple API wrapper to begin working with when creating initial proof-of-concept scripts. It serves well both for downloading bulk historical data and for pulling live data for quick calculations, so it needs little modification to get going.

It's also worth noting that the Alpaca wrapper returns market data as pandas DataFrames, which have slightly different syntax than a standard Python list or dictionary, although this is covered thoroughly in the documentation, so it shouldn't be an issue.

Keeping A Local Cache Of Data

While data may be quick and easy enough to access on the fly via the market API for live trading, even small delays become a serious slowdown when running batches of backtests across large time periods or multiple trading symbols. As such, it's best to keep a local cache of data to work with. This also allows you to create consistent data samples to design and verify your algorithms against.

There are many different storage solutions available, and in most cases it will come down to what you’re most familiar with. But, we’ll explore some of the options anyway.

No Traditional RDB For Financial Data Please

Financial data is time-series, meaning that each attribute is indexed by its associated time-stamp. Depending on the volume of data-points, traditional relational databases can quickly become impractical, as in many cases it is best to treat each data column as a list rather than the database as a collection of separate records.

On top of this, a database manager can add a lot of unnecessary overhead and complexity for a simple project that will have limited scaling requirements. Sure, if you’re planning to make a backend data storage solution which will be constantly queried by dozens of trading bots for large sets of data, you’ll probably want a full specialised time-series database.

However, in most cases you’ll be able to get away with simply storing the data in CSV files — at least initially.

Cutting Down Dev Time By Using CSVs

(Code Snippet to download and store OHLCV data into a CSV) https://gist.github.com/yoshyoshi/5a35a23ac263747eabc70906fd037ff3

The use of CSVs, or another simple format, significantly cuts down on usage of a key resource: development time. Unless you know that you will absolutely need a higher-speed storage solution in the future, it's better to keep the project as simple as possible. You're unlikely to be using enough data to make local storage speed much of an issue.

Even an SQL database can easily handle the storage and querying of hundreds of thousands of lines of data. To put that in perspective, 500k lines is roughly the 1-minute bars for a symbol between June 2013 and June 2018 (depending on trading hours). A well-optimized system that only pulls and processes the necessary data will have no problem with overheads, meaning that any storage solution should be fine, whether that be an SQL database, NoSQL, or a collection of CSV files in a folder.

Additionally, it isn't infeasible to store the full working dataset in RAM while in use. The same 500k lines of OHLCV data used just over 700MB of RAM when serialized into lists (tested in Python with data from the Alpaca client mentioned earlier).

When it comes to the building blocks of a piece of software, it's best to keep everything as simple and efficient as possible, while keeping the components suitably modular so they can be adjusted in the future if the design specification of the project changes.

By Matthew Tweed

/

Algo Trading News Headlines 6/7/2018

How a job invented by Goldman Sachs took over the banking industry

(news.efinancialcareers.com)

It's not just GS though. In the past few years, strats have caught on elsewhere too. Deutsche Bank now has them. So does Morgan Stanley. So does J.P. Morgan. So does Credit Suisse. And wherever there are strats, there is strat hiring. Morgan Stanley, for example, currently has around 50 strat vacancies. Insiders say Deutsche Bank's strats team, set up by ex-GS strat Sam Wisnia, has gone from zero to over 100 in four years.

Photo by chuttersnap on Unsplash

Hedging with Machine Learning

(themarketmogul.com)

Before McDonald’s could introduce Chicken McNuggets, they had to hedge against the cost of chicken. If chicken prices rose dramatically, they would no longer be able to offer the product. After some time, a financial consultant determined that there were two costs associated with raising chickens. That is grain and water.

9 Most Commonly Asked Questions About MarketStore And Answers To Them

(blog.alpaca.markets)

In social media and offline, we’ve been answering questions and responding to comments, but today we wanted to take the opportunity to put all the queries and responses together in one post and share it with the entire community so everyone can get a look at the responses on a single post.

/

9 Most Commonly Asked Questions About MarketStore And Answers To Them

Photo by William Stitt on Unsplash

Each of these articles seeks to explain the technology we build along with our Alpaca algo trading brokerage. These articles led to active discussions on Reddit and Medium, and it became clear to us that there is a lot of interest, and a pretty large need, in the community for a timeseries database dedicated to financial market data. The database world and software engineering in general have changed so much over the last decade as we've seen an explosion in open source programming and databases. We are now seeing people actively using the open source project and contributing code in the GitHub repository.

In social media and offline, we’ve been answering questions and responding to comments, but today we wanted to take the opportunity to put all the queries and responses together in one post and share it with the entire community so everyone can get a look at the responses on a single post.

Q: Does MarketStore store data in memory?

A: No. MarketStore is designed to run on a reasonably sized host without a huge hardware investment. If you have lots of cash, software technology is irrelevant, but what software engineering can bring is the ability to do a much better job with cheaper hardware. MarketStore's primary use case is to store and distribute years of data at second-level granularity for more than tens of thousands of series (US equities and crypto coins across exchanges can easily reach this size). The data size can be a few terabytes, and it is still not very common to have that much RAM in commodity hardware. MarketStore instead stores everything on disk, but the on-disk format is nearly identical to the layout in memory, and thanks to SSD evolution, MarketStore can load the data at speeds competitive with in-memory storage.

Q: How does it make sense to compare with PostgreSQL and include DataFrame loading?

A: Even if you can store the data, offloading it from application processes, it is not useful if you cannot read it back efficiently. MarketStore is mainly used in the context of AI machine learning and backtesting, and the application typically loads the data into some tabular structure such as a Pandas DataFrame. That is why MarketStore's network protocol is a byte sequence in MessagePack, so inefficient JSON deserialization can be avoided. The client can load the delivered bytes into memory as a C array, which is what is used behind a DataFrame.

Q: How is it better compared to InfluxDB?

A: We have not compared the performance with InfluxDB, but the use case for InfluxDB and other general-purpose timeseries databases is system metrics or activity-log analysis. Those require a more flexible data structure and don't necessarily need specific functions such as timezone-aware aggregates. The flexibility comes with overhead, as tradeoffs always do, and MarketStore should be much faster and more cost effective if the use case is financial market data.

Q: Why are you comparing with PostgreSQL when Timescale should be faster?

A: You can send us the benchmark results if you have them, but in our internal experiments, Timescale is even slower than PostgreSQL when compared to MarketStore. The loading time at the database server level for Timescale is 2-3x slower than PostgreSQL, since Timescale makes use of table partitioning (aka table constraint exclusion), which needs to open lots of files from disk. That gives an advantage when filtering a small slice out of a large amount of data, but it does not work better if you scan most of it. MarketStore stores the data in an optimal way on disk and reads it sequentially, directly into memory, compared to those relational databases, so it is way faster.

Q: MarketStore can be used only for historical data, not for real-time data, right?

A: There is a new feature coming soon to MarketStore that will allow streaming and a realtime push on every new data write. MarketStore was originally designed to support our algo trading platform, which builds trading algorithms using deep learning and runs them in the real market, and it had JSON websocket streaming. The feature was set aside for the time being so that MarketStore could find a way to fit larger use cases, but thankfully it is now back in as a plugin. We have been testing this with thousands of updates every few seconds, and so far it is working perfectly.

Q: Why do I need this for machine learning? I can load the data from disk without a problem

A: If your training process doesn't use much data (e.g., just daily bars from one stock), then you probably don't need MarketStore for performance reasons. What we needed for the Alpaca trading platform was a server large enough to store intraday data across the entire market (which can be up to the terabyte range) and to load the necessary series data back and forth. If you are familiar with a typical machine learning training process, you know how training iterations load random data from the pool. That said, MarketStore is not just about performance; it also provides the convenience of a uniform way to access historical and real-time timeseries data, without worrying about how to manage local files, etc. And the built-in data ingestor can load data without your writing any code.

Q: Where is the installer?

A: Sorry, at the moment we are not providing a one-click installer! Instead, we package the server process into a Docker container image, so if you have Docker, you can start it in a second.
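For example, something like this (the image name here follows the project's Docker Hub listing; check the README for the exact current invocation):

docker run -p 5993:5993 alpacamarkets/marketstore:latest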

Q: Why is it open sourced?

A: Because there is a problem to be solved! MarketStore was implemented as proprietary software for our internal use and has been running in our production environment, but we have also seen the same problems affecting many people in the space. Our mission at Alpaca is to help individual investors with technology and improve the algo trading environment, regardless of whether we give that information away to users or offer it in a premium package. This kind of product has only been accessible to financial institutions with large capital resources, but now we are making it available to anyone who is eager to try it out! That's awesome, isn't it!?

Q: I found a bug

A: Please report it in a GitHub issue!

/

50x Faster Bitcoin Price Data Powered by MarketStore for AI Trading

In our last post, "How to Setup Bitcoin Historical Price Data for Algo Trading in Five Minutes", we showed how to set up bitcoin price data in five minutes, and we got a lot of good feedback and contributions to the open source MarketStore.

Photo by chuttersnap on Unsplash

The data speed is really important

Today, I wanted to tell you how fast MarketStore is using the same data so that you can see the performance benefit of using the awesome open source financial timeseries database.

Faster data means more backtesting and more training in machine learning

Faster data means more backtesting and more training in machine learning for our trading algorithms. We are seeing a number of successful machine learning-based trading algos in the space, but one of the key points we learned is that data speed is really important. It's important not just for backtesting, but also for training AI-style algorithms, since training is by nature an iterative process.

This is another post to walk you through step by step. But TL;DR, it is really fast.

Setup


Last time, we showed how to set up historical daily bitcoin price data with MarketStore.

This time, we store all the minute-level historical prices using the same mechanism, the background worker, but with a slightly different configuration.

 

root_directory: /project/data/mktsdb
listen_port: 5993
# timezone: "America/New_York"
log_level: info
queryable: true
stop_grace_period: 0
wal_rotate_interval: 5
enable_add: true
enable_remove: false
enable_last_known: false
triggers:
 - module: ondiskagg.so
   on: "*/1Min/OHLCV"
   config:
     destinations:
       - 5Min
       - 15Min
       - 1H
       - 1D
bgworkers:
 - module: gdaxfeeder.so
   name: GdaxFetcher
   config:
     query_start: "2016-01-01 00:00"
      base_timeframe: "1Min"
     symbols:
       - BTC

Almost 2.5 years with more than 1 million bars

The difference from last time is that the background worker is configured to fetch 1-minute bar data instead of 1-day bar data, starting from 2016-01-01. That is almost 2.5 years of data, with more than 1 million bars. You will need to keep the server up and running for a day or so to fill in all the data, since GDAX's historical price API does not allow you to fetch that many data points quickly.

Again, the data fetch worker carefully controls the fetch speed in case the API returns a "Rate Limit" error, so you can just sleep on it.

An additional piece of configuration here is the "on-disk aggregate" trigger. What it does is aggregate the 1-minute bar data into lower resolutions (here, 5 minutes, 15 minutes, 1 hour, and 1 day).

Check the longer time horizon to verify the entry/exit signals

In a typical trading strategy, you need to check a longer time horizon to verify entry/exit signals even if you are working on the minute level, so it is a pretty important feature. You would need a pretty complicated LEFT JOIN query to achieve the same time-windowed aggregates in SQL, but with MarketStore, all you need is this small section of the configuration file.

The machine we are using for this test is a typical Ubuntu virtual machine with 8 Intel(R) Xeon(R) E5-2673 v3 @ 2.40GHz cores, 32GB of RAM, and an SSD.

The Benchmark

Unfortunately lots of people in this space are using some sort of SQL database

We are going to build a DataFrame object in Python that holds all the minute-level historical price data of bitcoin since January 2016, loaded from the server. We compare MarketStore with PostgreSQL.

PostgreSQL is not really meant to be the data store for this type of data, but unfortunately lots of people in this space are using some sort of SQL database for this purpose, since there has been no good alternative. That's why we built MarketStore.

The table definition of the bitcoin data on the PostgreSQL side looks like this.

btc=# \d prices
              Table "public.prices"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 t      | timestamp without time zone |
 open   | double precision            |
 high   | double precision            |
 low    | double precision            |
 close  | double precision            |
 volume | double precision            |

The code looks like this.

import pandas as pd
import pymarketstore as pymkts

# For postgres
def get_df_from_pg_one(conn, symbol):
    tbl = f'"{symbol}"'
    cur = conn.cursor()
    # order by timestamp, so the client doesn't have to do it
    cur.execute(f"SELECT t, open, high, low, close, volume FROM {tbl} ORDER BY t")
    times = []
    opens = []
    highs = []
    lows = []
    closes = []
    volumes = []
    for t, open, high, low, close, volume in cur.fetchall():
        times.append(t)
        opens.append(open)
        highs.append(high)
        lows.append(low)
        closes.append(close)
        volumes.append(volume)

    return pd.DataFrame(dict(
        open=opens,
        high=highs,
        low=lows,
        close=closes,
        volume=volumes,
    ), index=times)

# For MarketStore
def get_df_from_mkts_one(symbol):
    params = pymkts.Params(symbol, '1Min', 'OHLCV')
    return pymkts.Client('http://localhost:6000/rpc'
                         ).query(params).first().df()

You don’t need much client code to get the DataFrame object

The input and output are basically the same: one symbol name is given, the remote server is queried over the network, and one DataFrame comes back. One strong benefit of MarketStore is that you don't need much client code to get the DataFrame object, since the wire protocol is designed to deliver arrays of numbers efficiently.

The Result

First, PostgreSQL

%time df = example.get_df_from_pg_one(conn, 'prices')
CPU times: user 8.11 s, sys: 414 ms, total: 8.53 s
Wall time: 15.3 s

And MarketStore

%time df = example.get_df_from_mkts_one('BTC')
CPU times: user 109 ms, sys: 69.5 ms, total: 192 ms
Wall time: 291 ms

Both results, of course, look the same, as shown below.

In [21]: df.head()
Out[21]:
                       open    high     low   close   volume
2016-01-01 00:00:00  430.35  430.39  430.35  430.39   0.0727
2016-01-01 00:01:00  430.38  430.40  430.38  430.40   0.9478
2016-01-01 00:02:00  430.40  430.40  430.40  430.40   1.6334
2016-01-01 00:03:00  430.39  430.39  430.36  430.36  12.5663
2016-01-01 00:04:00  430.39  430.39  430.39  430.39   1.9530

 

50 times difference

A bitcoin was about $430 back then… Anyway, you can see the difference: 0.3 seconds vs. 15 seconds, which is about a 50x difference. Remember, you may need to get the same data again and again for different kinds of backtesting and optimization, as well as for ML training.

You may also want to query not just bitcoin but also other coins, stocks, and fiat currencies, since the entire database usually wouldn't fit into your main memory.

Scalability advantage in MarketStore

MarketStore can serve multiple symbols/timeframes in one query pretty efficiently, whereas with PostgreSQL and other relational databases you need to query one table at a time, so there is also a scalability advantage in MarketStore when you need multiple instruments.

Querying 7.7K symbols for US stocks

To give some sense of this power, here is the result of querying 7.7K symbols for US stocks, done as an internal test.

%time dfs = example.get_dfs_from_pg(symbols)
CPU times: user 52.9 s, sys: 2.33 s, total: 55.3 s
Wall time: 1min 26s

%time dfs = example.get_dfs_from_mkts(symbols)
CPU times: user 814 ms, sys: 313 ms, total: 1.13 s
Wall time: 6.24 s

Again, the amount of data is the same, and in this case each DataFrame is not as large as in the bitcoin case, yet the difference when expanding to a large number of instruments is significant (more than 10x). You can imagine how, in real life, these two factors (per-instrument and multi-instrument) multiply the data cost.

Alpaca has been using MarketStore in our production

Alpaca has been using MarketStore in production for algo trading use cases, both for our proprietary customers and for our own purposes. It is actually amazing that this software is available to everyone for free, and we leverage this technology to help our algo trading customers (early access signup is here).

Thanks for reading and enjoy algo trading!


 

/

MarketStore, the financial time series database, is now open source

We are happy to announce MarketStore is now open source! MarketStore is a database server optimized for financial timeseries data, written in pure Go, designed and developed by Alpaca. You can think of it as an extensible DataFrame service that is accessible from anywhere in your system, with higher scalability.

/