I'm a College Student, and I'm Still Building my Robo-Advisor

All code and setup instructions can be found in the GitHub repository here. This post is a sequel to building a robo-advisor in Quantopian. You can find the first part on Hackernoon here, or on Alpaca’s blog here.



Hey all, I’m Rao, an intern at Alpaca. About two weeks ago, I published an article on HackerNoon about the robo-advisor I’ve been working on. Since then, I’ve moved my work offline and continued development with Zipline. Along the way, I’ve picked up a few tips and tricks, and I wanted to share them with you.

Zipline — What, When, Why, Where, How?

We’ve now created a functional robo advisor on Quantopian. But, the IDE puts limits on the scale and sophistication of our work. To extend this robo advisor further, it’s best to do our future work locally with Zipline.

This shift offers advantages. We can use multiple files and configuration files, and generally develop our robo-advisor like any other Python app. Being free from a single-file format means that our work can be more organized, and individual functions less bloated (more on that later).

It’s not all roses, though. Developing in Zipline comes with its own challenges. Still, if I can do it, you can too!

Installing zipline on your local machine isn’t as simple as `pip install zipline`. It has several dependencies, which vary slightly for each OS. You can find all necessary installation instructions here.

I made several mistakes during installation, so I recommend you run zipline in either a virtual environment, or my personal preference, in a Docker container. I’ve attached both the finished container to play with, as well as the corresponding Dockerfile. Take some time, figure out your development environment. Ready? Let’s get started.

Zipline Commands — Bundles and Execution

Open a new Python file locally. Copy and paste your code from the Quantopian IDE. Now, we’re ready to go. Just yanking you around: we’re not close to ready. Quantopian’s IDE is a great development environment. It takes care of background processes, allowing you to focus solely on your algorithm. Locally, that work falls to us.

First, we need historical data to run our algorithm on. Zipline reads through data using something called bundles. To use the bundles, they need to be ingested first, which is done using the command:

$ zipline ingest -b <bundle>

Zipline provides two bundles, quandl and quantopian-quandl. To ingest quandl, you’ll need to make an account, and then obtain an API key. You won’t need to for quantopian-quandl. Zipline also provides the option to create custom bundles, by writing a custom ingest function. That will be reserved for a later post.

Neither quandl nor quantopian-quandl provides ETF data, so I’ve provided a custom alpaca bundle. The instructions for installing this bundle locally can be found in the GitHub repository readme.

You run zipline code using the following format:

$ zipline run -f <filename> -b <bundle> --start <date> --end <date>

Notice that much of what you set through the GUI on Quantopian, such as the backtest's start and end dates, is passed on the command line with Zipline.

Docker Containers:

I’ve created a Docker container running Python 3.6 with zipline and the necessary dependencies installed. If you’re hesitant about installing zipline locally, you can pull the container from Docker Hub and experiment with the environment.

Run the image with the command:

$ docker run -it alpaca/roboadvisor /bin/bash

A First Zipline Example — Buy and Hold

Just as I did with Quantopian, I started with Zipline using a simple buy and hold strategy.

Here are a few differences to keep in mind. First, all the custom functions like order, symbols, etc. are no longer automatically included, and have to be manually imported (from zipline.api import *). Next, this is rather silly, but now that we’re not in Quantopian’s IDE, we’re no longer going to use log.info to track transactions, but print statements.

Let’s run the code with the format from earlier, and run the tests from January to June:

$ zipline run -f buy-and-hold.py -b alpaca --start 2018-01-01 --end 2018-06-01

If there are no syntax errors, Zipline will spit out a whole bunch of data. How do we know if we’re right? Scroll until you find the following table in STDOUT:

 Look at the cumulative alpha value from the last simulated trading day (2018–06–01)

Find the cumulative alpha value (performance against benchmark), and compare it to the cumulative alpha value when you run the same algorithm on Quantopian. They should be very similar values.

We’ve got a good idea of how Zipline works, so we can go ahead and implement the rest of our single-universe algorithm (distance computation and rebalancing). I spent an entire post talking about this implementation, so I don’t think there’s a need to re-hash it. If you haven’t read that, I recommend you do — it’s a great read! (a totally unbiased opinion)

Multiple Universes

In the previous post, I showed how to expand the algorithm to cover all possible Vanguard universes by adding multiple dictionaries in the initialize function. But if we want to implement all six Vanguard universes, we would finish with a bloated initialize function, which doesn’t look that pretty.

But now we’re away from Quantopian, and here with Zipline! We can spread our code across multiple files, so let’s go ahead and do that.

To be used by the algorithm, a universe needs two sets of information. The first is the set of symbols, and the second is the weight distribution based on the risk. Because of the way that the symbols function works, each universe’s list of symbols needs to be set in the initialization function.

But, the weights are dictionaries that aren’t bound to any zipline.api functions, so we can actually configure those in a separate file. The ConfigParser package can read in data stored in an external .ini file. More importantly, it stores the data in a similar structure to a dictionary (key/value), and is easily called and organized.

First, we need the package. On Python 3, configparser is part of the standard library, so the install below is usually unnecessary:

$ pip3 install ConfigParser

With the package installed, it’s time to create the INI file. The INI file contains the information for allocation based on risk level.

INI files are separated into sections, with each new section marked by a section title in brackets. The section titles act much like keys in a dictionary, and are a useful way to organize inputs. This file organizes each individual Vanguard universe as its own section. If you’d like to add your own universe, fork the gist, create a new section with the universe name in brackets, and list the information below it.
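As a rough illustration of that layout, the sketch below parses a small INI string and lists its sections and keys. The section names and every weight in it are made up for the example; they are not the values from the gist.

```python
from configparser import ConfigParser

# Hypothetical INI content: section names and weights are illustrative only.
SAMPLE_INI = """
[CORE_SERIES]
0 = (0.0, 0.0, 0.7, 0.3)
5 = (0.3, 0.2, 0.35, 0.15)

[MY_UNIVERSE]
0 = (0.5, 0.5)
"""

config = ConfigParser()
config.read_string(SAMPLE_INI)

print(config.sections())                 # section titles, in file order
print(list(config['CORE_SERIES']))       # the keys inside one section
print(repr(config['CORE_SERIES']['0']))  # every value comes back as a string
```

Adding a universe is then just a matter of appending another bracketed section; nothing in the parsing code has to change.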

The robo-advisor’s algorithm parses input as a dictionary. While the INI file works much like a dictionary, we’ll still need to write a function to actually translate it into a dictionary.

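import ast
from configparser import ConfigParser

def section_to_dict(section):
    config = ConfigParser()
    config.read('config.ini')  # adjust if your INI file lives elsewhere
    out_dict = {}
    for key in config[section]:
        # keys become ints; string values are parsed back into tuples
        out_dict[int(key)] = ast.literal_eval(config[section][key])
    return out_dict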

The ConfigParser is initialized, and then reads the given INI file. The section to read is passed in as an argument. If we refer back to the INI file, we’ll see that the entries take the form key = value. So, like a dictionary, we can iterate through the keys of an INI section, adding each key/value pair to the output dictionary. (Note: If you receive a key error, it’s because the path to your config.ini file is wrong.)
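The reason a wrong path surfaces as a key error is that ConfigParser.read() skips files it cannot open and simply returns the list of files it did parse, so the failure only shows up later when the section lookup finds nothing. A small guard makes the real problem visible; load_config is a helper name I made up for this sketch, not something from the repository.

```python
from configparser import ConfigParser


def load_config(path):
    """Return a parsed ConfigParser, failing loudly if the file is missing."""
    config = ConfigParser()
    # read() returns the list of files it managed to parse; a wrong path
    # yields an empty list instead of raising an exception.
    if not config.read(path):
        raise FileNotFoundError(f'Could not read INI file at {path!r}')
    return config
```

With this in place, a bad path fails right away with a message naming the file, instead of a KeyError on the section name.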

Two more things to be aware of. In the INI file, everything is a string, but our algorithm expects integer keys and tuple values. For the keys, just casting them as integers is enough, but for the tuples, we’ll have to unstring the value. For that, I found ast.literal_eval to be the function that worked best.
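Here is that conversion on a single entry, as a quick sketch. The key and the weights are invented for the example.

```python
import ast

# A hypothetical INI entry, exactly as ConfigParser would hand it back.
raw_key, raw_value = '3', '(0.2, 0.1, 0.5, 0.2)'

risk_level = int(raw_key)              # '3' -> 3
weights = ast.literal_eval(raw_value)  # '(0.2, ...)' -> a real tuple

print(risk_level, weights, type(weights).__name__)
```

A side benefit of ast.literal_eval over plain eval is that it only accepts Python literals (numbers, strings, tuples, lists, dicts and the like), so a stray expression in the INI file raises an error rather than being executed.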

Now, let’s integrate this new method of retrieving weight-based allocation with our robo-advisor algorithm:

I’ve listed all the symbols for every universe in the initialization. The weights for that universe are determined by calling section_to_dict on the appropriate section of the INI file.

Now, feel free to add as many universes as you like!

What’s Next?

From here, I’m interested in visualizing all the output data from the Zipline backtest. With my next post, I’m hoping to explore the different visualizations available from that raw data.

by Rao Vinnakota