Trading strategies based on order book data

Project proposal

Motivation

In the last few years, electronic limit order books, which collect incoming limit orders and automatically match market orders against the best available limit order, have been introduced by almost all major stock exchanges. Their introduction has significantly changed trading strategies: the speed of trading has increased dramatically, and traders can choose between different order types, which raises the question of which type should be used and under which conditions. Limit order books also generate a large amount of electronic financial data that can be stored and processed in order to exploit underlying patterns. Financial institutions use this data to gain an advantage in the market; one application is automated trading strategies that exploit these patterns to trade with a competitive edge.
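To make the matching mechanism concrete, here is a minimal toy sketch (the class and method names are hypothetical, not taken from any exchange or library). It assumes price-time priority, a common rule on equity exchanges: a market order is filled first at the best opposite price and, within a price level, against the oldest resting order. Prices are integer ticks to avoid floating-point rounding, and limit orders that cross the spread, hidden orders and cancellations are deliberately ignored.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class LimitOrderBook:
    # price (integer ticks) -> FIFO queue of resting order sizes at that price
    bids: Dict[int, Deque[int]] = field(default_factory=dict)
    asks: Dict[int, Deque[int]] = field(default_factory=dict)

    def add_limit(self, side: str, price: int, size: int) -> None:
        """Rest a limit order at the back of its price level's queue.
        Toy simplification: incoming limit orders are never matched on arrival."""
        book = self.bids if side == "buy" else self.asks
        book.setdefault(price, deque()).append(size)

    def market_order(self, side: str, size: int) -> List[Tuple[int, int]]:
        """Match a market order against the opposite side and return (price, size) fills.
        Any remainder left when the opposite side is empty is simply dropped."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while size > 0 and book:
            best = min(book) if side == "buy" else max(book)  # best ask / best bid
            queue = book[best]
            traded = min(size, queue[0])  # oldest order at the best price first
            fills.append((best, traded))
            size -= traded
            if traded == queue[0]:
                queue.popleft()
            else:
                queue[0] -= traded
            if not queue:
                del book[best]
        return fills


book = LimitOrderBook()
book.add_limit("sell", 10010, 50)
book.add_limit("sell", 10020, 30)
book.add_limit("sell", 10010, 20)
print(book.market_order("buy", 80))  # [(10010, 50), (10010, 20), (10020, 10)]
```

The buy market order first consumes both resting orders at the best ask (10010) in arrival order, then walks up to the next level.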

 

Objectives

Analyze historical limit order book data to find patterns that can be reused in future algorithmic trading strategies, or to reverse-engineer existing trading strategies. To achieve this, we are going to apply genetic algorithms to different trading strategies.

Milestones

  1. Applying for NASDAQ's Historical TotalView-ITCH data via LOBSTER/TradingPhysics; analysis of the available sample data sets (7-10 days)

  2. Training and System Engine basics: data parser, analyzer and processor (4 weeks)

       2.1 Genetic Algorithms (Vidor’s description)

       2.2 Specifying the interfaces and requirements of our System Engine

       2.3 Starting the implementation of our System Engine

  3. Completion of the System Engine, so that different algorithms/strategies can be tested (1 week)

  4. Implementation of algorithms and strategies in simulations on historical data (1 week)

       4.1 Starting the implementation of different trading algorithms

       4.2 Starting the implementation of strategies based on the trading algorithms

  5. Optimization, testing and evaluation of the final system (1 week)

       5.1 Testing and optimizing the trading algorithms

       5.2 Evaluating the trading strategy results and the final system

    

Methods

 

We will obtain the data in the following way:

 

LOBSTER offers limit order book data derived from NASDAQ's Historical TotalView-ITCH files, for academic research only (https://lobster.wiwi.hu-berlin.de/info/help_faq_general.php).

 

How to join LOBSTER: https://lobster.wiwi.hu-berlin.de/info/HowToJoin.php

The process can take up to two weeks; however, sample data sets are provided that can be used to get started. The price is 300 EUR excl. VAT, payable in advance. This covers the annual fee plus one pre-paid credit block (e.g. 100 days of level 10 data for one stock such as Amazon, or four times as many days of level 1 data).
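As a first step towards the parser of our System Engine, the sketch below reads LOBSTER-style files. It assumes the layout described in the LOBSTER documentation as we understand it, which should be checked against the sample data sets: a message file with the columns time (seconds after midnight), event type, order ID, size, price (dollar price times 10000) and direction (1 = buy, -1 = sell limit order), and an orderbook file whose rows repeat ask price, ask size, bid price, bid size for each level. All function names are hypothetical. The top_of_book_features helper shows the kind of feature (mid-price, spread, order book imbalance) that order book data makes available and daily prices do not.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Event type codes as we understand the LOBSTER message-file documentation (to be verified).
EVENT_TYPES = {
    1: "submission",         # new limit order
    2: "cancellation",       # partial deletion of a limit order
    3: "deletion",           # total deletion of a limit order
    4: "execution_visible",  # execution of a visible limit order
    5: "execution_hidden",   # execution of a hidden limit order
    6: "cross_trade",        # e.g. auction trade
    7: "trading_halt",
}

PRICE_SCALE = 10_000  # assumed: prices are stored as dollar price * 10000


@dataclass
class Message:
    time: float     # seconds after midnight
    event: str
    order_id: int
    size: int
    price: int      # integer ticks (dollar price * PRICE_SCALE)
    direction: int  # 1 = buy limit order, -1 = sell limit order


def parse_messages(lines: Iterable[str]) -> Iterator[Message]:
    """Yield one Message per row of a message file (no header row assumed)."""
    for row in csv.reader(lines):
        yield Message(float(row[0]), EVENT_TYPES[int(row[1])], int(row[2]),
                      int(row[3]), int(row[4]), int(row[5]))


Level = Tuple[int, int]  # (price in ticks, size)


def parse_book_row(row: List[str], levels: int) -> Tuple[List[Level], List[Level]]:
    """Split one orderbook-file row into ask and bid levels, best level first."""
    v = [int(x) for x in row]
    asks = [(v[4 * i], v[4 * i + 1]) for i in range(levels)]
    bids = [(v[4 * i + 2], v[4 * i + 3]) for i in range(levels)]
    return asks, bids


def top_of_book_features(asks: List[Level], bids: List[Level]) -> dict:
    """Simple features computed from the best ask and best bid."""
    (ask_p, ask_q), (bid_p, bid_q) = asks[0], bids[0]
    return {
        "mid_price": (ask_p + bid_p) / 2 / PRICE_SCALE,  # dollars
        "spread": (ask_p - bid_p) / PRICE_SCALE,         # dollars
        "imbalance": (bid_q - ask_q) / (bid_q + ask_q),  # in [-1, 1], > 0 means more bid volume
    }
```

A file would be read with, for example, open(path, newline="") and passed straight to parse_messages, which streams rows instead of loading the whole day into memory.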

 

Alternatively:

 


 

Genetic Algorithms and Evolutionary strategies

 

Many trading strategies are out there, and most of them are claimed to be successful; still, there is no ultimate strategy that everybody uses. (Some theories suggest that such an algorithm cannot exist.)

To make use of all these strategies, we will use genetic algorithms and evolutionary strategies to search for the most efficient algorithm, which may well be a combination of many existing ones. To derive our algorithm, we will have to:

 

1. Collect as many concrete trading algorithms and strategies as we can

2. Implement them (or some of them) as benchmarks, so we can compare our derived algorithm to them

3. Find a way to extract their features. It is essential to be able to discretize the features, as each of them will be represented by a gene of the chromosome in our evolution.

4. Build a system in which we can run the evolution (there is already a Scala-based genetic algorithm toolkit, jiva-ng, that we can make use of). We also have to make sure that the feature set is easily modifiable, so that over time we can add new features derived from new concrete trading strategies or from new possibilities opened up by new data.

4.a. The availability of data will determine which kinds of strategies we can use.

If the time span of the data is short, we cannot evaluate long-term strategies, because the evolution has to calculate the fitness of each individual in the population, and it needs enough generations to find a good strategy.

If we can obtain only daily prices, we cannot implement as many algorithms as we could with order book data.

4.b. Some strategies use, for example, sentiment analysis of news and tweets, or draw conclusions from the volume of Google searches for certain keywords. Even though we are not focusing on this part, we will share a common interface with the other teams working on it, so that we can make use of their progress.

5. Come up with new ideas for improving the strategies: for example, find out how to compare the fitness of a long-term strategy with that of a short-term one, or give more weight to features that are shown to contribute more to success than others.

6. As always in this kind of project, we cannot predict how our investments would change the market structure and the movement of prices; however, this is outside the scope of our project.
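The evolutionary loop of steps 3-5 can be sketched as follows. This is a minimal generic genetic algorithm in plain Python, not the jiva-ng API: each gene is the index of one discretized feature option, so a chromosome encodes one candidate strategy. The feature names and option values in FEATURE_OPTIONS are hypothetical, and toy_fitness is only a placeholder; in the project, fitness would come from backtesting the decoded strategy on historical order book data. Selection is by tournament, recombination by one-point crossover, and the best individuals are carried over unchanged (elitism), so the best fitness never decreases between generations.

```python
import random
from typing import Callable, Dict, List

Genome = List[int]

# Hypothetical discretized features; each gene is an index into one option list.
FEATURE_OPTIONS: Dict[str, list] = {
    "signal": ["moving_average_cross", "order_book_imbalance", "momentum"],
    "lookback_seconds": [10, 60, 300, 1800],
    "order_type": ["market", "limit_at_best", "limit_inside_spread"],
    "position_size": [100, 500, 1000],
}
GENE_SIZES = [len(options) for options in FEATURE_OPTIONS.values()]


def random_genome(rng: random.Random) -> Genome:
    return [rng.randrange(n) for n in GENE_SIZES]


def decode(genome: Genome) -> dict:
    """Translate a genome back into readable strategy parameters."""
    return {name: options[g] for (name, options), g in zip(FEATURE_OPTIONS.items(), genome)}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    point = rng.randrange(1, len(a))  # one-point crossover
    return a[:point] + b[point:]


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    # Each gene is replaced by a random option with probability `rate`.
    return [rng.randrange(n) if rng.random() < rate else g for g, n in zip(genome, GENE_SIZES)]


def tournament(pop: List[Genome], scores: List[float], k: int, rng: random.Random) -> Genome:
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def evolve(fitness: Callable[[Genome], float], pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.1, elite: int = 2, seed: int = 0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        history.append(scores[ranked[0]])
        next_pop = [pop[i] for i in ranked[:elite]]  # elitism: keep the best unchanged
        while len(next_pop) < pop_size:
            parents = tournament(pop, scores, 3, rng), tournament(pop, scores, 3, rng)
            next_pop.append(mutate(crossover(*parents, rng), mutation_rate, rng))
        pop = next_pop
    scores = [fitness(g) for g in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], history


def toy_fitness(genome: Genome) -> float:
    # Placeholder only: a real fitness would backtest decode(genome) on historical data.
    target = [1, 2, 2, 0]
    return float(sum(g == t for g, t in zip(genome, target)))


best, history = evolve(toy_fitness)
print(decode(best), history[-1])
```

Adding a new feature, as step 4 requires, only means adding an entry to FEATURE_OPTIONS; GENE_SIZES and decode pick it up automatically.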

 

Risks

  1. Financial data is available but often very expensive (LOBSTER, for academic research, is an exception)

  2. There are some doubts about the use of machine learning techniques for analyzing financial markets (http://www-stat.wharton.upenn.edu/~steele/Courses/9xx/Resources/MLFinancialApplications/MLFinance.html)

  3. Trading strategies based on order book data do not take into account transactions executed in dark pools. These are mainly large transactions that can have a great impact on the market (http://www.dummies.com/how-to/content/investigate-the-order-book-for-uptotheminute-stock.html)

Possible extensions

Data visualisation of the gathered results. Integration of news sentiment analysis into our system. Application of the trading strategies to real-time data.

Team & skills

There are currently six people on this project.

 

The skills required for this project are data mining, machine learning, some statistics, Java, Scala and Hadoop/Spark.

At the beginning, all of us will focus on collecting data and looking for trading strategies that can be implemented. Once this is done, we will split up and everyone will have their own assignment: some will work on the machine learning and others on building the System Engine.

Infrastructure

At this time we cannot give a detailed resource requirement plan. However, we are sure that we will not need heavy real-time parallel processing. We will write our algorithms and data mining jobs so that they run in parallel and can be executed a few times on a cluster. We will also use our local machines for testing on smaller datasets.

 

Knowledge base