Tick Data and Charting for NBBO

OHLC from Tick Data
This post shows how to process tick data and generate Open-High-Low-Close (OHLC) price statistics. Tick data consist of quotes and trades: quotes give the current market bid and ask prices, and trades record the latest transactions. Timestamps on tick data are typically very fine-grained, which lets us build OHLC time series at almost any interval, even fractions of a second.
We use pandas for data processing and matplotlib for plotting. In our first example, tick data are downloaded from the Kyper database. Kyper provides both equity and futures tick data.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
import datetime as dt
import matplotlib.dates as dates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from kyper.data.kyper import futures
from kyper.data.kyper import equities
from kyper.util.time_helper import eastern_time
# matplotlib.finance was removed in matplotlib 2.2; newer setups need the separate mpl_finance package
from matplotlib.finance import candlestick_ohlc

%matplotlib inline
[/sourcecode]
On either the Kyper analytics platform or your local machine, you can get futures tick data via the function:
get_tick_data(symbol, start_dt, end_dt, max_records, session_filter)
The function returns a pandas `DataFrame`, which is easy to manipulate. We use the session code `F` to capture the bid prices from the quotes. The example below retrieves Crude Oil futures "CL" tick prices and sizes.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
df_cl = futures.get_tick_data('CL',
                              eastern_time(2015, 1, 2, 9, 30),
                              eastern_time(2015, 1, 2, 9, 35),
                              session_filter='F')
df_cl.head()
[/sourcecode]
timestamp (UTC) | symbol | tradingDay | sessionCode | tickPrice | tickSize |
---|---|---|---|---|---|
2015-01-02 14:30:01+00:00 | CL | 2015-01-02 | F | 69.06 | 200 |
2015-01-02 14:30:01+00:00 | CL | 2015-01-02 | F | 69.06 | 130 |
2015-01-02 14:30:01+00:00 | CL | 2015-01-02 | F | 69.06 | 200 |
2015-01-02 14:30:01+00:00 | CL | 2015-01-02 | F | 69.05 | 100 |
2015-01-02 14:30:01+00:00 | CL | 2015-01-02 | F | 69.05 | 802 |
By default, the time series is in the UTC timezone. We would like to work with the data in our local timezone. The easiest approach is to use the pandas function `tz_convert`. The timezone setting is crucial when linking the data to a particular local exchange.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
# tz_convert returns a new index; assign it back to keep the conversion
df_cl.index = df_cl.index.tz_convert('US/Eastern')
[/sourcecode]
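As a minimal sketch of the conversion, the snippet below builds a tiny hypothetical frame with UTC timestamps (the `ticks` name and its values are illustrative, not Kyper output) and converts it with `DataFrame.tz_convert`, which shifts the index of the whole frame in one call.

```python
import pandas as pd

# Hypothetical ticks stamped in UTC, mimicking the index shown in the table above
idx_utc = pd.to_datetime(['2015-01-02 14:30:01',
                          '2015-01-02 14:30:02']).tz_localize('UTC')
ticks = pd.DataFrame({'tickPrice': [69.06, 69.05]}, index=idx_utc)

# tz_convert returns a new frame whose index is expressed in the target zone
ticks_et = ticks.tz_convert('US/Eastern')
print(ticks_et.index[0])
```

The instant in time is unchanged; only its wall-clock representation moves from 14:30 UTC to 09:30 Eastern.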
For numerical computing, the `values` attribute of the DataFrame stores the data in numpy array format. A simple usage for generating a tick-price vector is shown below:
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
df_cl['tickPrice'].values
[/sourcecode]
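To illustrate why the array form is handy, here is a small sketch that computes a size-weighted average price directly on numpy arrays. The `ticks` frame is hypothetical and simply copies the five rows from the table earlier in this post; the weighted average itself is only an example calculation, not part of the Kyper workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical sample copied from the first rows of the tick table
ticks = pd.DataFrame({'tickPrice': [69.06, 69.06, 69.06, 69.05, 69.05],
                      'tickSize': [200, 130, 200, 100, 802]})

prices = ticks['tickPrice'].values   # numpy array of floats
sizes = ticks['tickSize'].values     # numpy array of integers

# Size-weighted average price, computed element-wise on the arrays
weighted_price = np.sum(prices * sizes) / np.sum(sizes)
print(weighted_price)
```

In newer pandas releases, `to_numpy()` is the recommended accessor and returns the same array here.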
Converting ticks to an OHLC price series is a form of downsampling: the high-frequency ticks are transformed into a lower-frequency price sequence. The pandas function `resample()` can aggregate the tick information for us. We can explicitly request the `'ohlc'` aggregation in the function call. Here, we use `'T'` to derive a minute-by-minute OHLC price time series.
The second part of the code plots the output. The candlestick chart is the most common OHLC visualization. The Python module matplotlib has a finance submodule, but it only accepts ordinal datetime sequences, the same convention used in MATLAB, which is no surprise. When working with minute or higher-frequency OHLC data, we need to adjust the x-axis tick labels manually, though that is not hard at all.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
# Create a Figure
fig, ax = plt.subplots()

# Resample Tick Price to Minute OHLC
# (later pandas releases dropped how=; there, use .resample('T').ohlc())
df_ohlc = df_cl['tickPrice'].resample('T', how='ohlc')

# Assign Temporary Integers as the OHLC Timestamps
df_ohlc['DateTime'] = np.arange(len(df_ohlc))

# Move the DateTime Column to the Front
cols = ['DateTime', 'open', 'high', 'low', 'close']
df_ohlc = df_ohlc[cols]

# Get OHLC numpy array and Transform it into a List of Tuples for matplotlib
array_ohlc = df_ohlc.values
list_ohlc = [tuple(x) for x in array_ohlc]

# Generate the x Axis Labels
list_labels = [dt.datetime.strftime(x, '%H:%M') for x in df_ohlc.index.tolist()]

# Use the candlestick_ohlc from matplotlib.finance
candlestick_ohlc(ax, list_ohlc)
ax.xaxis.set_ticklabels([''] + list_labels)
ax.grid(True, which='major')
plt.title('Crude Oil Futures (CL) Minute OHLC')
[/sourcecode]
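For readers without access to Kyper data, the downsampling step can be tried on a handful of made-up ticks. This is a sketch under stated assumptions: the timestamps and prices are hypothetical, and it uses the method-chaining form `resample(...).ohlc()` accepted by recent pandas releases rather than the `how=` argument.

```python
import pandas as pd

# Hypothetical ticks: six prices spread over two minutes
idx = pd.to_datetime(['2015-01-02 09:30:05', '2015-01-02 09:30:20',
                      '2015-01-02 09:30:50', '2015-01-02 09:31:10',
                      '2015-01-02 09:31:30', '2015-01-02 09:31:55'])
prices = pd.Series([69.06, 69.10, 69.02, 69.03, 68.98, 69.00], index=idx)

# Resample, then aggregate: one open/high/low/close row per minute
ohlc = prices.resample('1min').ohlc()
print(ohlc)
```

Each output row summarizes every tick that fell inside that minute, so the 09:30 row opens at 69.06 and closes at 69.02.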

Bid-Ask and NBBO
When the ticks carry both bid and ask prices, we can go on to generate the National Best Bid and Offer (NBBO) with a few more pandas DataFrame manipulations. This is also called Level-I limit order book data, since the NBBO represents the highest bid, the lowest offer, and the spread between them at any given time on the book.
We use VXX ETF tick quote data for this example. A small data set is saved as a binary pickle file for demonstration.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
df_input = pd.read_pickle('vxx.pickle')
df_input[:10]
[/sourcecode]
One technicality of downsampling that we have not mentioned yet: when a tick arrives at precisely 10:00 am, does it belong to the 9:59-10:00 interval or the 10:00-10:01 interval? By default, the `resample` function uses left-closed, right-open intervals, so a tick at 10:00 am is included in the 10:00-10:01 interval.
This setting might not be suitable for NBBO analysis. If we observe a tick at 10:00 am, logically it should be included in the 10:00 am NBBO calculation. So we simply modify the resample operation by swapping the open and closed ends via the `closed` parameter:
closed='right'
Another minor issue after the change is the output timestamp label, which should also move to the right edge of the interval to prevent look-ahead bias (or ambiguity). Simply use
label='right'
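The effect of both parameters is easiest to see on a tick that lands exactly on a bin boundary. The sketch below uses three hypothetical ticks (values chosen only for illustration) and compares the default binning with the right-closed, right-labelled variant.

```python
import pandas as pd

# Hypothetical ticks: one before 10:00, one exactly at 10:00, one after
idx = pd.to_datetime(['2015-01-02 09:59:30', '2015-01-02 10:00:00',
                      '2015-01-02 10:00:30'])
prices = pd.Series([20.00, 20.05, 20.10], index=idx)

# Default: bins are [start, end) and carry the start time as label
left = prices.resample('1min').last()

# Right-closed, right-labelled: bins are (start, end], labelled by the end time
right = prices.resample('1min', closed='right', label='right').last()

print(left)
print(right)
```

With the default, the row labelled 10:00 already contains the 10:00:30 tick. With `closed='right'` and `label='right'`, the row labelled 10:00 holds only ticks observed up to and including 10:00:00.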
With both settings in place, the NBBO based on close prices can be computed:
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
# Create Second OHLC Bid and Ask Prices
# (how= was dropped from later pandas releases, where the aggregation is chained after resample(...))
df_input = df_input.resample('S', closed='right', label='right',
                             how={'bid price': 'ohlc', 'ask price': 'ohlc'})

# Forward Fill Empty Entries
df_input = df_input.ffill()

# Compute NBBO
df_input['NBBO'] = df_input['ask price']['close'] - df_input['bid price']['close']
df_input[:10]
[/sourcecode]
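The same pipeline can be sketched on a few hypothetical quotes, assuming the `bid price` and `ask price` column names used above. To keep it short, this version swaps the full OHLC aggregation for `.last()`, which matches the `close` column of an OHLC output, and stores the result in a column named `spread` (an illustrative name).

```python
import pandas as pd

# Hypothetical quotes; column names follow the 'bid price' / 'ask price' labels
idx = pd.to_datetime(['2015-01-02 10:00:00.2', '2015-01-02 10:00:00.7',
                      '2015-01-02 10:00:01.0', '2015-01-02 10:00:03.0'])
quotes = pd.DataFrame({'bid price': [20.00, 20.01, 20.02, 20.03],
                       'ask price': [20.03, 20.03, 20.04, 20.06]}, index=idx)

# Last quote per right-closed, right-labelled second; forward fill empty seconds
sec = quotes.resample('1s', closed='right', label='right').last().ffill()

# Spread between the closing ask and the closing bid in each second
sec['spread'] = sec['ask price'] - sec['bid price']
print(sec)
```

The second ending at 10:00:02 has no quotes of its own, so the forward fill carries over the bid and ask from 10:00:01 and the spread stays the same.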
What’s Next?
By now you should have an idea of how to turn tick prices into an OHLC price time series. With the Python pandas module, the task is fairly straightforward. Even with a few twists in plotting candlesticks, Python may still be the most effective tool for the purpose. And with both bid and ask ticks, we can reconstruct the NBBO for Level-I limit order book data. If you plan to implement a price-spread model, this is where you start.
Data processing is not as boring as you might have thought, but it is certainly not sexy. We shall spice it up next time with fancier data visualization. See you next time.