Kalman Filter (01) – S&P 500 and Dow Jones Linear Regression


Linear Regression
Let’s get some Kalman filter basics and start playing around with it.
There is a long history about least-squares filtering. Like a linear regression model, we can fit the data with a linear function that minimizes the mean-square error. Python pandas can easily do the job:
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
import datetime as dt
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smffrom pandas.io.data import DataReader
from statsmodels.api import OLS# Get Adjusted Close Prices of SPY (S&P 500 ETF) and DIA (Dow Jones)
df_SPY = DataReader(‘SPY’, ‘yahoo’, dt.datetime(2000,1,1), dt.datetime(2015,1,1))
df_DIA = DataReader(‘DIA’, ‘yahoo’, dt.datetime(2000,1,1), dt.datetime(2015,1,1))
df_data = pd.concat([df_SPY[‘Adj Close’], df_DIA[‘Adj Close’]], axis=1)
df_data.columns = [‘SPY’, ‘DIA’]# Linear Regression with pandas
result = smf.ols(‘DIA ~ SPY’, df_data).fit()
result.summary()
[/sourcecode]
To get some visual of the regression,
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
import matplotlib.pyplot as pltfig, ax = plt.subplots(1, 1)
df_data[:252*8].plot(ax=ax, kind=’scatter’, x=’SPY’, y=’DIA’, figsize=(16,12), color=(0.5, 0.5, 0.5))
df_data[252*8:].plot(ax=ax, kind=’scatter’, x=’SPY’, y=’DIA’, figsize=(16,12), color=(0.7, 0.5, 0.5))intercept, slope = result.params
x = np.array([df_data.SPY.min(), df_data.SPY.max()])
y = intercept + slope * x
plt.plot(x, y, ‘r-‘)
[/sourcecode]
First observation you might have is: there are two clusters. The light grey indicate the cluster for the SPY-DIA dots in the years from 2000 to 2008; the light purple dots indicate the cluster for years from 2008 to 2015. So the regressed line parameters should not be constant.
R.E. Kalman in 60s published his groundbreaking paper to accommodate the time-varying linear models. Also, the framework he proposed can handle noisy measurements (prices) which might distort the fitting result. Nowaday Kalman filter becomes a popular tool for signal processing.
To start, let’s introduce the notation. Let be a sequence of the true intercepts and slopes of the SPY-DIA regression and assume
are the *prior* knowledge associated with
and
. It is what we believe the time series of
and
distribution should behave at time
following
linear transformation. So, from time
to time
we have
We assume follows a normal distribution. If you believe the parameters should be constant over time, then we simply assume
is an identity matrix and
, which is
mean,
variance. Now, if
and
represent the prices at time
, we plug
and
back to the price relationship:
where represents the measurement error of the linear regression (becasue the relatinship is never perfactly linear and subjected to many room of interpretation). We also assume
follows the normal distribution.
Now, let’s try if we can recreate the oridinary linear regression result. We will use pykalman
module.
[sourcecode language=”python” light=”true” wraplines=”false” collapse=”false”]
import pykalmanfrom pykalman import KalmanFilter
# Transition Matrix
A = np.array([[1, 0], [0, 1]])
# Transition Covariance
Q = np.zeros(A.shape)# Convert the Price Data to 3-Dimensional Array. Add an Additional Column for Intercept Term
array_SPY = np.expand_dims(df_data[‘SPY’].values, 1)
array_SPY = np.append(array_SPY, np.ones(array_SPY.shape), axis=1)
array_SPY = np.expand_dims(array_SPY, 1)# Create a Kalman Filter Object
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2, transition_matrices=A, transition_covariance=Q, observation_matrices=array_SPY)
kf = kf.em(df_data[‘DIA’].values)# Linear Regression Outputs
alpha = kf.initial_state_mean[1]
beta = kf.initial_state_mean[0]# Linear Regression Figure
fig, ax = plt.subplots(1, 1)
df_data[:252*8].plot(ax=ax, kind=’scatter’, x=’SPY’, y=’DIA’, figsize=(16,12), color=(0.5, 0.5, 0.5))
df_data[252*8:].plot(ax=ax, kind=’scatter’, x=’SPY’, y=’DIA’, figsize=(16,12), color=(0.7, 0.5, 0.5))
x = np.array([df_data.SPY.min(), df_data.SPY.max()])
y = alpha + beta * x
plt.plot(x, y, ‘r-‘)
[/sourcecode]
What’s Next?
In light of the increasing need for more complicated models to capture prices dynamics, intuition on advanced development is connected with simple cases. Today we set foot in Kalman filter and successfully replicate an ordinary least-squares regression result. Even with noisy price data, Kalman filter has done its first task by smoothing out the signals.
Next time we shall start some fundamental pairs trading with Kalman filter. We will consider the nature of streaming financial data feed and build a workable long-short strategy on S&P 500 and Dow Jones ETFs. We will show how Kalman filter generates divergent signals, and how additional latent factors can improve the performance.