
ARMA Model and Kalman Filter (01)

Autocorrelation

We often need to characterize the autocorrelation of a time series. The lag-n autocorrelation, \rho_n, is defined as

\rho_n = \frac{\text{Cov}(r_k, r_{k-n})}{\sqrt{\text{Var}(r_k) \text{Var}(r_{k-n})}}

When the sequence is weakly stationary, i.e.
\text{E}[r_k] = \mu
\text{Var}(r_k) = \sigma^2
\text{Cov}(r_k, r_{k-n}) = \gamma_n
where \mu, \sigma^2, and \gamma_n are constants (independent of k), we write
\rho_n = \frac{\text{Cov}(r_k, r_{k-n})}{\sigma^2} = \frac{\gamma_n}{\sigma^2}.
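
For intuition, the lag-n autocorrelation is simply the correlation between the series and a lagged copy of itself, which is easy to check with NumPy (a quick illustration only; the array r and the lag n below are placeholders):

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
import numpy as np

# Illustration only: the lag-n autocorrelation is the correlation of the
# series with a lagged copy of itself (r and n are placeholders here).
r = np.random.randn(1000)
n = 5
rho_n = np.corrcoef(r[n:], r[:-n])[0, 1]
print(rho_n)
[/sourcecode]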

Now, let’s move on to something more technical. If r_k is an independent and identically distributed sequence with finite variance, then the sample autocorrelation
\hat{\rho}_n = \frac{\sum_{m=n+1}^{T} (r_m - \bar{r})(r_{m-n} - \bar{r})}{\sum_{m=1}^{T} (r_m - \bar{r})^2}
is asymptotically normal with mean zero (trivially) and variance \frac{1}{T}, where T is the sample size. For a weakly stationary sequence r_k, \hat{\rho}_n is asymptotically normal with mean zero and variance
\frac{1 + 2 \sum_{m=1}^{n-1} \rho_m^2}{T}
if we assume \rho_m = 0 for m \geq n. Therefore, a t-ratio test statistic can be derived as
\frac{\hat{\rho}_n}{\sqrt{\frac{1 + 2 \sum_{m=1}^{n-1} \rho_m^2}{T}}}
for the test

H_0: \rho_n = 0 H_a: \rho_n \neq 0
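
Here is a minimal sketch of this t-ratio test; the helper names sample_autocorr and t_ratio are illustrative, the series r is a placeholder, and the sample autocorrelations are plugged into the asymptotic variance above:

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
import numpy as np

def sample_autocorr(r, n):
    # Lag-n sample autocorrelation, matching the formula above (n >= 1)
    r = np.asarray(r, dtype=float)
    r_bar = r.mean()
    num = np.sum((r[n:] - r_bar) * (r[:-n] - r_bar))
    den = np.sum((r - r_bar) ** 2)
    return num / den

def t_ratio(r, n):
    # t-ratio for H_0: rho_n = 0, plugging sample autocorrelations
    # into the asymptotic variance (1 + 2 * sum_{m<n} rho_m^2) / T
    T = len(r)
    rho_hat = [sample_autocorr(r, m) for m in range(1, n + 1)]
    var = (1.0 + 2.0 * sum(rho ** 2 for rho in rho_hat[:-1])) / T
    return rho_hat[-1] / np.sqrt(var)

# Reject H_0 at the 5% level when |t| exceeds 1.96
r = np.random.randn(1000)   # placeholder return series
print(t_ratio(r, 5))
[/sourcecode]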

Q Test

When testing whether multiple autocorrelations are jointly zero, the classic work of Box and Pierce (1970) and Ljung and Box (1978) provides a test of the hypothesis

H_0: \rho_1 = \rho_2 = \cdots = \rho_n = 0 H_a: \rho_i \neq 0 for some 1 \leq i \leq n

The Ljung-Box Q statistic is defined as
Q(n) = T(T+2) \sum_{m=1}^{n} \frac{\hat{\rho}_m^2}{T - m},
which asymptotically follows a chi-squared distribution with n degrees of freedom under H_0. With the Python `statsmodels` module, we can easily check the test results on some financial data.
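
For example, `statsmodels` already ships a Ljung-Box test in `statsmodels.stats.diagnostic.acorr_ljungbox`; here is a minimal sketch (the lag choice of 10 and the series r are only illustrative):

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

r = np.random.randn(1000)   # placeholder return series
# Ljung-Box Q statistics and p-values for lags 1 through 10; small
# p-values reject H_0 of no autocorrelation up to that lag.
# (Older statsmodels versions return a tuple of arrays, newer ones a DataFrame.)
result = acorr_ljungbox(r, lags=10)
print(result)
[/sourcecode]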

S&P 500 Data

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pandas.io.data as web
import statsmodels.tsa.api as sma

from statsmodels.tsa.arima_model import ARMA
%matplotlib inline

# Five years of daily S&P 500 data from Yahoo Finance ('^GSPC' is the index symbol)
start = '2006-01-01'
end = '2010-12-31'
df_data = web.DataReader('^GSPC', 'yahoo', start, end)
df_data.tail()
[/sourcecode]
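
Note that `pandas.io.data` was later split out of pandas into the separate `pandas-datareader` package, so on a newer setup the equivalent call would look roughly like this (assuming the Yahoo endpoint is still available):

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
# On newer pandas versions, the DataReader lives in pandas-datareader
from pandas_datareader import data as web
df_data = web.DataReader('^GSPC', 'yahoo', '2006-01-01', '2010-12-31')
[/sourcecode]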

[Figure 01_sp500: tail of the downloaded S&P 500 daily price data]

S&P 500 Daily Log Returns Analysis

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
# Daily log returns from the adjusted close prices
array_adjclose = df_data['Adj Close'].values
array_logret_adjclose = np.log(array_adjclose)[1:] - np.log(array_adjclose)[:-1]
# Prepend a zero so the return series aligns with the price index
array_logret_adjclose = np.insert(array_logret_adjclose, 0, 0)
df_data['Log Return'] = array_logret_adjclose

ax = df_data['Log Return'].plot()
ax.set_title('Daily Log Return')
[/sourcecode]

[sourcecode language="python" light="true" wraplines="false" collapse="false"]
# Sample ACF and PACF of the daily log returns
fig, ax = plt.subplots(2, 1)
ax[0].plot(sma.stattools.acf(array_logret_adjclose))
ax[1].plot(sma.stattools.pacf(array_logret_adjclose))

# ACF with 95% confidence intervals, plus Ljung-Box Q statistics and p-values
fig, ax = plt.subplots(3, 1)
acf, confint, qstat, pvalues = sma.stattools.acf(array_logret_adjclose, alpha=0.05, qstat=True)
ax[0].plot(acf)

# Q Stats
ax[1].plot(qstat)

# P Values
ax[2].plot(pvalues)
[/sourcecode]

[Figure 01_acf: ACF, Q statistics, and p-values of the daily log returns]

What’s Next?

We will examine how to choose and calibrate an ARMA model.
