# ARMA Model and Kalman Filter (01)

### Autocorrelation

We often need to characterize the lag-$n$ autocorrelation of a time series, $\rho_n$, which is defined as

$\rho_n = \frac{\text{Cov}(r_k, r_{k-n})}{\sqrt{\text{Var}(r_k) \text{Var}(r_{k-n})}}$

When the sequence is weakly stationary, i.e.
$\text{E}[r_k] = \mu$
$\text{Var}(r_k) = \sigma^2$
$\text{Cov}(r_k, r_{k-n}) = \gamma_n$
where $\mu$ and $\sigma^2$ are constants and $\gamma_n$ depends only on the lag $n$, we write
$\rho_n = \frac{\text{Cov}(r_k, r_{k-n})}{\sigma^2} = \frac{\gamma_n}{\sigma^2}.$

Now for the technical details. If $r_k$ is an independent and identically distributed sequence, then the sample autocorrelation computed from $T$ observations,
$\hat{\rho}_n = \frac{\sum_{m=n+1}^{T} (r_m - \bar{r})(r_{m-n} - \bar{r})}{\sum_{m=1}^{T} (r_m - \bar{r})^2},$
is asymptotically normal with mean zero and variance $\frac{1}{T}$. More generally, for a weakly stationary sequence $r_k$ with $\rho_m = 0$ for $m \geq n$, $\hat{\rho}_n$ is asymptotically normal with mean zero and variance
$\frac{1 + 2 \sum_{m=1}^{n-1} \rho_m^2}{T}.$
Replacing the unknown $\rho_m$ with their sample estimates gives the t-ratio test statistic
$t = \frac{\hat{\rho}_n}{\sqrt{\frac{1 + 2 \sum_{m=1}^{n-1} \hat{\rho}_m^2}{T}}}$
for the test

$H_0$: $\rho_n = 0$ versus $H_a$: $\rho_n \neq 0$.
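As a rough illustration of the t-ratio test, here is a minimal NumPy sketch. It assumes the standard sample autocorrelation (lagged cross-products of the demeaned series divided by the total sum of squares) and plugs sample estimates into the variance formula. The helper names `sample_acf` and `t_ratio` are hypothetical, and the data is simulated white noise rather than market returns.

```python
import numpy as np


def sample_acf(r, max_lag):
    """Sample autocorrelations rho_hat_1 .. rho_hat_max_lag of a 1-D series."""
    r = np.asarray(r, dtype=float)
    T = r.size
    d = r - r.mean()                      # demeaned series
    denom = np.sum(d ** 2)                # total sum of squares
    return np.array([np.sum(d[lag:] * d[:T - lag]) / denom
                     for lag in range(1, max_lag + 1)])


def t_ratio(r, n):
    """t-ratio for H0: rho_n = 0, with rho_m (m < n) replaced by estimates."""
    r = np.asarray(r, dtype=float)
    T = r.size
    rho = sample_acf(r, n)                # rho[m - 1] holds rho_hat_m
    var = (1.0 + 2.0 * np.sum(rho[:n - 1] ** 2)) / T
    return rho[n - 1] / np.sqrt(var)


# Illustrative data only: simulated white noise, not S&P 500 returns.
rng = np.random.default_rng(0)
r = rng.standard_normal(1000)
t1 = t_ratio(r, 1)
print(f"lag-1 t-ratio: {t1:.3f}")
```

At the 5% level, $H_0$ would be rejected when the absolute t-ratio exceeds roughly 1.96, the two-sided critical value of the standard normal distribution.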

### Q Test

To test jointly whether several autocorrelations are zero, the classic work of Box and Pierce (1970) and Ljung and Box (1978) provides a test of the hypothesis

$H_0$: $\rho_1 = \rho_2 = \cdots = \rho_n = 0$ versus $H_a$: $\rho_i \neq 0$ for some $1 \leq i \leq n$.

The Ljung-Box Q statistic is defined as
$Q(n) = T(T+2) \sum_{m=1}^n \frac{\hat{\rho}_m^2}{T-m},$
where $T$ is the sample size. Under $H_0$, it asymptotically follows a chi-squared distribution with $n$ degrees of freedom. With the Python statsmodels module, we can easily check the test results on some financial data.
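Before turning to statsmodels, it may help to see the statistic computed by hand. The sketch below assumes the Ljung and Box (1978) form, with $T$ the sample size and $n$ the number of lags tested. The function name `ljung_box_q` is hypothetical, and both series are simulated for illustration.

```python
import numpy as np


def ljung_box_q(r, n):
    """Ljung-Box Q statistic over lags 1..n for a 1-D series."""
    r = np.asarray(r, dtype=float)
    T = r.size
    d = r - r.mean()                      # demeaned series
    denom = np.sum(d ** 2)
    total = 0.0
    for m in range(1, n + 1):
        rho_m = np.sum(d[m:] * d[:T - m]) / denom   # sample autocorrelation at lag m
        total += rho_m ** 2 / (T - m)
    return T * (T + 2) * total


# Illustrative data only: white noise and an AR(1) series built from it.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for k in range(1, 500):
    ar1[k] = 0.5 * ar1[k - 1] + noise[k]

# The 5% critical value of a chi-squared distribution with 5 degrees of
# freedom is about 11.07; Q above that rejects H0 at the 5% level.
print(ljung_box_q(noise, 5), ljung_box_q(ar1, 5))
```

A large Q relative to the chi-squared critical value indicates that at least one of the first $n$ autocorrelations is likely nonzero.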

### S&P 500 Data

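[sourcecode language="python" light="true" wraplines="false" collapse="false"]
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# pandas.io.data was removed from pandas; pandas-datareader is its successor.
import pandas_datareader.data as web
import statsmodels.tsa.api as sma

# Not used in this part. Newer statsmodels releases replace ARMA with
# statsmodels.tsa.arima.model.ARIMA.
from statsmodels.tsa.arima_model import ARMA
%matplotlib inline

start = '2006-01-01'
end = '2010-12-31'
# Yahoo's ticker for the S&P 500 index is '^GSPC'.
df_data = web.DataReader('^GSPC', 'yahoo', start, end)
df_data.tail()
[/sourcecode]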

S&P 500 Daily Log Returns Analysis

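[sourcecode language="python" light="true" wraplines="false" collapse="false"]
# Daily log return; assumes the 'Adj Close' column returned by DataReader.
df_data['Log Return'] = np.log(df_data['Adj Close']).diff()

ax = df_data['Log Return'].plot()
ax.set_title('Daily Log Return')
[/sourcecode]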

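[sourcecode language="python" light="true" wraplines="false" collapse="false"]
# Log returns as a NumPy array (assumes the 'Adj Close' column); drop the leading NaN.
array_logret_adjclose = np.log(df_data['Adj Close']).diff().dropna().values

fig, ax = plt.subplots(3, 1)
# With alpha and qstat set, acf also returns confidence intervals,
# Ljung-Box Q statistics, and their p-values.
acf, confint, qstat, pvalues = sma.stattools.acf(array_logret_adjclose, alpha=0.05, qstat=True)
# Autocorrelations
ax[0].plot(acf)

# Q Stats
ax[1].plot(qstat)

# P Values
ax[2].plot(pvalues)
[/sourcecode]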

### What’s Next?

We will examine how to choose and calibrate an ARMA model.