Time Series Analysis and Its Applications: With R Examples

Second Edition

This is the site for the second edition of the text and is no longer maintained.

Follow this link if you're looking for the site of the third edition.

Some Useful Scripts

  ◊ lag.plot1 - grid of scatterplots of x(t-h) vs. x(t)
  ◊ lag.plot2 - grid of scatterplots of x1(t-h) vs. x2(t)
  ◊ acf2 - simultaneous plot of the sample ACF and PACF on the same scale
  ◊ sarima - fits (S)ARIMA models, including diagnostics, in one short command
  ◊ sarima.for - gives (S)ARIMA forecasts (values and a graphic) in one short command
  ◊ spec.arma - gives the ARMA spectrum and checks causality, invertibility, and common zeros

You can get it all in one file: itall.R. Just download the file to a convenient location and then "source" it.

SOURCING CODE: Sourcing code is easy (and don't let the word source scare you ... just substitute the word read for source and it will lose its mystery). The basic idea is that you can put any number of R commands in an external file and execute them at any time by reading the file into R. You can do this by copying-and-pasting the commands into R, or by sourcing the file, in our case itall.R, with the command source("itall.R"). On Windows, Source is also available on the File menu.

It is possible to source (read) this code without downloading the file. For example, either
   source(url("http://www.stat.pitt.edu/stoffer/tsa2/Rcode/itall.R"))   or
   source(url("http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R"))
will work. If you go this route, you should download itall.R anyway so you have a local copy should you need it (it's a small file, about 6 KB).
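If you'd like the best of both worlds, here's a little sketch (my aside, not part of the original scripts): try the web copy first, and fall back to the local copy if the connection fails.

   # try sourcing from the web; if that fails, read the local copy instead
   # (assumes itall.R has been downloaded to the working directory)
   tryCatch(source(url("http://www.stat.pitt.edu/stoffer/tsa2/Rcode/itall.R")),
            error = function(e) source("itall.R"))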

Finally, save your workspace before closing R... that way, all these goodies will be available for the next session... and all the other stuff you did will be there, too. Just say, YES.



lag.plot2

First an example using lag.plot2. The full call is lag.plot2(x1,x2,m,corr=TRUE,smooth=TRUE) and it will generate a grid of scatterplots of x1(t-h) versus x2(t) for h = 0,1,...,m, along with the cross-correlation values in blue and a lowess fit in red. Note that the first series, x1, is the one that gets lagged.

If you issue the command lag.plot2(x1,x2,m) you will see the cross-correlations but not the smoothed (red) fit; that is, corr=TRUE and smooth=FALSE are the defaults. I did it this way because the smooth can be informative but is a little annoying. If you just want the scatterplots and nothing else, then lag.plot2(x1,x2,m,corr=FALSE) is the command.

In the following example, x1 = SOI and x2 = Recruitment, and I'm turning on the smooth; the correlations are on by default.

soi = scan("/mydata/soi.dat")
rec = scan("/mydata/recruit.dat")
lag.plot2(soi,rec,8,smooth=TRUE)
which results in the graph:
[Figure: lag.plot2(soi,rec,8,smooth=TRUE) - scatterplots of soi(t-h) vs. rec(t) for h = 0,...,8, with cross-correlations (blue) and lowess fits (red)]
The red lines are very informative in this example so it's a good thing we turned on the smooth.

Before we move on, a little note ... because you are sourcing the code, you are free to change it to suit your needs and preferences. For example, the first line of lag.plot2 is
  lag.plot2 = function(data1, data2, max.lag=0, corr=TRUE, smooth=FALSE){
If, for example, you want to have the smooth on as a default, change smooth=FALSE to smooth=TRUE.
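A hedged aside (not in the original script): you can also change a default at the R prompt, without editing the file, using R's formals():

   formals(lag.plot2)$smooth = TRUE   # after this, lag.plot2(soi, rec, 8) draws the lowess fits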

lag.plot1

For a similar plot with one time series, use lag.plot1. The full call is lag.plot1(x,m,corr=TRUE,smooth=TRUE) and it will generate a grid of scatterplots of x(t-h) versus x(t) for h = 1,...,m, along with the autocorrelation values in blue and a lowess fit in red. The defaults are the same as in the previous script. If you don't want any correlations or lines, simply use R's lag.plot. Here's an example assuming soi is still available.

lag.plot1(soi,12,smooth=TRUE)
which results in the graph:
[Figure: lag.plot1(soi,12,smooth=TRUE) - scatterplots of soi(t-h) vs. soi(t) for h = 1,...,12, with autocorrelations (blue) and lowess fits (red)]
Note that lag.plot1(soi,12) will give you the graphic without the smoothed (red) lines.
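As mentioned above, if you don't want any correlations or lines, base R's lag.plot does the job. For example (a sketch, assuming soi is still loaded):

   lag.plot(soi, lags = 12, layout = c(3, 4))   # plain scatterplots of the series against its lags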

acf2

Now, here's an example using acf2 on the Recruitment series (rec):

acf2(rec)    # computes to lag sqrt(n)+10 

        ACF  PACF
 [1,]  0.92  0.92
 [2,]  0.78 -0.44
 [3,]  0.63 -0.05
  .      .     .            
  .      .     .            
[31,] -0.13  0.06
[32,] -0.11  0.05
you get the values as above ... and a picture:
[Figure: acf2(rec) - the sample ACF and PACF of the Recruitment series on the same scale]

You can specify the number of lags... 48 for example:

acf2(rec,48) 
 
        ACF  PACF
 [1,]  0.92  0.92
 [2,]  0.78 -0.44
 [3,]  0.63 -0.05
  .      .     .            
  .      .     .            
[46,]  0.12  0.05
[47,]  0.17  0.08
[48,]  0.20 -0.04
and the picture:
[Figure: acf2(rec,48) - the sample ACF and PACF of the Recruitment series to lag 48]
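For comparison (an aside, not from the original page), base R gives these as two separate graphics, on different scales:

   acf(rec, 48)    # sample ACF only
   pacf(rec, 48)   # sample PACF only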

sarima & sarima.for

The basics are as follows: If your time series is in x and you want to fit an ARIMA(p,d,q) model to your data, the call is sarima(x,p,d,q). The results are the parameter estimates, standard errors, AIC, AICc, BIC (as defined in Chapter 2) and diagnostics. To fit a seasonal ARIMA model, the call is sarima(x,p,d,q,P,D,Q,s). So, for example, sarima(x,2,1,0) will fit an ARIMA(2,1,0) model to the series in x, and sarima(x,2,1,0,0,1,1,12) will fit a seasonal ARIMA(2,1,0)×(0,1,1)₁₂ model to the series in x.
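Under the hood (as the $fit output below shows), sarima is a wrapper around R's arima. A rough base-R equivalent of sarima(x,2,1,0,0,1,1,12), minus the diagnostics and the constant handling, would be:

   # sketch only -- sarima adds a constant/mean term and diagnostic plots on top of this
   arima(x, order = c(2, 1, 0), seasonal = list(order = c(0, 1, 1), period = 12))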

Here are some examples using sarima and sarima.for. First, suppose I generate the random walk with drift, x, mentioned in R Issue 2:

set.seed(1)           # so you can reproduce the results
v = rnorm(100,1,1)    # v contains 100 iid N(1,1) variates
x = cumsum(v)         # x is a random walk with drift = 1 

and I try to fit an ARIMA(1,1,0) model:

sarima(x,1,1,0)       # <-- this is the most basic form ... easy, huh?

initial  value -0.105745   <-- sarima shows the output from the optimization
iter   2 value -0.105750   <-- this is the Conditional Sum Sq (CSS) part                            
iter   3 value -0.105750   ** you can shut this off using sarima(x,1,1,0,details=FALSE) 
iter   4 value -0.105750    ... additional notes are at the end of these examples 
iter   5 value -0.105750   
iter   6 value -0.105751   
iter   7 value -0.105751 
iter   7 value -0.105751 
iter   7 value -0.105751 
final  value -0.105751   
converged  
initial  value -0.110799   <-- this is the Max Likelihood (ML) part 
iter   1 value -0.110799           
final  value -0.110799 
converged
              
$fit    
Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, include.mean = F, optim.control = list(trace = 1, REPORT = 1, reltol=tol))
                          
Coefficients:          
          ar1  constant  <-- notice this says constant (see Issue 1)
      -0.0031    1.1169  <-- and a constant is fit even when
s.e.   0.1002    0.0897       differencing (see Issue 2) 
                       
sigma^2 estimated as 0.8012:  log likelihood = -129.51,  aic = 265.01

$AIC
[1] 0.8184023  <-- this is AIC as defined in Chapter 2
                  (type ?AIC to see how R computes aic - the two are comparable)
$AICc
[1] 0.8409023  <-- this is AICc as defined in Chapter 2

$BIC
[1] -0.1294943  <-- this is SIC or BIC as defined in Chapter 2

The fit is approximately
∇x(t) = 1.1169 - 0.0031 ∇x(t-1) + w(t), with the estimate of σ² being 0.8012,
and the AR term is not significant, as expected.

Note that the command will also generate diagnostics as a graphic, like this:

[Figure: sarima diagnostics - standardized residuals, ACF of the residuals, normal Q-Q plot, and p-values]
This graphic is an improved version of R's tsdiag() script. In addition to adding a normal Q-Q plot of the standardized residuals, it uses the correct degrees of freedom for calculating the p-values (see Issue 4 for more details).

If you want to play with the innovations (i.e., the residuals) from the fit, they're stored in innov:
innov

Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] -0.0007433653  0.0667311688 -0.9523344897  1.4754290746  0.2171587300
  [6] -0.9367240697  0.3676241150  0.6225564230  0.4607874263 -0.4208839814
 [11]  1.3935659584  0.2772364838 -0.7373100486 -2.3338899339  1.0008228665
 [16] -0.1587343246 -0.1336016133  0.8265135730  0.7068615543  0.4791632451
 [21]  0.8035377023  0.6676999363 -0.0402937232 -2.1063942292  0.4964132042
 [26] -0.1714883965 -0.2732414093 -1.5885058950 -0.5999620095  0.2991931438
 [31]  1.2426968053 -0.2158670082  0.2700817021 -0.1698812111 -1.4944982876
 [36] -0.5365173470 -0.5128434489 -0.1778029899  0.9825696404  0.6492981652
 [41] -0.2794408221 -0.3711421293  0.5789087390  0.4415416170 -0.8043102945
 [46] -0.8268935803  0.2451257019  0.6523854908 -0.2272469038  0.7634883160
 [51]  0.2835526362 -0.7280703597  0.2219580831 -1.2455829430  1.3122654009
 [56]  1.8675500712 -0.4783818983 -1.1625406761  0.4492243732 -0.2505689069
 [61]  2.2839282635 -0.1491003904  0.5723455848 -0.0871417189 -0.8604594524
 [66]  0.0692256149 -1.9216486145  1.3427114231  0.0405039141  2.0558119984
 [71]  0.3649423437 -0.8257515011  0.4912625277 -1.0494853727 -1.3737890396
 [76]  0.1703043906 -0.5596650308 -0.1175354796 -0.0429279328 -0.7065641699
 [81] -0.6877608810 -0.2542064057  1.0603971169 -1.6372034567  0.4719712284
 [86]  0.2175108404  0.9468547768 -0.4181754728  0.2518073173  0.1509681372
 [91] -0.6589683338  1.0889207200  1.0468578741  0.5865224127  1.4717219044
 [96]  0.4461113121 -1.3921411813 -0.6944781185 -1.3436545925 -0.5944529125

Another way to get the innovations is to give the analysis a name, e.g., dog=sarima(x,1,1,0), and then you can pull out the innovations as dog$fit$resid or resid(dog$fit) or even resid(sarima(x,1,1,0)$fit) if you want to leave out the dog.
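For example, a quick hands-on look at those innovations (a sketch using only what's described above):

   dog = sarima(x, 1, 1, 0, details = FALSE)  # fit quietly and keep the results
   ts.plot(resid(dog$fit))                    # time plot of the innovations
   mean(resid(dog$fit))                       # should be near zero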

Once I'm satisfied with a particular model, I can forecast. For example, with the model above and n.ahead=5 I get (this will also produce a plot of data, the forecasts and corresponding error bounds):
sarima.for(x,5,1,1,0)

$pred
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 112.0075 113.1244 114.2413 115.3582 116.4751

$se
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 0.8951188 1.2639371 1.5472077 1.7861037 1.9966174

... and the graphic:

[Figure: sarima.for(x,5,1,1,0) - the data, five forecasts, and error bounds]
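A hedged aside: because sarima.for returns $pred and $se, you can build approximate 95% prediction intervals yourself:

   fore = sarima.for(x, 5, 1, 1, 0)
   fore$pred + 1.96 * fore$se    # upper bounds
   fore$pred - 1.96 * fore$se    # lower bounds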


Next, let's fit an AR(2) to the Recruitment data (minus all of the graphics):
sarima(rec,2,0,0)

initial  value 3.332380 
iter   2 value 3.251366
iter   3 value 2.564654
iter   4 value 2.430141
iter   5 value 2.258212
iter   6 value 2.253343
iter   7 value 2.248346
iter   8 value 2.248345
iter   9 value 2.248345
iter  10 value 2.248341
iter  11 value 2.248332
iter  12 value 2.248331
iter  13 value 2.248330
iter  13 value 2.248330
iter  13 value 2.248330   CSS converged in 13 (or 14 or 15 ???) steps
final  value 2.248330 
converged
initial  value 2.248862 
iter   2 value 2.248857
iter   3 value 2.248855
iter   4 value 2.248855
iter   5 value 2.248854
iter   6 value 2.248854
iter   7 value 2.248854
iter   8 value 2.248854
iter   9 value 2.248853
iter  10 value 2.248853
iter  10 value 2.248853
iter  10 value 2.248853   and from there, ML converged in 10-12 steps
final  value 2.248853 
converged
$fit

Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = xmean, include.mean = F, optim.control = list(trace = 1, REPORT = 1, reltol=tol))

Coefficients:
         ar1      ar2    xmean
      1.3512  -0.4612  61.8585
s.e.  0.0416   0.0417   4.0039

sigma^2 estimated as 89.33:  log likelihood = -1661.51,  aic = 3331.02

$AIC
[1] 5.505631

$AICc
[1] 5.510243

$BIC
[1] 4.532889

The fitted model is
x(t) = 61.8585(1 - 1.3512 + 0.4612) + 1.3512 x(t-1) - 0.4612 x(t-2) + w(t),
so the constant is 61.8585 × 0.11 ≈ 6.80. Sorry, I didn't feel like trying to get sarima to report the constant here. At least the wording is correct (see Issue 1). And again, if you would rather not see the output from the nonlinear optimization routine, you would use sarima(rec,2,0,0,details=FALSE).

Now forecast 5 ahead:

sarima.for(rec,5,2,0,0)

$pred
Time Series:
Start = 454 
End = 458 
Frequency = 1 
[1] 20.36547 26.08036 32.65148 38.89474 44.30006

$se
Time Series:
Start = 454 
End = 458 
Frequency = 1 
[1]  9.451686 15.888378 20.464325 23.492457 25.393693

Because there are more than 100 observations, for your viewing pleasure, the graphic will show only the final 100 observations and the forecasts (and error bounds):

[Figure: sarima.for(rec,5,2,0,0) - the final 100 observations, forecasts, and error bounds]


Here's a seasonal model fit to the Recruitment data, minus the diagnostic graphics and with the details off (this is just an example; it's not a very good model).
sarima(rec,2,0,0,1,0,0,12, details=FALSE)   #<-- an ARIMA(2,0,0)×(1,0,0)₁₂

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = xmean, include.mean = F, optim.control = list(trace = trc, REPORT = 1, 
        reltol = tol))

Coefficients:
         ar1      ar2    sar1    xmean
      1.3466  -0.4514  0.1272  61.8056
s.e.  0.0419   0.0421  0.0468   4.7502

sigma^2 estimated as 87.86:  log likelihood = -1657.85,  aic = 3325.71

$AIC
[1] 5.493421

$AICc
[1] 5.498132

$BIC
[1] 4.529764

◊ In sarima, if there's no differencing (d=0 and D=0) you get the mean estimate. If there's differencing of order one (either d=1 or D=1, but not both), you'll get the constant term. In any other situation, no constant or mean term is included in the model. The idea is that if you difference more than once (d + D > 1), any drift would have been removed.

◊ The relative tolerance (reltol) used to assess convergence in sarima.R and sarima.for.R is set to sqrt(.Machine$double.eps), the R default. For details, see the help file for optim under the control arguments. You can change this without changing the code. For example, sarima(rec,2,0,0,tol=.0001) will do the second example with reltol = .0001. If there are many parameters to estimate (e.g., seasonal models), the analysis may take a long time using the default. That's not a problem if you're sitting at your desk, but if you're doing a demonstration you may need a good joke to tell while you're waiting for convergence. The one I thought of on the spot was: This is why cigarettes were invented. They last about as long as it takes to get results. The joke went over pretty well, so you can use it free of charge.
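If you want to see what a looser tolerance buys you, a quick sketch (timings will vary by machine):

   system.time(sarima(rec, 2, 0, 0, 1, 0, 0, 12, details = FALSE))              # default reltol
   system.time(sarima(rec, 2, 0, 0, 1, 0, 0, 12, details = FALSE, tol = .0001)) # looser, usually faster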

spec.arma

And finally, here are some examples using spec.arma, which will produce the spectral density of an ARMA model. It can also be used to determine if the model is causal and/or invertible (and there's a check for common zeros). The basic call is spec.arma(ar, ma) where ar and ma are vectors containing the model parameters. For example, to see the logged spectral density of the ARMA(2,1) model x(t) = 1.4 x(t-1) - .9 x(t-2) + w(t) + .8 w(t-1), you issue the command:
spec.arma(ar=c(1.4,-.9), ma=.8)
and you would see
[Figure: spec.arma(ar=c(1.4,-.9), ma=.8) - the logged spectral density of the ARMA(2,1) model]
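If you want to verify the causality check by hand (an aside, not part of spec.arma), the AR polynomial of this model is φ(z) = 1 - 1.4z + 0.9z², and causality requires its roots to lie outside the unit circle:

   Mod(polyroot(c(1, -1.4, .9)))   # both moduli are about 1.054 > 1, so the model is causal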

If you don't want the logged spectrum, use spec.arma(ar=c(1.4,-.9), ma=.8, log="no"). The script will test the roots and you'll get warnings if the roots are not outside the unit circle or if there is parameter redundancy. For example,

spec.arma(ar=1)
 WARNING: Model Not Causal
 Error in spec.arma(ar = 1) : Try Again

spec.arma(ma=1)
 WARNING: Model Not Invertible
 Error in spec.arma(ma = 1) : Try Again

spec.arma(ar=39.95, ma=c(.9,2))
 WARNING: Model Not Causal
 WARNING: Model Not Invertible
 Error in spec.arma(ar = 39.95, ma = c(0.9, 2)) : Try Again

# common zeros:
spec.arma(ar=.8, ma=-.8) # this will work but you'll see the spectral density of white noise
 WARNING: Parameter Redundancy   ... and you'll get a warning.

# approximate parameter redundancy (common zeros):
spec.arma(ar=.8, ma=-.7999) 
 WARNING: Parameter Redundancy  note that the spectral ordinates are between  
                                 0.9999 and 1.0010, so you basically have white noise 

# no arguments gives the spectral density of white noise with σ² = 1
spec.arma() 

# and if you want to change the variance, do something like this:
spec.arma(ar = c(1.5,-.75), var.noise = 39.95, log="no")  
 
Finally, the frequencies and the spectral density ordinates are returned invisibly, e.g., spec.arma(ar=.9)$freq and spec.arma(ar=.9)$spec, if you're interested in the actual values.
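For instance, to redraw the spectrum on your own terms (a sketch):

   sp = spec.arma(ar = .9)   # draws its own plot, and invisibly returns the values
   plot(sp$freq, sp$spec, type = "l", xlab = "frequency", ylab = "spectrum")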