Time Series Analysis and Its Applications: With R Examples

Second Edition

Some Examples

Here are examples of how to use lag.plot1, lag.plot2, acf2, sarima and sarima.for. Get it all in one file: itall.R. Just download the file to a convenient location and then "source" it.

SOURCING CODE: Sourcing code is easy. The basic idea is you can have any number of R commands in an external file and you can execute them at any time by reading the file into R. This can be done by copying-and-pasting the commands into R, or by sourcing the file, say ‘itall.R’, by issuing the command source("itall.R"). For Windows, Source is available on the File menu. Here's a little movie that shows how to source code: start the show

It is possible to source this code without downloading the file. For example, either
   source(url("http://www.stat.pitt.edu/stoffer/tsa2/Rcode/itall.R"))   or
   source(url("http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R"))
will work. If you go this route, you should download itall.R anyway so you have it locally should you need it (it's a small file, about 4kb).

Finally, save your workspace before closing R... that way, all these goodies will be available for the next session... and all the other stuff you did will be there, too. Just say, YES.

First an example using lag.plot2. The full call is lag.plot2(x1,x2,m,corr=T,smooth=T) and it will generate a grid of scatterplots of x1(t-h) versus x2(t) for h = 0,1,...,m, along with the cross-correlation values in blue and a lowess fit in red. Note that the first series, x1, is the one that gets lagged.

If you issue the command lag.plot2(x1,x2,m) you will see the cross-correlations but not the smoothed (red) fit. That is, corr=T and smooth=F is the default. I did it this way because the smooth is informative but is a little annoying. If you just want the scatterplots and nothing else, then lag.plot2(x1,x2,m,corr=F) is the command.

In the following example, x1=SOI and x2=Recruitment and I'm turning on the smooth, the correlations are already on.

soi=scan("/mydata/soi.dat")
rec=scan("/mydata/recruit.dat")
lag.plot2(soi,rec,8,smooth=T)
which results in the graph:
lag.plot2
The red lines are very informative in this example. Basically, you see a nonlinear relationship where Recruitment decreases linearly as SOI increases, but only when SOI is positive.

Before we move on, a little note ... because you are sourcing the code, you are free to change it to suit your needs and preferences. For example, the first line of lag.plot2 is
  lag.plot2=function(data1,data2,max.lag=0,corr=TRUE,smooth=FALSE){
If, for example, you want to have the smooth on as a default, change smooth=FALSE to smooth=TRUE.


For a similar plot with one time series, use lag.plot1. The full call is lag.plot1(x,m,corr=T,smooth=T) and it will generate a grid of scatterplots of x(t-h) versus x(t) for h = 1,...,m, along with the autocorrelation values in blue and a lowess fit in red. The defaults are the same as in the previous script. If you don't want any correlations or lines, simply use R's lag.plot. Here's an example assuming soi is still available.

lag.plot1(soi,12,smooth=T)
which results in the graph:
lag.plot2
Note that lag.plot1(soi,12) will give you the graphic without the smoothed (red) lines. The smoothes are also informative here because they suggest that the assumption of linearity is reasonable.

Now, here's an example using acf2 on the Recruitment series (rec):

acf2(rec)    # computes to lag sqrt(n)+10

        ACF  PACF
 [1,]  0.92  0.92
 [2,]  0.78 -0.44
 [3,]  0.63 -0.05
  .      .     .            
  .      .     .            
[31,] -0.13  0.06
[32,] -0.11  0.05

you get the values as above ... and a picture:

acf2(rec)

You can specify the number of lags... 48 for example:

acf2(rec,48) 
 
        ACF  PACF
 [1,]  0.92  0.92
 [2,]  0.78 -0.44
 [3,]  0.63 -0.05
  .      .     .            
  .      .     .            
[46,]  0.12  0.05
[47,]  0.17  0.08
[48,]  0.20 -0.04
and the picture:

acf2(rec,48)


Here are some examples using sarima.R and sarima.for.R. First, suppose I generate the random walk with drift, x, mentioned in R Issue 2:
set.seed(1)           # so you can reproduce the results
v = rnorm(100,1,1)    # v contains 100 iid N(1,1) variates
x = cumsum(v)         # x is a random walk with drift = 1 

and I try to fit an ARIMA(1,1,0) model:

sarima(x,1,1,0)       

initial  value -0.105745   <-- sarima shows the output from the optimization
iter   2 value -0.105750   <-- this is the Conditional Sum Sq (CSS) part                            
iter   3 value -0.105750   (*for details, see the note at the bottom of the page)
iter   4 value -0.105750   
iter   5 value -0.105750   
iter   6 value -0.105751   
iter   7 value -0.105751 
iter   7 value -0.105751 
iter   7 value -0.105751 
final  value -0.105751   
converged  
initial  value -0.110799   <-- this is the Max Likelihood (ML) part 
iter   1 value -0.110799           
final  value -0.110799 
converged
              
$fit    
Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, include.mean = F, optim.control = list(trace = 1, REPORT = 1, reltol=tol))
                          
Coefficients:          
          ar1  constant  <-- notice this says constant (see Issue 1)
      -0.0031    1.1169  <-- and a constant is fit even when
s.e.   0.1002    0.0897       differencing (see Issue 2) 
                       
sigma^2 estimated as 0.8012:  log likelihood = -129.51,  aic = 265.01

$AIC
[1] 0.8184023  <-- this is AIC as defined in Chapter 2
                  (type ?AIC to see how R computes aic - the two are comparable)
$AICc
[1] 0.8409023  <-- this is AICc as defined in Chapter 2

$BIC
[1] -0.1294943  <-- this is SIC or BIC as defined in Chapter 2

Note that the command will also generate diagnostics as a graphic, like this:

diagnostic plot

The fit is like
∇x(t) = 1.1169 -.0031*∇x(t-1) + w(t) with the estimate of σ2 being .8012
and the AR term is not significant, as expected.

If you want to play with the innovations (i.e., the residuals) from the fit, they're stored in innov:

innov

Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] -0.0007433653  0.0667311688 -0.9523344897  1.4754290746  0.2171587300
  [6] -0.9367240697  0.3676241150  0.6225564230  0.4607874263 -0.4208839814
 [11]  1.3935659584  0.2772364838 -0.7373100486 -2.3338899339  1.0008228665
 [16] -0.1587343246 -0.1336016133  0.8265135730  0.7068615543  0.4791632451
 [21]  0.8035377023  0.6676999363 -0.0402937232 -2.1063942292  0.4964132042
 [26] -0.1714883965 -0.2732414093 -1.5885058950 -0.5999620095  0.2991931438
 [31]  1.2426968053 -0.2158670082  0.2700817021 -0.1698812111 -1.4944982876
 [36] -0.5365173470 -0.5128434489 -0.1778029899  0.9825696404  0.6492981652
 [41] -0.2794408221 -0.3711421293  0.5789087390  0.4415416170 -0.8043102945
 [46] -0.8268935803  0.2451257019  0.6523854908 -0.2272469038  0.7634883160
 [51]  0.2835526362 -0.7280703597  0.2219580831 -1.2455829430  1.3122654009
 [56]  1.8675500712 -0.4783818983 -1.1625406761  0.4492243732 -0.2505689069
 [61]  2.2839282635 -0.1491003904  0.5723455848 -0.0871417189 -0.8604594524
 [66]  0.0692256149 -1.9216486145  1.3427114231  0.0405039141  2.0558119984
 [71]  0.3649423437 -0.8257515011  0.4912625277 -1.0494853727 -1.3737890396
 [76]  0.1703043906 -0.5596650308 -0.1175354796 -0.0429279328 -0.7065641699
 [81] -0.6877608810 -0.2542064057  1.0603971169 -1.6372034567  0.4719712284
 [86]  0.2175108404  0.9468547768 -0.4181754728  0.2518073173  0.1509681372
 [91] -0.6589683338  1.0889207200  1.0468578741  0.5865224127  1.4717219044
 [96]  0.4461113121 -1.3921411813 -0.6944781185 -1.3436545925 -0.5944529125

Another way to get the innovations is to give the analysis a name, e.g., dog=sarima(x,1,1,0), and then you can pull out the innovations as dog$fit$resid or resid(dog$fit) or even resid(sarima(x,1,1,0)$fit) if you want to leave out the dog.

Once I'm satisfied with a particular model, I can forecast. For example, with the model above and n.ahead=5 I get (this will also produce a plot of data, the forecasts and corresponding error bounds):
sarima.for(x,5,1,1,0)

$pred
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 112.0075 113.1244 114.2413 115.3582 116.4751

$se
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 0.8951188 1.2639371 1.5472077 1.7861037 1.9966174

... and the graphic:

forecast graphic


Next, let's fit an AR(2) to the Recruitment data (minus all of the graphics):
sarima(rec,2,0,0)

initial  value 3.332380 
iter   2 value 3.251366
iter   3 value 2.564654
iter   4 value 2.430141
iter   5 value 2.258212
iter   6 value 2.253343
iter   7 value 2.248346
iter   8 value 2.248345
iter   9 value 2.248345
iter  10 value 2.248341
iter  11 value 2.248332
iter  12 value 2.248331
iter  13 value 2.248330
iter  13 value 2.248330
iter  13 value 2.248330   CSS converged in 13 (or 14 or 15 ???) steps
final  value 2.248330 
converged
initial  value 2.248862 
iter   2 value 2.248857
iter   3 value 2.248855
iter   4 value 2.248855
iter   5 value 2.248854
iter   6 value 2.248854
iter   7 value 2.248854
iter   8 value 2.248854
iter   9 value 2.248853
iter  10 value 2.248853
iter  10 value 2.248853
iter  10 value 2.248853   and from there, ML converged in 10-12 steps
final  value 2.248853 
converged
$fit

Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = xmean, include.mean = F, optim.control = list(trace = 1, REPORT = 1, reltol=tol))

Coefficients:
         ar1      ar2    xmean
      1.3512  -0.4612  61.8585
s.e.  0.0416   0.0417   4.0039

sigma^2 estimated as 89.33:  log likelihood = -1661.51,  aic = 3331.02

$AIC
[1] 5.505631

$AICc
[1] 5.510243

$BIC
[1] 4.532889

The fitted model is like
x(t) = 61.8585*(1-1.3512-(-.4612)) + 1.3512*x(t-1) - .4612*x(t-2) + w(t)
Sorry, I didn't feel like trying to get sarima to report the constant here. At least the wording is correct (see Issue 1).

Now forecast 5 ahead:

sarima.for(rec,5,2,0,0)

$pred
Time Series:
Start = 454 
End = 458 
Frequency = 1 
[1] 20.36547 26.08036 32.65148 38.89474 44.30006

$se
Time Series:
Start = 454 
End = 458 
Frequency = 1 
[1]  9.451686 15.888378 20.464325 23.492457 25.393693
Because there are more than 100 observations, for your viewing pleasure, the graphic will show only the final 100 observations and the forecasts (and error bounds):

rec.fore


Here's a seasonal model fit on the recruitment data (this is just an example, it's not a very good model).
sarima(rec,2,0,0,1,0,0,12)     <-- it's an ARIMA(2,0,0)×(1,0,0)12

initial  value 3.342720 
iter   2 value 3.261763
iter   3 value 2.566046
iter   4 value 2.423603
iter   5 value 2.277371
iter   6 value 2.258071
iter   7 value 2.251079
iter   8 value 2.250614
iter   9 value 2.250393
iter  10 value 2.250393
iter  10 value 2.250393
final  value 2.250393 
converged
initial  value 2.240806 
iter   2 value 2.240796
iter   3 value 2.240793
iter   4 value 2.240792
iter   5 value 2.240791
iter   6 value 2.240789
iter   7 value 2.240787
iter   8 value 2.240783
iter   9 value 2.240782
iter  10 value 2.240782
iter  10 value 2.240782
final  value 2.240782 
converged
$fit

Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = xmean, include.mean = F, optim.control = list(trace = 1, REPORT = 1, reltol=tol))

Coefficients:
         ar1      ar2    sar1    xmean
      1.3466  -0.4514  0.1272  61.8056
s.e.  0.0419   0.0421  0.0468   4.7502

sigma^2 estimated as 87.86:  log likelihood = -1657.85,  aic = 3325.71

$AIC
[1] 5.493421

$AICc
[1] 5.498132

$BIC
[1] 4.529764

In sarima.R, if there's no differencing (d=0 and D=0) you get the mean estimate. If there's differencing of order one (either d=1 or D=1, but not both), you'll get the constant term. In any other situation, no constant or mean term is included in the model. The idea is that if you difference more than once (d + D > 1), any drift would have been removed.

◊ Note: The relative tolerance (reltol) used to assess convergence in sarima.R and sarima.for.R is set to sqrt(.Machine$double.eps), the R default. For details, see the help file for optim under the control arguments. You can change this without changing the code. For example, sarima(rec,2,0,0,tol=.0001) will do the second example with reltol = .0001. If there are many parameters to estimate (e.g., seasonal models), the analysis may take a long time using the default. That's not a problem if you're sitting at your desk, but if you're doing a demonstration you may need a good joke to tell while you're waiting for convergence. The one I thought of on the spot was: This is why cigarettes were invented. They last about as long as it takes to get results. The joke went over pretty well- you can use it free of charge.

If you want to get rid of this part, go into the code and simply change the three occurences of trace = 1 that appear in the optim.control=list() statements to trace = 0. I included it because I wanted students to realize they're doing nonlinear optimization; for many, it's the first time they see a Newton method (the wikipedia description is pretty good).