UWA logoMaterial to support teaching in Environmental Science at The University of Western Australia

Units ENVT3361, ENVT4461, and ENVT5503

This guide is organised into 3 main sections:

  1. Setting up R to analyse time series data, and the associated data formats;
  2. Understanding the concepts, components and features of time series data;
  3. Forecasting time series using basic trend analysis and more complex ARIMA (and other) models – the type of content we would include in reports.

 

Set up the R environment for time series analysis

We will need the additional functions in several R packages for specialized time series and other functions in R...

# Load the packages we need 
library(zoo)      # for basic irregular time series functions
library(xts)      # we need the xts "eXtended Time Series" format for some functions
library(Kendall)  # for trend analysis with Mann-Kendall test
library(trend)    # for trend analysis using the Sen slope
library(forecast) # for time series forecasting with ARIMA and exponential smoothing 
library(tseries)  # for assessing stationarity using Augmented Dickey-Fuller test
library(lmtest)   # for Breusch-Pagan heteroscedasticity test etc.
library(car)      # for various commonly-used functions
library(ggplot2)  # alternative to base R plots

# (optional) make a better colour palette than the R default!
# still not ideal for colourblindness (consider using viridis package)
palette(c("black","purple4","#0041B2","darkslategray4",
          "gold3","orangered","darkred","dimgray"))
par(mfrow=c(3,1), mar=c(4,4,1,1), mgp=c(1.7,0.3,0), tcl=0.25, font.lab=2)

 

1. How R reads and stores time series data

We first read the data into a data frame – this is how R often stores data – it's not the format we need, but we'll use it for comparison.

Non-time series object for comparison

git <- "https://raw.githubusercontent.com/Ratey-AtUWA/Learn-R-web/main/"
soiltemp <- read.csv(paste0(git,"soiltemp2.csv"))
colnames(soiltemp) <- c("Date","temp")

 

Do some checks of the (non-time series) data

summary(soiltemp) # simple summary of each column
str(soiltemp) # more detailed information about the R object 
# ('str'=structure, i.e. how the data are organised in the R object)
##      Date                temp      
##  Length:1464        Min.   :10.40  
##  Class :character   1st Qu.:12.45  
##  Mode  :character   Median :14.30  
##                     Mean   :14.86  
##                     3rd Qu.:17.41  
##                     Max.   :21.70  
## 'data.frame':    1464 obs. of  2 variables:
##  $ Date: chr  "2014-04-01 00:00:00" "2014-04-01 01:00:00" "2014-04-01 02:00:00" "2014-04-01 03:00:00" ...
##  $ temp: num  20.1 19.8 19.6 19.2 19 ...

 

We can also check the general form of the data with a plot (Figure 1). From this plot and from the text summary output above, we can see that the Date column is not a date or time, or even numeric.

par(mfrow = c(1,1), mar = c(3,3,1,1), mgp=c(1.6,0.3,0), tcl=0.25, oma = c(0,0,0,0), font.lab=2)
with(soiltemp, plot(temp, type = "l", col = 7))
Figure 1: Plot of soil temperature time series data which is not (yet) formatted as a time series.

Figure 1: Plot of soil temperature time series data which is not (yet) formatted as a time series.

 

Reading and storing data as time series

We really want the data in a different type of R object! We use the read.csv.zoo() function from the zoo package to read the data in as a time series. This gives a data object of class zoo which is much more flexible than the default time series object in R.

We need to specify a format that our date/time information is stored as. The character string "%Y-%m-%d %H:%M:%S" looks unusual but it should make sense (%Y is 4-digit year, and so on). If we have opened and saved our .csv file in Excel, the format will probably change, so we should try format = "%d/%m/%Y %H:%M:%S". If in doubt, try to match the format win the actual .csv file.

soiltemp_T15_zoo <- read.csv.zoo(paste0(git,"soiltemp2.csv"),
                               format = "%Y-%m-%d %H:%M:%S", 
                               tz = "Australia/Perth", 
                               index.column=1,
                               header = TRUE)
# do some quick checks of the new R object:
summary(soiltemp_T15_zoo) ; cat("\n")
str(soiltemp_T15_zoo) # POSIXct in the output is a date-time format
##      Index                     soiltemp_T15_zoo
##  Min.   :2014-04-01 00:00:00   Min.   :10.40   
##  1st Qu.:2014-04-16 05:45:00   1st Qu.:12.45   
##  Median :2014-05-01 11:30:00   Median :14.30   
##  Mean   :2014-05-01 11:30:00   Mean   :14.86   
##  3rd Qu.:2014-05-16 17:15:00   3rd Qu.:17.41   
##  Max.   :2014-05-31 23:00:00   Max.   :21.70   
## 
## 'zoo' series from 2014-04-01 to 2014-05-31 23:00:00
##   Data: num [1:1464] 20.1 19.8 19.6 19.2 19 ...
##   Index:  POSIXct[1:1464], format: "2014-04-01 00:00:00" "2014-04-01 01:00:00" "2014-04-01 02:00:00" "2014-04-01 03:00:00" ...

We can see that the column Date has now become Index, and it's correctly stored as date/time information, in a format called POSIXct (for more information, run ?DateTimeClasses in the Console. Again, it's usually useful to check our data with a plot (Figure 2).

par(mfrow=c(1,1), mar=c(3,3,1,1), mgp=c(1.6,0.3,0), tcl=0.25, font.lab=2)
plot(soiltemp_T15_zoo, col = 3) 
Figure 2: Plot of soil temperature time series data which is formatted as a zoo time series object.

Figure 2: Plot of soil temperature time series data which is formatted as a zoo time series object.

 

Sometimes we need to use another time series data format, xts (eXtended Time Series), which allows us to use more functions...

 

Make an xts object from our zoo object

soiltemp_T15_xts <- as.xts(soiltemp_T15_zoo)
str(soiltemp_T15_xts) # just to check
## An xts object on 2014-04-01 / 2014-05-31 23:00:00 containing: 
##   Data:    double [1464, 1]
##   Index:   POSIXct,POSIXt [1464] (TZ: "Australia/Perth")

 

2. Exploratory data analysis and decomposition of time series

This section is designed to help you understand the features and components of time series, and some of the statistical methods that we use to analyse them. We also look at time series decomposition. The important issues are:

  • Do we need to transform our time series variable?
  • Is our time series stationary?
  • Can we separate out a trend in out variable vs. time?
  • Is our time series seasonal (periodic) and, if so, what is the periodic frequency?
  • Is there significant autocorrelation in our data?
  • What is left unexplained after we isolate the trend and periodicity?

Not all of these would need to be presented in a Report, but it's useful for us to understand time series concepts. This is especially true for autocorrelation and moving averages, which both rely [at least partially] on past observations to estimate current observations. From there it follows that we may be able to use past observations to predict future observations – i.e., forecasting.

We first examine a plot of the data in our time series object (the plot for the xts time series object is in Figure 3).

par(mfrow=c(1,1), mar=c(3,3,1,1), mgp=c(1.6,0.3,0), tcl=0.25, font.lab=2)
plot(soiltemp_T15_xts, col = 3, ylab = "Soil temperature (\u00B0C)") # just to check
Figure 3: Plot of soil temperature time series data which is formatted as an `xts` time series object.

Figure 3: Plot of soil temperature time series data which is formatted as an xts time series object.

 

Plotting the time series variable (with time as the independent variable!) can give us some initial clues about the overall trend in the data (in Figure 3, a negative overall slope) and if there is any periodicity (there do seem to be regular increases and decreases in soil temperature). The plot in Figure 3 also shows that plotting an xts time series object gives a somewhat more detailed plot than for a zoo formatted time series object (Figure 2).

We often want to inspect plots of both the raw data and using common transformations (Figure 4). We compare with transformed time series data – we may need to do this to meet later modelling assumptions.

First we change the default plotting parameters using par(...), then plot the object and some transformed versions (with custom axis labels).

plot(soiltemp_T15_zoo, ylab = "Temperature (\u00B0C)",
     xlab = "Date", col = 2, lwd = 2, cex.lab = 1.4)
mtext("Untransformed", side=3, line=-1.5, col=2, adj=0.98, font=2)
plot(log10(soiltemp_T15_zoo), 
     ylab = expression(bold(paste(log[10],"(Temperature, \u00B0C)"))),
     xlab = "Date", col = 3, lwd = 2, cex.lab = 1.4)
mtext("log-transformed", side=3, line=-1.5, col=3, adj=0.98, font=2)
pt0 <- powerTransform(coredata(soiltemp_T15_zoo))
if(pt0$lambda<0) {
  plot(
    -1 * (soiltemp_T15_zoo ^ pt0$roundlam), # use rounded power term
    ylab = "power-transf. (Temp., \u00B0C)",
    xlab = "Date",
    col = 7, lwd = 2, cex.lab = 1.4
  )
} else {
  plot((soiltemp_T15_zoo ^ pt0$roundlam), # use rounded power term
       ylab = "power-transf. (Temp., \u00B0C)",
       xlab = "Date",
       col = 7, lwd = 2, cex.lab = 1.4
  )
}
mtext("power-transformed", side=3, line=-1.5, col=7, adj=0.98, font=2)
Figure 4: Plots of soil temperature time series data as untransformed, log₁₀-transformed, and power transformed values.

Figure 4: Plots of soil temperature time series data as untransformed, log₁₀-transformed, and power transformed values.

 

Which of these plots looks like it might be homoscedastic, i.e., have constant variance regardless of time?

The R Cookbook (Long and Teetor, 2019) suggests Box-Cox (power) transforming the variable to "stabilize the variance" (i.e. reduce heteroscedasticity). This does work, but can be better to use log10[variable], as the values are easier to interpret.

There are tests for homoscedasticity, usually based on the residuals from fitting a model (e.g. the Breusch-Pagan test). For our purposes visual assessment is enough.

If we think a transformation is needed, then run something like the code below (which does a square root transformation, i.e. variable0.5). This is just an example – we may, for example, decide that the log10-transformation is more suitable. If we save the results of powerTransform() to an object, it's most convenient to use the rounded power term (e.g. pt0$roundlam).

soiltemp_T15_zoo <- soiltemp_T15_zoo^0.5 #### don't run this chunk of code...
# don't forget the xts version either!
soiltemp_T15_xts <- as.xts(soiltemp_T15_zoo) # ...it's just an example 😊

 

Assessing if a time series variable is stationary

A stationary variable's mean and variance are not dependent on time. In other words, for a stationary series, the mean and variance at any time are representative of the whole series.

If there is a trend for the value of the variable to increase or decrease, or if there are periodic fluctuations, we don't have a stationary time series.

Many useful statistical analyses and models for time series models need a stationary time series as input, or a time series that can be made stationary with transformations or differencing.

 

Testing for stationarity

# we need the package 'tseries' for the Augmented Dickey–Fuller (adf) Test
d0 <- adf.test(soiltemp_T15_zoo); print(d0)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  soiltemp_T15_zoo
## Dickey-Fuller = -3.9515, Lag order = 11, p-value = 0.01141
## alternative hypothesis: stationary
d1 <- adf.test(diff(soiltemp_T15_zoo,1)); print(d1)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(soiltemp_T15_zoo, 1)
## Dickey-Fuller = -16.143, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
d2 <- adf.test(diff(diff(soiltemp_T15_zoo,1),1)); print(d2)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(diff(soiltemp_T15_zoo, 1), 1)
## Dickey-Fuller = -17.204, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary

 

Plot differencing for stationarity

The plots in Figure 5 below show that constant mean and variance (i.e. stationarity) is achieved after a single differencing step. The adf.test p-value suggests that the non-differenced series is already stationary, but the first sub-plot in Figure 5 does not seem to support this.

par(mfrow = c(3,1), mar = c(0,4,0,1), oma = c(4,0,1,0), cex.lab = 1.4,
    mgp = c(2.5,0.7,0), font.lab = 2)
plot(soiltemp_T15_zoo, ylab = "Raw data, no differencing",
     xlab="", xaxt="n", col = 8)
lines(loess.smooth(index(soiltemp_T15_zoo),coredata(soiltemp_T15_zoo)), 
      col = 5, lwd = 3)
abline(lm(coredata(soiltemp_T15_zoo) ~ index(soiltemp_T15_zoo)), col = 3)
legend("topright", legend = c("soiltemp_T15_Data", "Loess smoothing","Linear model"),
       cex = 1.8, col = c(8,5,3), lwd = c(1,3,1), bty = "n")
mtext(paste("adf.test p value =",signif(d0$p.value,3)), 
      side = 1, line = -1.2, adj = 0.05)
plot(diff(soiltemp_T15_zoo,1),
     ylab = "First differencing",
     xlab="", xaxt="n", col = 8)
abline(h = 0, col = "grey", lty = 2)
lines(loess.smooth(index(diff(soiltemp_T15_zoo,1)),coredata(diff(soiltemp_T15_zoo,1))), 
      col = 5, lwd = 3)
abline(lm(coredata(diff(soiltemp_T15_zoo,1)) ~
            index(diff(soiltemp_T15_zoo,1))), col = 3)
mtext(paste("adf.test p value =",signif(d1$p.value,3)), 
      side = 1, line = -1.2, adj = 0.05)
plot(diff(diff(soiltemp_T15_zoo,1),1),
     ylab = "Second differencing",
     xlab="Date", col = 8)
mtext("Date",side = 1, line = 2.2, font = 2)
abline(h = 0, col = "grey", lty = 2)
lines(loess.smooth(index(diff(diff(soiltemp_T15_zoo,1),1)),
                   coredata(diff(diff(soiltemp_T15_zoo,1),1))), 
      col = 5, lwd = 3)
abline(lm(coredata(diff(diff(soiltemp_T15_zoo,1))) ~ 
            index(diff(diff(soiltemp_T15_zoo,1)))), col = 3)
mtext(paste("adf.test p value =",signif(d2$p.value,3)), 
      side = 1, line = -1.2, adj = 0.05)
Figure 5: Plots of soil temperature time series data and its first and second differences, with each sub-plot showing the smoothed (Loess) and linear trends.

Figure 5: Plots of soil temperature time series data and its first and second differences, with each sub-plot showing the smoothed (Loess) and linear trends.

 

Finding the trend

A. Visualising a trend using a moving average

We create a new time series from a moving average for each 24h to remove daily periodicity using the rollmean() function from the zoo package. We know these are hourly data with diurnal fluctuation, so a rolling mean for chunks of length 24 should be OK, and this seems to be true based on Figure 6. In some cases the findfrequency() function from the xts package can detect the periodic frequency for us. A moving average will always smooth our data, so a smooth curve doesn't necessarily mean that we have periodicity (seasonality).

Moving averages are important parts of the ARIMA family of predictive models for time series, and that's why we show them here first.

There are possibly better ways of drawing smooth curves to represent our time series data... see below.

 

ff <- findfrequency(soiltemp_T15_xts) # often an approximation!! check the data!
cat("Estimated frequency is",ff,"\n")
## Estimated frequency is 24
soiltemp_T15_movAv <- rollmean(soiltemp_T15_zoo, 24)
plot(soiltemp_T15_zoo, col=8, type="l") # original data
# add the moving average
lines(soiltemp_T15_movAv, lwd = 2, col=2)
legend("topright", bty="n", legend = c("Data","Moving average, \u0192 = 24"), col = c(8,2), lwd = c(1,2))
Figure 6: Plots of soil temperature time series data and the 24-hour moving average (ƒ = frequency).

Figure 6: Plots of soil temperature time series data and the 24-hour moving average (ƒ = frequency).

 

B. Showing a trend using a linear (regression) model

This is just to visualize the trend. To find if a significant trend exists, we use the Mann-Kendall test or Sen's Slope test as described later.

First we create a linear model of the time series...

lm0 <- lm(coredata(soiltemp_T15_zoo) ~ index(soiltemp_T15_zoo))
summary(lm0)
## 
## Call:
## lm(formula = coredata(soiltemp_T15_zoo) ~ index(soiltemp_T15_zoo))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9217 -1.0968  0.0743  1.0938  3.5318 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              2.213e+03  3.370e+01   65.66   <2e-16 ***
## index(soiltemp_T15_zoo) -1.571e-06  2.409e-08  -65.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.402 on 1462 degrees of freedom
## Multiple R-squared:  0.7442, Adjusted R-squared:  0.744 
## F-statistic:  4253 on 1 and 1462 DF,  p-value: < 2.2e-16

...then use a plot (Figure 7) to look at the linear relationship.

soiltemp_T15_lmfit <- zoo(lm0$fitted.values, index(soiltemp_T15_zoo))
plot(soiltemp_T15_zoo, col = 8, type = "l")
lines(soiltemp_T15_lmfit, col = 2, lty = 2, lwd=2)
legend("topright", bty="n", legend = c("Data","Linear trend"), col = c(8,2), lty = c(1,2), lwd=c(1,2), seg.len=3)
Figure 7: Plots of soil temperature time series data and the overall trend shown by a linear model.

Figure 7: Plots of soil temperature time series data and the overall trend shown by a linear model.

 

Figure 7 shows that a linear model does not really capture the trend (even without considering the periodicity) very convincingly. The next option, 'loess' smoothing, shown below tries to represent the trend more accurately.

 

C. Using Locally Estimated Scatterplot Smoothing (loess)

loess, sometimes called 'lowess' is a form of locally weighted non-parametric regression to fit smooth curves to data. The amount of smoothing is controlled by the span = parameter in the loess.smooth() function (from base R) – see the example using the soil temperature data in Figure 8.

NOTE: loess smoothing is not a standard technique for time series smoothing, but we use it here for convenience. Loess smoothing does not have the relationship to ARIMA that moving averaging does, but does allow us to separate periodicity from random error in time series decomposition.
y_trend <- loess.smooth(index(soiltemp_T15_zoo), 
                           coredata(soiltemp_T15_zoo), 
                           span = 0.15, evaluation = length(soiltemp_T15_zoo))
plot(soiltemp_T15_zoo, col = 8, type = "l")
soiltemp_T15_trend <- zoo(y_trend$y, index(soiltemp_T15_zoo))
lines(soiltemp_T15_lmfit, col = 2, lty = 2)
lines(soiltemp_T15_trend, col = 3, lwd = 3)
legend("topright", bty = "n", inset = 0.02, cex = 1.25, 
       legend = c("actual data","linear model","loess smoothed"), 
       col = c(8,2,3), lty=c(1,2,1), lwd = c(1,1,3))
Figure 8: Plots of soil temperature time series data and the overall trend shown by a loess smoothing model.

Figure 8: Plots of soil temperature time series data and the overall trend shown by a loess smoothing model.

 

Isolating the Time Series Periodicity

NOTE THAT TIME SERIES DON'T ALWAYS HAVE PERIODICITY !

 
To model the periodicity it is useful to understand the autocorrelation structure of the time series. We can do this graphically, first by plotting the autocorrelation function acf() which outputs a plot by default:

acf(soiltemp_T15_xts, main = "")
Figure 9: The autocorrelation function plot for soil temperature time series data (the horizontal axis is in units of seconds).

Figure 9: The autocorrelation function plot for soil temperature time series data (the horizontal axis is in units of seconds).

 

What does Figure 9 tell you about autocorrelation in this time series?

Then plot the partial autocorrelation function (pacf())

pacf(soiltemp_T15_xts, main = "")
Figure 10: The partial autocorrelation function plot for soil temperature time series data (horizontal axis units are seconds).

Figure 10: The partial autocorrelation function plot for soil temperature time series data (horizontal axis units are seconds).

 

Interpreting partial autocorrelations (Figure 10) is more complicated – refer to Long & Teetor (2019, Section 14.15). Simplistically, partial autocorrelation allows us to identify which and how many autocorrelations will be needed to model the time series data.

 

Box-Pierce test for autocorrelation

The null hypothesis H0 for the Box-Pierce test (Box.test())is that no autocorrelation exists at any lag distance (so p ≤ 0.05 'rejects' H0):

Box.test(soiltemp_T15_xts)
## 
##  Box-Pierce test
## 
## data:  soiltemp_T15_xts
## X-squared = 1436.3, df = 1, p-value < 2.2e-16
Box.test(diff(soiltemp_T15_xts,1)) # 1 difference
## 
##  Box-Pierce test
## 
## data:  diff(soiltemp_T15_xts, 1)
## X-squared = 15.658, df = 1, p-value = 7.589e-05
Box.test(diff(diff(soiltemp_T15_xts,1),1)) # 2 differences
## 
##  Box-Pierce test
## 
## data:  diff(diff(soiltemp_T15_xts, 1), 1)
## X-squared = 357.45, df = 1, p-value < 2.2e-16

 

Make a time series of the loess residuals

Remember that we made a loess model of the time series ... the residuals (Figure 11) can give us the combination of periodicity component plus any random variation.

soiltemp_T15_periodic <- soiltemp_T15_zoo - soiltemp_T15_trend
plot(soiltemp_T15_trend, ylim = c(-2,20), lty = 3)
lines(soiltemp_T15_periodic, col = 6, lwd = 2) # just to check
legend("topright", legend = c("LOESS trend", "Periodicity plus noise"), 
       col = c(1,6), lwd = c(1,2), lty = c(3,1))
Figure 11: An incomplete decomposition of the soil temperature time series into smoothed trend and periodicity plus error (noise) components.

Figure 11: An incomplete decomposition of the soil temperature time series into smoothed trend and periodicity plus error (noise) components.

 

We can also use less smoothing in the loess function to retain periodicity; we adjust the span = option (lower values of span give less smoothing; we just need to experiment with different values). The difference between the data and the less-smoothed loess should be just 'noise' or 'error'.

We first generate a loess model which is stored in the object temp_LOESS2:

temp_LOESS2 <- loess.smooth(index(soiltemp_T15_zoo), 
                           coredata(soiltemp_T15_zoo), 
                           span = 0.012, evaluation = length(soiltemp_T15_zoo))

We then use the new loess model to make a time series which contains both periodic and trend information:

soiltemp_T15_LOESS2 <- zoo(temp_LOESS2$y, index(soiltemp_T15_zoo))

The difference between the data and the less-smoothed loess should be just 'noise' or 'error', so we make a new time series based on this difference:

soiltemp_T15_err <- soiltemp_T15_zoo - soiltemp_T15_LOESS2

 

Then plot, setting y axis limits to similar scale to original data:

plot(soiltemp_T15_trend, ylim = c(-2,20), lty = 2, lwd = 2, # from above
     xlab = "2014 Date", ylab = "Soil temperature (\u00B0C)")
lines(soiltemp_T15_LOESS2, col = 3, lwd = 2)
lines(soiltemp_T15_err, col = 5) # from a couple of lines above
legend("left", 
       legend = c("Trend (coarse LOESS)", 
                  "Periodicity + trend (fine LOESS)",
                  "Unexplained variation"), 
       col = c(1,3,5), lwd = c(2,2,1), lty = c(2,1,1), bty="n")
Figure 12: Incomplete decomposition of soil temperature time series data showing the smoothed *trend*, combined *trend + periodicity*, and *error* components.

Figure 12: Incomplete decomposition of soil temperature time series data showing the smoothed trend, combined trend + periodicity, and error components.

 

We're now in a position to estimate homoscedasticity of our time series. It's a bit ‘hacky’, but we can predict the raw time series from the smoothed trend+periodicity, then apply the Breusch-Pagan test to our model (we need the lmtest package loaded earlier for the bptest() function):

lm0 <- lm(coredata(soiltemp_T15_zoo) ~ coredata(soiltemp_T15_LOESS2))
bptest(lm0) ; rm(lm0)
## 
##  studentized Breusch-Pagan test
## 
## data:  lm0
## BP = 0.087412, df = 1, p-value = 0.7675

The bptest() p-value is > 0.05 so, by this estimate, our time series is homoscedastic.

The periodicity by itself should be represented by the difference between the very smoothed (trend) and less smoothed (trend + periodicity) loess, as suggested by the two loess curves in Figure 12. We now make yet another time series object to hold this isolated periodicity:

soiltemp_T15_periodic2 <- soiltemp_T15_LOESS2 - soiltemp_T15_trend

 

Show the components of time series decomposition

plot(cbind(soiltemp_T15_zoo,soiltemp_T15_periodic2,
     soiltemp_T15_trend, soiltemp_T15_err), main = "", 
     xlab = "Date in 2014", 
     cex.main = 1.5, yax.flip = TRUE, col = c(8,3,4,2), 
     ylim = c(floor(min(soiltemp_T15_periodic2)),
              ceiling(max(soiltemp_T15_zoo))))
x1 <- (0.5*(par("usr")[2]-par("usr")[1]))+par("usr")[1]
y <- (c(0.18, 0.3, 0.6, 0.82)*(par("usr")[4]-par("usr")[3]))+par("usr")[3]
text(rep(x1,4), y, 
     labels = c("Unaccounted variation","Trend","Apparent periodicity","Data"), 
     col = c(2,4,3,8))
Figure 13: Time series decomposition into the three main components (periodicity, trend, and error; the raw data are at the top) for soil temperature (°C) at 15 cm depth.

Figure 13: Time series decomposition into the three main components (periodicity, trend, and error; the raw data are at the top) for soil temperature (°C) at 15 cm depth.

 

Figure 13 shows the original time series data, and the three important components of the decomposed time series: the periodicity, the trend, and the error or unexplained variation. We should remember that:

  • this time series was relatively easy to decompose into its components, but not all time series will be so 'well-behaved';
  • there are different ways in which we can describe the trend and periodicity (we have just used the loess smoothing functions for clear visualization).

This ends our exploratory data analysis of time series, which leads us into time series models,
including trend and ARIMA forecast modelling.

 


 

References

Australian Government (2023). Assessment of change through monitoring data analysis, In Australian & New Zealand Guidelines for Fresh and Marine Water Quality, www.waterquality.gov.au/anz-guidelines/monitoring/data-analysis/data-assess-change.

Department of Water (2015). Calculating trends in nutrient data, Government of Western Australia, Perth. www.wa.gov.au/.../calculating-trends-nutrient-data.

Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O'Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2022). forecastForecasting functions for time series and linear models. R package version 8.17.0, http://pkg.robjhyndman.com/forecast/.

Long, J.D., Teetor, P., 2019. Time series analysis. Chapter 14, The R Cookbook, Second Edition https://rc2e.com/timeseriesanalysis.

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Ryan, J.A. and Ulrich, J.M. (2020). xts: eXtensible Time Series. R package version 0.12.1. https://CRAN.R-project.org/package=xts

Zeileis, A. and Grothendieck, G. (2005). zoo: S3 Infrastructure for Regular and Irregular Time Series. Journal of Statistical Software, 14(6), 1-27. doi:10.18637/jss.v014.i06


CC-BY-SA • All content by Ratey-AtUWA. My employer does not necessarily know about or endorse the content of this website.
Created with rmarkdown in RStudio. Currently using the free yeti theme from Bootswatch.