Singular Spectrum Analysis from Scratch in R
Singular spectrum analysis (SSA) is a powerful tool for analyzing time-series data. It leverages the singular value decomposition (SVD) to “separate” individual components of the time series that can be difficult to discern by eye. In this post I will illustrate how to apply basic SSA to a one-dimensional time series of average monthly minimum temperatures between January 1961 and December 2015 for the city of Sokodé, Togo. We plot the time series below:

Figure 1: Sokode Minimum Temperatures
There is a clear increasing trend in the average monthly minimum temperatures for Sokodé. However, there is a great deal of fluctuation within each year. These fluctuations are referred to as seasonality, or sometimes as cycles when the period is longer than a year. We will use SSA to help separate the trend from the seasonality and to uncover the different types of seasonality that are present.
There are 4 main steps when performing SSA:
- Embedding
- Decomposition (using SVD)
- Eigentriple grouping
- Reconstruction
Embedding
Embedding describes the process of setting up the trajectory matrix. We do this by selecting a window size $L$, with $1 < L < N$, where $N$ is the length of the series. Here we use $L = 240$, i.e. 20 years of monthly observations, so that the window is a multiple of the 12-month period we expect to find.
Once we have selected $L$, we slide a window of that length across the series to obtain the $K = N - L + 1$ lagged vectors and stack them as the columns of the $L \times K$ trajectory matrix $\mathbf{A}$. Because each column is simply the previous one shifted forward by a single observation, $\mathbf{A}$ has constant anti-diagonals; it is a Hankel matrix.
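Written out, the trajectory matrix is the Hankel matrix of lagged windows (standard SSA notation, with $y_1, \dots, y_N$ denoting the observed series):
$$
\mathbf{A} =
\begin{pmatrix}
y_1 & y_2 & \cdots & y_K \\
y_2 & y_3 & \cdots & y_{K+1} \\
\vdots & \vdots & \ddots & \vdots \\
y_L & y_{L+1} & \cdots & y_N
\end{pmatrix},
\qquad K = N - L + 1.
$$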
In R this is fairly simple to do with the sapply() function:
temp <- sokode_min$Temperature
N <- length(temp)
L <- 240
K <- N - L + 1
d <- 1:L
# Create the trajectory matrix
a <- sapply(1:K, \(i) temp[i:(i + L - 1)])
A <- matrix(a, nrow = L, ncol = K)
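Between January 1961 and December 2015 there are $55 \times 12 = 660$ monthly observations, so with $L = 240$ the trajectory matrix should be $240 \times 421$. A quick check (assuming the series is complete, with no missing months):
dim(A) # should return 240 421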
Decomposition
The next step is to apply the SVD to the trajectory matrix. In other words, we wish to decompose $\mathbf{A}$ as $\mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^\top$, where the columns of $\mathbf{U}$ are the left singular vectors, $\mathbf{D}$ is the diagonal matrix of singular values $d_1 \geq d_2 \geq \cdots \geq d_L \geq 0$, and the columns of $\mathbf{V}$ are the right singular vectors.
In R we can use the svd() function:
A_decomposed <- svd(A)
U <- A_decomposed$u
V <- A_decomposed$v
D <- diag(A_decomposed$d)
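As an optional sanity check, the three factors should multiply back to the trajectory matrix:
# The factors should reproduce A up to floating-point error
max(abs(A - U %*% D %*% t(V)))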
One helpful thing to do after the decomposition is to view a scree plot of the singular values (the scree_plot() helper used here is defined in the Plotting Functions section at the end of the post).
# Plot the singular values
scree_plot(D, trend = TRUE) # with trend

Figure 2: Scree Plot
Further, the SVD allows us to write the trajectory matrix as a sum of rank-one elementary matrices, $\mathbf{A} = \sum_{i=1}^{L} d_i \mathbf{u}_i \mathbf{v}_i^\top$, where $\mathbf{u}_i$ and $\mathbf{v}_i$ are the $i$th left and right singular vectors. We will store these elementary matrices in a list called X.
# Obtain the individual components of the SVD (no grouping)
X <- list()
for(i in d){
  X[[i]] <- D[i, i] * U[, i, drop = FALSE] %*% t(V)[i, , drop = FALSE]
}
Here X[[i]] holds the $i$th elementary matrix $\mathbf{X}_i = d_i \mathbf{u}_i \mathbf{v}_i^\top$, and the triple $(d_i, \mathbf{u}_i, \mathbf{v}_i)$ is referred to as the $i$th eigentriple.
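As another optional check, the elementary matrices should sum back to the trajectory matrix:
# Nothing should be lost: the X_i sum to A up to floating-point error
max(abs(A - Reduce('+', X)))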
Eigentriple Grouping
At this point we have a collection of $L$ elementary matrices $\mathbf{X}_1, \dots, \mathbf{X}_L$. The grouping step partitions the indices $1, \dots, L$ into disjoint sets $I_1, \dots, I_m$ and sums the elementary matrices within each set, so that $\mathbf{A} = \mathbf{X}_{I_1} + \cdots + \mathbf{X}_{I_m}$.
This notation is only particularly meaningful if the sets $I_1, \dots, I_m$ correspond to interpretable parts of the series, in our case trend, seasonality, and noise, so we need some diagnostics to decide which eigentriples belong together. A good place to start is plotting the leading left singular vectors:
plot_left_singular_vectors(U, D, n_vectors = 16, trend = TRUE) # with trend

Figure 3: Left Singular Vectors
Typically, the long-term trend will have the greatest influence on the time series, and thus will be captured by the first singular vector. Once we move past this, however, we begin to see “wavy” series. These correspond to seasonal effects, and typically come in pairs. The percentages displayed are each component’s share of the total squared singular values, calculated as $d_i^2 / \sum_j d_j^2$.
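These shares can also be computed directly from the output of svd(); a small sketch using the $d_i^2 / \sum_j d_j^2$ convention above:
# Percent contribution of the first 12 components
round(100 * A_decomposed$d^2 / sum(A_decomposed$d^2), 2)[1:12]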
Left singular vectors that show long-term patterns can be attributed to the trend, while vectors with many oscillations belong to seasonality. However, it can sometimes be difficult to tell where the noise starts. Another useful diagnostic tool is plotting consecutive pairs of left singular vectors. We only plot consecutive pairs because matching seasonal components often have very similar singular values and are therefore positioned next to each other.
plot_left_singular_vectors(U, n_vectors = 12, type = 'paired') # without trend

Figure 4: Paired Left Singular Vectors
In these plots, we look for n-vertex polygons. When two seasonal components “line up” they form such a shape, where the number of vertices of the polygon corresponds to the period of the sinusoidal oscillation. For more information on why this happens, check out this great resource on SSA.
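To get a feel for where these polygons come from, consider a pure sine and cosine pair with a period of 12, which is roughly what a matched pair of seasonal singular vectors looks like (a toy sketch, separate from the SSA output):
# A sine/cosine pair with period 12, sampled monthly, traces a 12-vertex polygon
t <- 1:36
plot(sin(2 * pi * t / 12), cos(2 * pi * t / 12), type = 'b', asp = 1,
     xlab = 'Vector 1', ylab = 'Vector 2')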
The paired plots help with delineating between seasonality and noise because the n-vertex polygons will not be as clear in noisy patterns. Here we clearly have a 12-month period defined by the second and third singular vectors, a 6-month period defined by the fourth and fifth, and a 4-month period defined by the sixth and seventh. The eighth and ninth vectors might combine to form a long-term cycle, while the eleventh and twelfth might form a very short-term season.
At some point, these components are not adding much signal and can probably be lumped in with noise. Going back to the scree plot, it might make sense to cut things off after the ninth eigentriple. But the tenth vector looks as though it may include some additional long-term trend, and it might be interesting to see what happens when we include the eleventh and twelfth vectors in the seasonality component, so we will try that.
Reconstruction
To recap the previous section, we have decided on the following grouping of the eigentriples:
# Split eigentriples according to grouping analysis
trend_id <- c(1, 10) #c(1,8)
season_id <- c(2:9, 11:12) # c(2:3, 4:5, 6:6, 9:10)
noise_id <- d[!(d %in% c(trend_id, season_id))]
# Create vector indicating which group each component belongs to
groups <- numeric(L)
groups[trend_id] <- 1; groups[season_id] <- 2; groups[noise_id] <- 3
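A quick tally confirms how many eigentriples fall into each group (1 = trend, 2 = seasonality, 3 = noise); with $L = 240$ and the grouping above, this should show 2, 10, and 228 components, respectively:
# Number of eigentriples per group
table(groups)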
For the reconstruction step, we apply diagonal averaging within each of the three groups. Summing the elementary matrices in a group produces an $L \times K$ matrix that is generally no longer Hankel, so we average along its anti-diagonals (each anti-diagonal corresponds to a single time point) to turn it back into a series of length $N$. This process is sometimes called hankelization.
# Diagonal average Hankel
hankelize <- function(X){
  n_rows <- nrow(X)
  n_cols <- ncol(X)
  N <- n_rows + n_cols - 1
  y_sum <- numeric(N)
  y_n <- numeric(N)
  for(i in 1:n_rows){
    for(j in 1:n_cols){
      # Entry (i, j) lies on the anti-diagonal for time point i + j - 1
      y_sum[i + j - 1] <- y_sum[i + j - 1] + X[i, j]
      y_n[i + j - 1] <- y_n[i + j - 1] + 1
    }
  }
  return(y_sum / y_n)
}
# Perform the reconstruction with diagonal averaging
Y_k <- lapply(unique(groups), \(j) hankelize(Reduce('+', X[groups == j])))
data.frame(Index = 1:N,
           Value = do.call(c, Y_k),
           Type = rep(c('Trend', 'Seasonality', 'Noise'), each = N)) |>
  ggplot(aes(x = Index, y = Value, color = Type)) +
  geom_line(show.legend = FALSE) +
  theme_bw() +
  facet_wrap(~factor(Type, levels = c('Trend', 'Seasonality', 'Noise')),
             scales = 'free', ncol = 1) +
  labs(y = 'Reconstructed Value')

Figure 5: Reconstructed Components
After reconstruction, we see that there is a relatively linear increasing trend in time. There are several seasonal oscillations represented in the seasonality component, and the “noise” appears quite random. We have completed SSA!
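Because the three groups together use every eigentriple, the reconstructed components should add back up to the original series; an optional check using the objects defined above:
# Trend + Seasonality + Noise should reproduce the original series
max(abs(temp - Reduce('+', Y_k)))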
Prediction
As the late and great Billy Mays once said, “but wait, there’s more!”
So far we have seen how powerful SSA is when it comes to decomposing time-series data. But wouldn’t it be great if we could use this to forecast into the future? The good news is we can. There are many ways to make forecasts using SSA, but a simple method is a linear recurrent relation (LRR). An order-$(L-1)$ LRR expresses each new value of the series as a linear combination of the $L - 1$ values that precede it, and the coefficients can be computed directly from the left singular vectors of the group we wish to forecast.
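Concretely, the recurrent forecasting formula implemented below can be written as follows (standard SSA recurrent forecasting, where $I$ is the set of eigentriples in the group, $\pi_i$ is the last element of $\mathbf{u}_i$, and $\mathbf{u}_i^{\nabla}$ is $\mathbf{u}_i$ with that last element removed):
$$
\hat{y}_{n} = \sum_{k=1}^{L-1} a_k\, \hat{y}_{n-k},
\qquad
(a_{L-1}, \dots, a_1)^\top = \frac{1}{1 - \nu^2} \sum_{i \in I} \pi_i\, \mathbf{u}_i^{\nabla},
\qquad
\nu^2 = \sum_{i \in I} \pi_i^2.
$$
In the code, lpf holds $\sum_{i \in I} \pi_i\, \mathbf{u}_i$ and params holds the coefficient vector above.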
In the following code snippet, we set the forecast horizon to 100 months, compute the LRR coefficients for each group, and iterate the recurrence forward to extend the reconstructed components:
# LRR
len <- 100
preds <- list()
for(i in unique(groups)){
  # Compute LRR parameters from the left singular vectors in group i
  idx <- L
  lpf <- U[, groups == i, drop = FALSE] %*% t(U[idx, groups == i, drop = FALSE])
  params <- lpf[-idx] / (1 - lpf[idx])
  # Extend the reconstructed component len steps past the end of the series
  preds[[i]] <- c(Y_k[[i]], rep(NA, len))
  for(j in 1:len){
    preds[[i]][N + j] <- sum(preds[[i]][(N + j - (L - 1)):(N + j - 1)] * params)
  }
  preds[[i]] <- preds[[i]][(N + 1):(N + len)]
}
data.frame(Index = 1:(N + 100),
           obs = c(temp, rep(NA, 100)),
           Trend = c(Y_k[[1]], preds[[1]])) |>
  mutate(Seasonality = Trend + c(Y_k[[2]], preds[[2]])) |>
  pivot_longer(cols = Trend:Seasonality, names_to = 'Component', values_to = 'Value') |>
  mutate(Type = ifelse(Index > N, 'Predicted', 'Model')) |>
  ggplot(aes(x = Index)) +
  geom_point(aes(y = obs)) +
  geom_line(aes(y = Value, color = Component, lty = Type)) +
  scale_color_manual(values = scales::hue_pal()(3)[2:3]) +
  theme_bw() +
  theme(legend.position = 'top') +
  labs(x = 'Index',
       y = 'Temperature',
       color = 'Component',
       lty = 'Type')

Figure 6: Recurrent Forecasts
It would appear that our predictions are quite reasonable! Linear recurrence relations certainly aren’t the end-all be-all, but they do provide a quick and dirty way to obtain useful forecasts following the reconstruction step in SSA.
I hope this short introduction to singular spectrum analysis was helpful to you! Here are some more resources to keep you going:
- Singular Spectrum Analysis: Methodology and Comparison
- Singular Spectrum Analysis with R
- Rssa: A Collection of Methods for Singular Spectrum Analysis
Plotting Functions
If you are interested in the functions used to create the scree and left singular vector plots, look no more.
scree_plot <- function(D, n = min(50, nrow(D)), trend = FALSE){
  d <- diag(D)
  if(!trend) d <- d[2:(n + 1)]
  else d <- d[1:n]
  data.frame(d) |>
    mutate(Index = row_number()) |>
    ggplot(aes(x = Index, y = log(d))) +
    geom_point() +
    geom_line() +
    theme_bw() +
    labs(y = 'Log Singular Value')
}
plot_left_singular_vectors <- function(U, D = NULL, n_vectors = 12,
                                       type = 'original', trend = FALSE){
  if(!trend) cols_want <- 2:(n_vectors + 1)
  else cols_want <- 1:n_vectors
  firsts <- U |>
    as.data.frame() |>
    select(all_of(cols_want)) |>
    mutate(Index = row_number()) |>
    pivot_longer(cols = -Index, names_to = 'Vector1', values_to = 'Value1')
  firsts$Vector1 <- factor(firsts$Vector1, levels = unique(firsts$Vector1))
  if(type == 'original'){
    labels <- gsub('V', 'U', levels(firsts$Vector1))
    # When D is supplied, append each vector's share of the total squared
    # singular values to its facet label
    if(!is.null(D)){
      pct <- round(100 * diag(D)^2 / sum(diag(D)^2), 2)[cols_want]
      labels <- paste0(labels, ' (', pct, '%)')
    }
    levels(firsts$Vector1) <- labels
    firsts |>
      ggplot(aes(x = Index, y = Value1)) +
      geom_line() +
      facet_wrap(~Vector1, scales = 'free') +
      theme_bw() +
      labs(y = 'Vector')
  } else if(type == 'paired'){
    seconds <- U |>
      as.data.frame() |>
      select(all_of(cols_want)) |>
      mutate(Index = row_number()) |>
      pivot_longer(cols = -Index, names_to = 'Vector2', values_to = 'Value2')
    seconds$Vector2 <- factor(seconds$Vector2, levels = unique(seconds$Vector2))
    pairs <- firsts |>
      left_join(seconds, by = join_by(Index), relationship = 'many-to-many') |>
      filter(as.numeric(Vector1) == (as.numeric(Vector2) - 1)) |>
      mutate(VectorPair = paste0(Vector1, ' & ', Vector2),
             VectorPair = gsub('V', 'U', VectorPair))
    pairs$VectorPair <- factor(pairs$VectorPair, levels = unique(pairs$VectorPair))
    pairs |>
      ggplot(aes(x = Value1, y = Value2)) +
      geom_path() +
      facet_wrap(~VectorPair, scales = 'free') +
      theme_bw() +
      labs(x = 'Vector 1', y = 'Vector 2')
  }
}