Singular Value Decomposition
Singular value decomposition (SVD) is quite possibly the most widely used multivariate
statistical technique in the atmospheric sciences.
The technique was first introduced to meteorology in a 1956 paper by Edward Lorenz,
in which he referred to the process as empirical orthogonal function (EOF) analysis.
Today, it is also commonly known as principal-component analysis (PCA). All three
names are still used, and refer to the same set of procedures within the Data Library.
The purpose of singular value decomposition is to reduce a dataset containing a large
number of values to a dataset containing significantly fewer values,
but which still contains a large fraction of the variability present in the original
data.
In the atmospheric and geophysical sciences, data often exhibit large spatial
correlations. SVD analysis yields a more compact representation
of these correlations, especially for multivariate datasets, and
can provide insight into the spatial and temporal variations exhibited by the fields
being analyzed.
There are a few caveats one should be aware of before computing the SVD of a set of
data. First, the data must consist of anomalies. Second, the data should be de-trended.
When trends in the data exist over time, the first structure often captures them.
If the purpose of the analysis is to find spatial correlations independent of trends,
the data should be de-trended before applying SVD analysis.
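The two preparation steps described above can be sketched in NumPy. This is only an illustration, not the Data Library's implementation; the array layout (rows are time steps, columns are grid points), the synthetic data, and the helper names `to_anomalies` and `detrend` are all assumptions made for the example.

```python
import numpy as np

def to_anomalies(data):
    """Subtract the time mean at every grid point (rows = time, columns = space)."""
    return data - data.mean(axis=0)

def detrend(data):
    """Remove a least-squares linear trend in time at every grid point."""
    t = np.arange(data.shape[0], dtype=float)
    design = np.column_stack([t, np.ones_like(t)])   # slope and intercept columns
    coeffs, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ coeffs

# Synthetic example: a constant offset, a slow linear trend, and random noise.
rng = np.random.default_rng(0)
n_time, n_space = 120, 5
trend = 0.05 * np.arange(n_time)[:, None]
data = 15.0 + trend + rng.normal(size=(n_time, n_space))

prepared = detrend(to_anomalies(data))
```

After both steps, every column of `prepared` has zero mean and zero fitted slope, so a leading structure computed from it cannot simply reflect the trend.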
Analysis of Singular Value Decomposition
The first structure is the single pattern that represents the most variance in the
data.
The structures are the elements of the eigenvectors of the variance-covariance matrix
of the data.
In the Data Library, the eigenvectors are also known as EOFs. The first eigenvector
(EOF) points in the direction in which the data vectors jointly exhibit the most variability.
Essentially, a new coordinate system is created, with each axis aligned along the
direction of maximum joint variability.
The second structure is the pattern that describes the second largest amount of variance,
calculated the same way as the first structure. A very important property of the second
structure is that it is completely uncorrelated with the first structure, as well
as all other following structures.
Each eigenvector is perpendicular to every other eigenvector: the second is perpendicular
to the first, the third is perpendicular to both, and so on. This property is what led Lorenz to call the
technique empirical orthogonal function analysis.
All structures are mutually uncorrelated.
The variance of the nth principal component is the nth eigenvalue.
Therefore, the total variation exhibited by the data is equal to the sum of all eigenvalues.
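The properties listed above (orthogonal eigenvectors, uncorrelated principal components, and principal-component variances equal to the eigenvalues) can be checked numerically. The sketch below uses synthetic data and NumPy's `eigh`; the variable names are illustrative assumptions, and the unnormalized covariance convention used here may differ from the Data Library's normalized output.

```python
import numpy as np

# Synthetic anomalies with correlated columns (rows = time, columns = space).
rng = np.random.default_rng(3)
anomalies = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
anomalies -= anomalies.mean(axis=0)

# Eigen-decomposition of the variance-covariance matrix, largest eigenvalue first.
cov = np.cov(anomalies, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal components: projection of the data onto each eigenvector.
pcs = anomalies @ eigvecs

gram = eigvecs.T @ eigvecs                       # identity if eigenvectors are orthonormal
pc_cov = np.cov(pcs, rowvar=False)               # diagonal if the PCs are uncorrelated
total_variance = np.trace(cov)                   # equals the sum of the eigenvalues
```

Here `pc_cov` is diagonal with the eigenvalues on its diagonal, which is the numerical counterpart of the statement that the variance of the nth principal component is the nth eigenvalue.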
In the Data Library, eigenvalues are normalized such that the sum of all eigenvalues
equals 1.
A normalized eigenvalue will indicate the percentage of total variance explained by
its corresponding structure.
Structures have also been normalized so that the root mean square equals 1. This
way, the structures can be expressed in terms of standard deviation.
Singular values are equal to the square root of the eigenvalues. Because the eigenvalues
are automatically normalized in the Data Library, they do not directly indicate
the absolute amount of variance each structure explains.
However, you may calculate the variance explained by each EOF by squaring the
corresponding singular value.
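The relationship between singular values, eigenvalues, and normalized eigenvalues can be illustrated as follows. This is a sketch on synthetic data; the factor `1 / (n_time - 1)` reflects the sample-covariance convention assumed here, and the Data Library may scale its singular values differently.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_space = 60, 8
anomalies = rng.normal(size=(n_time, n_space)) * np.linspace(3.0, 0.5, n_space)
anomalies -= anomalies.mean(axis=0)

# SVD of the anomaly matrix: s holds the singular values in descending order.
u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

# Eigenvalues of the variance-covariance matrix, sorted to match.
cov = anomalies.T @ anomalies / (n_time - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Squared singular values give the variance of each mode (up to the assumed scaling).
variance = s**2 / (n_time - 1)

# Dividing by the total gives fractions that sum to 1, like normalized eigenvalues.
normalized = s**2 / np.sum(s**2)
```

The normalized values carry only the fraction of variance, while the squared singular values retain the absolute amount.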
In the Data Library there is a time series associated with each structure. These time
series are also known as principal components.
The first time series is calculated by projecting the data matrix onto the first eigenvector
of the variance-covariance matrix of the data, the
second time series by projecting onto the second eigenvector, and so on.
The time series values indicate the amount of the given structure needed to complete
the data field.
It follows that the structure (dimensionless) multiplied by the time series value
at a single point in time (units of the data),
summed over all structures, yields the original data at that point in time.
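The reconstruction described above can be written out in a few lines. In this sketch the structures are unit-length rows of `vt` rather than the RMS-normalized structures the Data Library returns, so the scaling of the two factors is an assumption; their product is the same either way.

```python
import numpy as np

rng = np.random.default_rng(2)
anomalies = rng.normal(size=(40, 6))             # rows = time, columns = space
anomalies -= anomalies.mean(axis=0)

u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

structures = vt                                  # one spatial pattern per row
time_series = anomalies @ structures.T           # project the data onto each structure

# At every time step, sum (time series value) x (structure) over all structures.
reconstructed = time_series @ structures
```

`reconstructed` matches `anomalies` to rounding error, and `time_series` equals `u * s`, the usual SVD form of the principal components.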
Mathematically, there are as many eigenvectors as there are elements in the vector
data set.
The first few eigenvectors will point in directions where the data jointly exhibits
large variation.
The remaining eigenvectors will point to directions where the data jointly exhibits
less variation.
For this reason, it is often possible to capture most of the variation by considering
only the first few eigenvectors.
The remaining eigenvectors, along with their corresponding principal components, are
truncated.
The ability of SVD to eliminate a large proportion of the data is a primary reason
for its use.
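Truncation can be illustrated with a synthetic field built from three strong patterns plus weak noise. The sizes, amplitudes, and the choice of `keep = 3` are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(4)
n_time, n_space, n_modes = 100, 30, 3

# Synthetic anomalies: three strong spatial patterns plus weak noise.
patterns = rng.normal(size=(n_modes, n_space))
amplitudes = rng.normal(size=(n_time, n_modes)) * np.array([5.0, 3.0, 2.0])
anomalies = amplitudes @ patterns + 0.1 * rng.normal(size=(n_time, n_space))
anomalies -= anomalies.mean(axis=0)

u, s, vt = np.linalg.svd(anomalies, full_matrices=False)

# Keep only the leading structures and their time series.
keep = 3
truncated = (u[:, :keep] * s[:keep]) @ vt[:keep]

# Fraction of total variance retained by the kept modes.
explained = np.sum(s[:keep] ** 2) / np.sum(s ** 2)

# Numbers stored before and after truncation.
stored_full = anomalies.size
stored_truncated = keep * (n_time + n_space)
```

In this constructed case the three retained modes hold nearly all of the variance while requiring far fewer stored values than the full field. Real data rarely separate this cleanly.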
Outline of Key Points
- Datasets must consist of anomalies.
- Better results when applied to de-trended data.
- As many eigenvectors as temporal data values in the set.
- Eigenvectors point in the direction of maximum joint variability.
- Eigenvalues represent the amount of variance explained by the corresponding structure.
- First eigenvalue will account for the most variation.
- All but first few structures may be truncated in most cases.
- All principal components mutually uncorrelated.
Example: SVD Analysis of North Atlantic Sea Surface Temperature Anomalies
Example: Perform a singular value decomposition of reconstructed sea surface temperature
anomaly data in the North Atlantic for the months of December, January, and February
from 1870 to 2004.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Air-Sea Interface" link.
- Select the ERSST dataset.
- Scroll down the page and select the "version2" link under the Datasets and Variables
subheading.
- Select the "Sea Surface Temperature" link again under the Datasets and Variables subheading.
|
Compute Monthly Anomalies |
- Click on the "Filters" link in the function bar.
- Choose the anomalies command.
This operation calculates the SST anomalies for each month.
|
Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 10N to 70N, 5W to 80W, and Dec-Feb 1870-2004 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
The time range entered will select only December, January, and February values for
each year.
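The two steps above (monthly anomalies, then a December-February selection) can be imitated offline. The sketch below assumes that "anomalies" means subtracting each calendar month's long-term mean; the function name `monthly_anomalies` and the synthetic single-point series are hypothetical and not part of the Data Library.

```python
import numpy as np

def monthly_anomalies(values, months):
    """Subtract each calendar month's long-term mean from that month's values."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    anomalies = np.empty_like(values)
    for m in np.unique(months):
        sel = months == m
        anomalies[sel] = values[sel] - values[sel].mean(axis=0)
    return anomalies

# Ten years of synthetic monthly SST at one point: seasonal cycle plus noise.
rng = np.random.default_rng(5)
months = np.tile(np.arange(1, 13), 10)
sst = (20.0
       + 5.0 * np.cos(2 * np.pi * (months - 8) / 12)
       + rng.normal(scale=0.3, size=months.size))

anom = monthly_anomalies(sst, months)

# Keep only December, January, and February values.
djf = anom[np.isin(months, [12, 1, 2])]
```

Removing the monthly climatology first means the seasonal cycle does not dominate the variance that the SVD later decomposes.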
|
Compute Singular Value Decomposition |
- Click on the "Expert Mode" link in the function bar.
- Enter the following line under the text already there:
{Y cosd} [X Y] [T] svd
- Press the OK button.
The svd function computes the singular value
decomposition of the SST dataset weighted by the cosine of the latitude. Spatial
data are often weighted by the cosine of the latitude to account for the shrinking
distance between meridians, and hence the smaller grid-cell area, at higher latitudes. A weight term, however, is not necessary
to complete the SVD analysis.
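A common way to apply cosine-latitude weighting outside the Data Library is sketched below. It scales the field by the square root of cos(latitude) so that the squared values, and therefore the covariances, are weighted by cos(latitude). Whether the Data Library's svd function applies its weights in exactly this way is not stated here, so treat this as an assumption; the grid and data are synthetic.

```python
import numpy as np

lats = np.array([10.0, 30.0, 50.0, 70.0])         # degrees north
lons = np.array([-80.0, -55.0, -30.0, -5.0])      # degrees east
rng = np.random.default_rng(6)
field = rng.normal(size=(24, lats.size, lons.size))   # time, lat, lon
field -= field.mean(axis=0)

# Grid-cell area on a regular latitude-longitude grid is proportional to cos(lat).
area_weight = np.cos(np.deg2rad(lats))

# Scale by sqrt(weight) so that variances are weighted by cos(lat).
scaled = field * np.sqrt(area_weight)[None, :, None]

# Flatten space into one axis and decompose.
matrix = scaled.reshape(field.shape[0], -1)       # time x space
u, s, vt = np.linalg.svd(matrix, full_matrices=False)
```

With this scaling, high-latitude grid points, which represent less area, contribute proportionally less to the total variance.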
Five new variables appear under the Datasets and Variables subheading: normalized
eigenvalues, structures, singular values, time series, and weights.
While all of the variables are associated with the same new coordinate system generated
by the SVD, each contains a different piece of information about the system.
|
View Normalized Eigenvalues |
- Click on the "normalized eigenvalues" link under the Datasets and Variables subheading.
- Select the time series viewer in the function bar.
Normalized Eigenvalues vs. Eigenvectors of SVD SST Anomalies
Notice how quickly this function decays. The eigenvalues associated with the
first few eigenvectors are much larger than the eigenvalues associated with subsequent
eigenvectors.
As mentioned earlier, the first few eigenvalues account for most of the variation
present in the original data.
- Click on the right-most link in the blue source bar to exit the viewer.
- Select the "Tables" link in the function bar.
- Select the columnar table link.
The first normalized eigenvalue is 0.233, the second is 0.151, and the third
is 0.139. Recall that normalized eigenvalues represent the fraction of
variance explained
by the structure associated with that eigenvalue. Therefore, the first structure
explains 23% of the variance, the second structure 15%, and so on. The
table lists 402 structures, yet the first
three structures alone account for over 50% of the variance.
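The cumulative fraction quoted above follows directly from the three table values. A short check, using only the numbers given in the text:

```python
import numpy as np

# First three normalized eigenvalues read from the table.
normalized_eigenvalues = np.array([0.233, 0.151, 0.139])

# Running total of the fraction of variance explained.
cumulative = np.cumsum(normalized_eigenvalues)

for mode, (frac, total) in enumerate(zip(normalized_eigenvalues, cumulative), start=1):
    print(f"structure {mode}: {frac:.1%} of variance, {total:.1%} cumulative")
```

The final cumulative value is about 0.523, which is the "over 50%" figure for the first three structures.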
|
Return to Dataset Page |
- Select the "Additional Information" link at the top of the page to exit the table.
- In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link.
This will remove the normalized eigenvalues variable selection and return you to the
SVD page.
|
View Structures |
- Click on the "structures" link under the Datasets and Variables subheading.
- In the function bar, select the viewer with land shaded in black.
1st Structure of SVD SST Anomalies
This is an image of the 1st structure, which explains 23.3% of the total variance
present in the original data.
Recall that the structures have been normalized, and as a result, are unitless quantities.
Note the large negative values off the coast of West Africa. This variability is
caused by an ocean-atmosphere coupling system described in the third example.
- In the text box above the viewer window, enter the number 2.
- Press the redraw button.
2nd Structure of SVD SST Anomalies
This is an image of the second structure, which explains 15% of the total variance
present in the original data. Notice the large negative values off the east coast
of the United States that extend into the Central Atlantic.
These large values may be produced, in part, by the Gulf Stream current, which causes
annual variability of SSTs in the region. An image of the Gulf Stream current is
provided below. The large values present in the 2nd EOF structure above and the vectors
that represent the Gulf Stream current in the image below appear to overlap.
This region is also aligned with the jet stream, a narrow area where weather patterns
move off the coast and cause additional variability in SSTs.
The large values in the 2nd structure may also be caused by an atmospheric circulation
pattern known as the North Atlantic Oscillation.
The Gulf Stream Current
Gyory, Joanna. The Gulf Stream. http://oceancurrents.rsmas.miami.edu/atlantic/gulf-stream.html.
|
Return to Dataset Page |
- Click on the right-most link in the blue source bar to exit the viewer.
- In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link.
This will remove the structures variable selection and return you to the SVD page.
|
View Time Series |
- Click on the "time series" link under the Datasets and Variables subheading.
- Select the time series viewer.
Time Series of SVD SST Anomalies
There is a time series associated with each eigenvector/structure. This is the time
series corresponding to the 1st eigenvector, but you may change the eigenvector by
changing the number in the text box above the viewer.
The time series illustrates the amount of the structure present in the data, or in
other words, the amount of the structure needed to complete the data field at each time step.
These time series can be correlated with time series
and/or indices relating to other processes in order to demonstrate a relationship.
*NOTE: The singular values variable can be accessed the same way as the other three
variables shown above.
|
Example: SVD Analysis of North Atlantic Mean Sea Level Pressure Anomalies and Their
Relation to the NAO
Example: Perform a singular value decomposition analysis of mean sea level pressure anomaly
data in the North Atlantic for the months of December, January, and February from
1950 to 2004.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Historical Model Simulations" link.
- Select the NOAA NCEP-NCAR CDAS-1 dataset.
- Scroll down the page and select the "MONTHLY" link under the Datasets and Variables
subheading.
- Select the "Intrinsic" link again under the Datasets and Variables subheading.
- Select the "Mean Sea Level" link again under the Datasets and Variables subheading.
- Select the "Pressure" link again under the Datasets and Variables subheading.
|
Compute Monthly Anomalies |
- Click on the "Filters" link in the function bar.
- Choose the anomalies command.
This operation calculates the mean sea level pressure anomalies for each month.
|
Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 5W to 80W, 10N to 70N, and Dec-Feb 1950-2004 in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
The time range entered will select only December, January, and February values for
each year.
|
Compute Singular Value Decomposition |
- Again in Expert Mode, enter the following line under the text already there:
{Y cosd} [X Y] [T] svd
- Press the OK button.
The svd function computes the singular value decomposition of the mean sea level pressure
dataset weighted by the cosine of the latitude.
|
Find Eigenvalue of 1st Structure |
- Click on the "normalized eigenvalues" link under the Datasets and Variables subheading.
- Select the "Tables" link in the function bar.
- Select the columnar table link.
The first normalized eigenvalue is 0.402, the second is 0.278, and the third
is 0.100. Normalized eigenvalues represent the fraction of variance explained
by the structure associated with that eigenvalue. In this example, we will only be
concerned with the first eigenvalue, which explains 40.2% of the total variance.
|
Return to Dataset Page |
- Select the "Additional Information" link at the top of the page to exit the table.
- In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link.
This will remove the normalized eigenvalues variable selection and return you to the
SVD page.
|
View 1st Structure |
- Click on the "structures" link under the Datasets and Variables subheading.
- In the function bar, select the viewer with land shaded in black.
1st Structure of SVD MSLP Anomalies
This is an image of the first structure, which explains 40.2% of the total variance
present in the original data. The large positive values centered around 45° N and the
large negative values centered around 65° N are indicative of two regions whose
mean sea level pressures are generally inversely related. This system is a well-known
low-frequency atmospheric circulation pattern called the North Atlantic Oscillation (NAO).
The NAO is characterized by large-scale MSLP variability associated with a subtropical
high / polar low system over the North Atlantic. During a positive NAO, the subtropical
high is stronger than usual and the polar low is deeper than usual. The increased pressure
gradient causes stronger winter storms to cross the Atlantic. During a negative NAO,
the subtropical high and polar low are both weaker than usual, resulting in fewer and less
severe storms crossing the Atlantic.
|
Example: Correlation of an SVD Time Series of Mean Sea Level Pressure Anomalies with
an SVD Time Series of SST Anomalies in the North Atlantic
Example: Correlate an SVD time series of mean sea level pressure anomalies with an SVD time
series of SST anomalies in the North Atlantic for the months of December, January,
and February.
Select Dataset, Variable, and Domains |
*NOTE: Datasets used in the example are similar to those used in the previous two
examples.
|
Compute Singular Value Decomposition |
- In Expert Mode, enter the following line under the text already there:
{Y cosd} [X Y] [T] svd
- Press the OK button.
The svd function computes the singular value decomposition of the mean sea level pressure
dataset weighted by the cosine of the latitude.
|
Select Time Series Variable and 1st Eigenvector |
- Click the "Time Series" variable under the Datasets and Variables subheading.
- Click on the "Data Selection" link in the function bar.
- Enter the number 1 in the ev text box.
- Press the Restrict Ranges button and then the Stop Selecting button.
You have selected the first eigenvector and its associated time series.
|
Add the Second Structure SVD Time Series of Reconstructed SST Anomaly Data. |
|
Correlate Datasets |
- In Expert Mode, enter the following line under the text already there:
[T] correlate
- Press the OK button.
The above command correlates the two sets of data. The correlation coefficient is
displayed in bold under the Expert Mode text box: 0.249616.
We can conclude there is a slight correlation between MSLP anomalies and SST anomalies
in the North Atlantic.
The correlation coefficient is not very high because the variability captured by the 1st
SST anomaly structure, for example, can be spread across multiple MSLP anomaly structures.
SVD analyses of the MSLP and SST datasets are independent of each other.
There is no guarantee that the maximum amount of association between two variables
will be found in two distinct principal component analysis time series.
However, it has been shown that there is a relationship between these two datasets,
specifically between these two structures. Atmospheric anomalies do cause SST anomalies,
and vice versa. In this example, changes in MSLP sometimes cause an anomalous atmospheric
cyclonic circulation centered around 40° W and 30° N. The cyclone weakens the normal northerly winds off
the west coast of Africa. As a result, coastal upwelling is reduced and positive
SST anomalies occur. Scroll up the page to the first EOF structure in the first example.
Notice the extremely low values off the coast of West Africa. This SST variability is associated
with variations in MSLP that produce the anomalous low.
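The correlation of two time series over time can be reproduced with a few lines of NumPy. The series below are synthetic stand-ins, not the MSLP and SST principal components from the example, and the function name `correlate` merely echoes the Data Library command; it is written from the standard Pearson formula rather than taken from the Data Library.

```python
import numpy as np

def correlate(a, b):
    """Pearson correlation coefficient of two equal-length time series."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

# Two synthetic series that share a weak common signal buried in noise.
rng = np.random.default_rng(7)
shared = rng.normal(size=165)
mslp_pc = shared + 2.0 * rng.normal(size=165)    # stand-in for an MSLP time series
sst_pc = shared + 2.0 * rng.normal(size=165)     # stand-in for an SST time series

r = correlate(mslp_pc, sst_pc)
```

Because each series is mostly independent noise, the resulting coefficient is modest even though a shared signal is present, which mirrors the situation described above where the association is spread across several structures.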
|
Disadvantages of Unrotated Singular Value Decomposition
Unrotated empirical orthogonal functions (EOFs) are often very useful for describing
natural modes of variability in a data field, due to their spatial and temporal orthogonality,
ability to extract the maximum variance from a field, and relative
simplicity.
Yet, unrotated empirical orthogonal functions generally do a poor job of isolating
individual modes of variation.
This weakness is largely due to four inherent characteristics of unrotated EOFs: domain
shape dependence, subdomain instability, sensitivity to sampling, and an inaccurate
portrayal of the physical relationships embedded within the input data (Richman 1986).
- Domain Shape Dependence
Unrotated EOFs can be primarily determined by the shape of the domain rather than
by the covariation of the data.
In these cases, structures of the unrotated EOF analysis do not resemble any of the
single input patterns, but rather, they represent combinations of the input patterns.
- Subdomain Instability
Unrotated EOFs usually exhibit poor subdomain stability, where subdomain stability
refers to how well the modal patterns are preserved when the analysis is repeated on sub-portions of the domain.
Richman and Lamb (1985) conducted a study in which unrotated EOF analyses were performed on
the same set of data, once over an entire domain and once over the northern and southern
halves of the domain separately.
The results for each half of the domain did not correspond with the results for the
entire domain, which raises the question: How robust are the results from an unrotated
EOF analysis?
- Sensitivity to Sampling
When eigenvalues are close together, they may be dominated by noise and the corresponding
EOFs may not be well defined.
- Lack of Physical Meaning
Unrotated EOFs sometimes produce results that are not physically meaningful.
Rotated Singular Value Decomposition
In a rotated EOF analysis, the eigenvectors are weighted by the square root of their
corresponding eigenvalues, so that the weights (i.e., loadings) represent
the correlations between each variable and principal component. Most rotation methods
seek to approximate simple structure by applying
mathematical algorithms that redistribute the PC loadings such that the dispersion
of the loadings is maximized.
Varimax rotation is the most widely accepted method for analytical rotation. The
Varimax method reduces variances of the projection of the data onto the rotated basis,
where the projection is the principal component time series.
This improves the alignment of the basis with the actual data and improves the relationship
between their spatial and temporal patterns and known physical mechanisms. Varimax
is a method for rotating the axes of a plot such that the eigenvectors remain orthogonal
as they are rotated.
These rotations are used in principal component analysis so that the axes are rotated
to a position in which the sum of the variances of the loadings is the maximum possible
(Oilfield Glossary).
In the Data Library, the varimax function requires the user to specify the number
of eigenvectors to use in the rotation.
The matrix of loadings is determined by the truncated eigenvectors.
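To make the idea of an orthogonal rotation concrete, the sketch below implements the classic pairwise (Kaiser) varimax iteration on a small loading matrix. It is an illustration under stated assumptions and not the Data Library's varimax function: it uses the raw criterion (sum over columns of the variance of the squared loadings) without row normalization, and the function names and test matrix are hypothetical.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over columns of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings ** 2, axis=0)))

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate columns pairwise by the angle that maximizes the raw varimax criterion."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                                 # accumulated orthogonal rotation
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u, v = x ** 2 - y ** 2, 2.0 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u ** 2 - v ** 2), 2.0 * np.sum(u * v)
                # Kaiser's closed-form rotation angle for one pair of columns.
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / p,
                                        C - (A ** 2 - B ** 2) / p)
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                L[:, [i, j]] = np.column_stack([x, y]) @ G
                R[:, [i, j]] = R[:, [i, j]] @ G
                largest = max(largest, abs(phi))
        if largest < tol:                         # no pair needed further rotation
            break
    return L, R

# A simple-structure loading matrix, deliberately mixed by a 30 degree rotation.
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
mixed = simple @ mix

rotated, R = varimax(mixed)
```

In this constructed case the rotation undoes the mixing, so each variable again loads on a single component, and `R` remains orthogonal throughout. The number of columns passed in plays the same role as the number of eigenvectors the user specifies for the rotation.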
Many atmospheric scientists argue that rotated EOF analysis is a more effective tool
than unrotated EOF analysis for the study of atmospheric circulation patterns.
While EOF rotation is often very useful, it is not meant to be a default operation
after every EOF analysis.
Whether to rotate the EOFs should be guided by the goals of the specific analysis.
Advantages / Disadvantages of Varimax Rotation
Advantages
- Less affected by domain dependence than unrotated EOF analyses.
- Varimax analyses of subdomains more stable than unrotated EOF analyses.
- When neighboring eigenvalues are similar in value, patterns not present in the unrotated
EOFs may become present after rotation.
- Eigenvectors still remain orthogonal as they are rotated.
- Generally exhibits a stronger relationship between components and known physical mechanisms
than unrotated EOFs.
- Rotated EOFs often in better agreement with physical patterns than unrotated EOFs.
Disadvantages
- More complex than unrotated EOFs.
- Sometimes difficult to determine when rotation is useful.
- Not applicable to cases where sole purpose of EOF analysis is data reduction.
- In some cases, will not increase the physical explainability of the data (may cause
more harm than good).
Varimax Rotation of East Pacific Sea Surface Temperature Data
Example: Perform a varimax rotation of an SVD analysis of East Pacific sea surface temperatures.
Locate Dataset and Variable |
- Select the "Datasets by Category" link in the blue banner on the Data Library page.
- Click on the "Air-Sea Interface" link.
- Scroll down the page and select the NOAA NCEP EMC CMB GLOBAL Reyn_Smith dataset.
- Click on the "Reyn_SmithOIv2" link.
- Click on the "monthly" link.
- Click on the "Sea Surface Temperature Anomaly" link under the Datasets and Variables
subheading.
|
Select Temporal and Spatial Domains |
- Click on the "Data Selection" link in the function bar.
- Enter the text 180W to 70W and 35S to 35N in the appropriate text boxes.
- Press the Restrict Ranges button and then the Stop Selecting button.
|
Compute Singular Value Decomposition |
- Click on the "Expert Mode" link in the function bar.
- Enter the following line under the text already there:
{Y cosd} [X Y] [T] svd
- Press the OK button.
The svd function computes the singular value decomposition of the SST dataset weighted by
the cosine of the latitude.
Five new variables appear under the Datasets and Variables subheading: normalized
eigenvalues, structures, singular values, time series, and weights.
|
View Structures |
- Click on the "structures" link under the Datasets and Variables subheading.
- In the function bar, select the viewer with land shaded in black.
1st Structure of SVD SST Anomalies
The first structure is representative of the El Niño Southern Oscillation (ENSO).
Recall that the first structure is the pattern that explains the most variability
in the original set of data.
The relatively large positive values located immediately off the west coast of South
America correspond to the variability in SSTs caused by upwelling during La Niña years and the lack of
upwelling during El Niño years.
Notice that these values extend westward in a narrow line, and as a result, do not
cover much surface area in the Pacific.
However, ENSO generally affects a greater area than depicted by this first structure.
One explanation is that part of the ENSO pattern might be contained
in another structure, or multiple structures.
|
Return to Dataset Page |
- Click on the right-most link in the blue source bar to exit the viewer.
- In the source bar, click on the { Y cosd } [ X Y ] [ T ] svd link.
This will remove the structures variable selection and return you to the SVD page.
|
Perform Varimax Rotation |
- Click on the "Expert Mode" link in the function bar.
- Enter the following line under the text already there:
3 varimax
- Press the OK button.
The varimax function above performs a varimax rotation using the first three eigenvectors. Changing
the number before the varimax command will change the number of eigenvectors
to be entered into the function. Seven new variables appear under the Datasets and
Variables subheading: varimax rotation, communalities, energy, rotated structures,
singular values, time series, and weights.
|
Select Rotated Structures Variable |
- Click on the "rotated structures" link under the Datasets and Variables subheading.
|
View Structures |
- In the function bar, select the viewer with land shaded in black.
1st Structure of SVD Varimax Rotated SST Anomalies
Notice that the colorscale is not centered around 0. To enhance the interpretability
of the image, the colormap can be adjusted so that
the scale is centered around 0.
|
Return to Dataset Page |
- Click on the right-most link in the blue source bar to exit the viewer.
|
Generate Colormap |
|
View Structures |
- In the function bar, select the viewer with land shaded in black.
1st Structure of SVD Varimax Rotated SST Anomalies
By rotating the first three eigenvectors via the varimax method, the resulting structure
is more representative of the physical pattern (ENSO) than the unrotated EOF structure illustrated earlier in the
example. Pieces of the ENSO pattern contained in multiple unrotated principal
components have been incorporated into one rotated component. The negative values now extend farther north and south,
as well as to the west. Many times, rotating the EOFs / PCs will result in a solution
that better explains the underlying physical patterns in the input data.
|