6 Extreme value theory
See first
This chapter builds on concepts from Probability and Statistics.
Learning objectives
After reading this chapter, you should be able to:
- Understand the motivation for extreme value theory in climate risk assessment.
- Distinguish between block maxima and peak-over-threshold approaches.
- Select and fit appropriate extreme value distributions (GEV, GPD).
- Calculate return periods, return levels, and exceedance probabilities.
- Quantify and interpret different sources of uncertainty in extreme value analysis.
- Recognize the challenges posed by non-stationarity and climate change.
- Apply extreme value methods to real-world case studies.
6.1 Motivation
The design of reliable infrastructure, such as stormwater systems in Harris County, Texas, depends on a quantitative understanding of extreme environmental events. Engineers must characterize the magnitude and frequency of rare phenomena. For example, they might need to determine the expected recurrence interval of a daily rainfall total that exceeds a critical design threshold, such as 300 mm.
An initial, intuitive approach is to analyze the historical record.
Analysis of over 70 years of daily precipitation data from a station in Houston reveals 13 days on which rainfall exceeded 150 mm. This allows a simple empirical estimate of the event’s frequency: thirteen exceedances in 71 years suggest an average recurrence interval of approximately 5.5 years. For the 300 mm threshold, however, the historical record contains zero events. The empirical estimate of its probability is therefore zero, which is uninformative for risk assessment and infrastructure design. We require a method for extrapolating beyond the range of observed data.
A natural extension would be to fit a standard parametric probability distribution—such as a Gamma distribution, which is commonly used for precipitation—to the entire set of daily rainfall observations. One could then use the extreme upper tail of the fitted distribution to estimate the probability of exceeding 300 mm.
This approach, however, is fundamentally unreliable. The goodness-of-fit of a parametric model is driven by its ability to describe the bulk of the data, where observations are plentiful. Minor model misspecifications in this high-density region can translate into substantial errors in the far tails of the distribution, where data are sparse or nonexistent. As stated by Coles (2001), the foundational text on this topic, “very small discrepancies in the estimate of \(F\) can lead to substantial discrepancies for \(F^n\)”, where in our example \(F\) is the cumulative distribution function (CDF) of daily rainfall while \(F^n\) is the CDF of the maximum daily rainfall over \(n\) days.
The models that best describe the central tendency of a process are not necessarily suitable for describing its extremes. This fragility of tail extrapolation necessitates a theoretical framework developed specifically for modeling the behavior of extreme values.
A naive empirical estimator for the exceedance probability of a value \(x\) based on \(n\) observations \(X_i\) is the simple proportion of exceedances:
\[ \hat{P}(X > x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(X_i > x) \]
where \(\mathbf{1}(\cdot)\) is the indicator function. This estimator is useful when the number of exceedances is large, but it fails for extrapolation when the sum is zero, as was the case for the 300 mm threshold. This makes the limitation of simple empirical methods mathematically explicit and motivates the specialized approaches that follow.
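Before moving on, here is a minimal sketch of the empirical estimator in Python. The rainfall series is synthetic (a Gamma draw, chosen only so the code runs end to end); with real data one would substitute the observed station record.

```python
import numpy as np

def empirical_exceedance_prob(x, threshold):
    """Proportion of observations exceeding `threshold`."""
    x = np.asarray(x)
    return np.mean(x > threshold)

# Synthetic stand-in for ~71 years of daily rainfall totals (mm).
rng = np.random.default_rng(42)
rainfall = rng.gamma(shape=0.5, scale=10.0, size=71 * 365)

for threshold in (150.0, 300.0):
    p_hat = empirical_exceedance_prob(rainfall, threshold)
    print(f"P(X > {threshold} mm) = {p_hat:.2e}")
```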
6.2 Approaches for modeling extreme values
Extreme Value Theory (EVT) provides such a framework. Rather than modeling the entire distribution of a process, EVT focuses on the distribution of its extreme values. There are two primary methods for extracting these values from a time series.
6.2.1 Block maxima
In this method, the period of observation is divided into non-overlapping blocks of equal duration (e.g., years). For each block of size \(n\), we define the block maximum as \(M_n = \max\{X_1, \dots, X_n\}\). This yields a new time series of block maxima. A crucial conceptual point is that we are not studying a single maximum possible value of the process. Instead, we are studying the distribution of the sample maximum, \(M_n\), which is itself a random variable. This is why a distribution like the GEV is needed to describe its behavior. This approach is intuitive and directly relates to concepts of annual risk, but it can be inefficient as it discards other potentially extreme events within a block.
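A minimal sketch of block-maxima extraction, assuming the daily record is held in a pandas Series with a DatetimeIndex (the data below are synthetic placeholders):

```python
import numpy as np
import pandas as pd

# Synthetic daily rainfall series indexed by date (placeholder data).
rng = np.random.default_rng(42)
dates = pd.date_range("1950-01-01", "2020-12-31", freq="D")
rainfall = pd.Series(rng.gamma(shape=0.5, scale=10.0, size=len(dates)), index=dates)

# One maximum per calendar-year block. The resulting series is a sample
# from the distribution of the block maximum M_n, with n of roughly 365.
annual_maxima = rainfall.groupby(rainfall.index.year).max()
```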
6.2.2 Peaks-over-threshold
The peaks-over-threshold (POT) method defines a set of exceedances over a high threshold \(u\) as \(\{X_i : X_i > u\}\). The variable of interest is the value of the excesses themselves, \(Y = X_i - u\), for all \(X_i\) in the set of exceedances. The GPD is a model for these excess amounts, not the raw values. This approach is more data-efficient than the block maxima method. Its primary challenge lies in selecting an appropriate threshold, which involves a trade-off between bias (if the threshold is too low) and variance (if the threshold is too high, yielding too few data points). In practice, exceedances may occur in clusters (e.g., during a multi-day storm event). Therefore, a declustering algorithm is often applied to the raw exceedances to ensure that the events being modeled are approximately independent.
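The sketch below extracts declustered excesses using a simple runs rule, one of several declustering heuristics; the threshold of 50 mm and the run length are arbitrary illustrative choices, and `rainfall` is the series from the previous sketch.

```python
import numpy as np

def pot_excesses(x, u, run_length=3):
    """Declustered excesses over threshold `u` via a runs rule.

    Exceedances separated by fewer than `run_length` non-exceeding
    observations are grouped into one cluster, and only the cluster
    maximum is retained, so events are approximately independent.
    """
    x = np.asarray(x)
    exceed_idx = np.flatnonzero(x > u)
    if exceed_idx.size == 0:
        return np.array([])
    # Start a new cluster wherever the gap between successive
    # exceedances is at least `run_length` observations.
    breaks = np.flatnonzero(np.diff(exceed_idx) >= run_length)
    clusters = np.split(exceed_idx, breaks + 1)
    peaks = np.array([x[c].max() for c in clusters])
    return peaks - u  # excesses Y = X - u

excesses = pot_excesses(rainfall.to_numpy(), u=50.0)
```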
6.3 Asymptotic theory for extremes
The statistical justification for both the block maxima and POT approaches comes from asymptotic theorems that describe the limiting distributions of extreme values.
6.3.1 Theory for block maxima: The GEV distribution
The theoretical justification for the block maxima approach is the Extremal Types Theorem. This theorem states that for a large class of underlying distributions, if a stable, non-degenerate limiting distribution for the block maximum \(M_n\) exists, it must be the Generalized Extreme Value (GEV) distribution.
A critical point is that the theorem applies to a linearly rescaled or normalized maximum. Consider the raw maximum, \(M_n\). As the block size \(n\) increases, the expected value of \(M_n\) will also increase (or stay the same)—its distribution drifts towards larger values and does not converge. A similar issue arises in the Central Limit Theorem (CLT), which describes the convergence of a sample mean. The CLT does not apply to the raw sum of random variables, but to a normalized version of it.
The Extremal Types Theorem is the direct analog of the CLT for maxima. It states that there exist sequences of normalizing constants, a location parameter \(b_n\) and a scale parameter \(a_n > 0\), such that the distribution of the normalized maximum, \((M_n - b_n)/a_n\), converges to the GEV distribution as \(n \to \infty\).
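A small simulation makes this concrete, using the classic Exponential(1) example, for which the normalizing constants happen to be known in closed form (\(a_n = 1\), \(b_n = \log n\)) and the limit is the standard Gumbel distribution:

```python
import numpy as np
from scipy import stats

# Empirical check of the Extremal Types Theorem for Exponential(1)
# parents, where a_n = 1 and b_n = log(n): the normalized maximum
# (M_n - log n) / 1 is approximately standard Gumbel for large n.
rng = np.random.default_rng(0)
n, reps = 365, 20_000
maxima = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

# Compare simulated quantiles with standard Gumbel quantiles.
qs = [0.5, 0.9, 0.99]
print(np.quantile(maxima, qs))
print(stats.gumbel_r.ppf(qs))
```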
In practice, we do not need to know the specific functional forms of \(a_n\) and \(b_n\). Instead, for a fixed, large block size \(n\) (e.g., one year of daily data), we fit the GEV distribution directly to the series of block maxima. The GEV’s location and scale parameters, \(\mu\) and \(\sigma\), effectively absorb and account for the normalization that the theory requires. The GEV is a flexible three-parameter family with a cumulative distribution function (CDF) given by:
\[ G(z) = \exp\left\{ -\left[ 1 + \xi \left( \frac{z - \mu}{\sigma} \right) \right]^{-1/\xi} \right\} \]
This is defined on the set \(\{z: 1 + \xi(z - \mu)/\sigma > 0\}\). The parameters are location (\(\mu\)), scale (\(\sigma > 0\)), and shape (\(\xi\)). The case \(\xi = 0\) is interpreted as the limit \(\xi \to 0\), which yields the Gumbel CDF \(G(z) = \exp\{-\exp[-(z - \mu)/\sigma]\}\). These parameters implicitly depend on the block size \(n\).
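A minimal sketch of a maximum likelihood GEV fit to the annual maxima from the earlier block-maxima sketch. One pitfall worth flagging: scipy.stats.genextreme parameterizes the shape as \(c = -\xi\) relative to the convention used in this chapter, so the sign must be flipped.

```python
from scipy import stats

# Fit by maximum likelihood; scipy returns (c, loc, scale) with c = -xi.
c_hat, mu_hat, sigma_hat = stats.genextreme.fit(annual_maxima)
xi_hat = -c_hat  # convert to this chapter's sign convention

print(f"mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}, xi = {xi_hat:.2f}")
```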
6.3.2 Theory for threshold exceedances: The GPD
The corresponding theory for the POT approach is the Pickands–Balkema–de Haan Theorem. It states that for a wide range of distributions, the distribution of excesses over a high threshold \(u\) can be approximated by the Generalized Pareto Distribution (GPD). Conceptually, if the GEV describes the behavior of the maximum of a large block, the GPD describes the behavior of the distribution’s tail that produced that maximum. It is the distribution one would expect to see by “zooming in” on the tail of a distribution above a high threshold.
The GPD is a two-parameter family with a CDF for excesses \(Y = X - u\) given by:
\[ H(y) = 1 - \left( 1 + \frac{\xi y}{\sigma_u} \right)^{-1/\xi} \]
This is defined on \(\{y: y > 0 \text{ and } (1 + \xi y / \sigma_u) > 0\}\). The parameters are the shape, \(\xi\), and a scale parameter \(\sigma_u\) that depends on the threshold \(u\). As with the GEV, the case \(\xi = 0\) is interpreted as the limit \(\xi \to 0\), which gives the exponential CDF \(H(y) = 1 - \exp(-y/\sigma_u)\).
If the parent distribution’s maxima are GEV-distributed with parameters \((\mu, \sigma, \xi)\), then the GPD scale parameter is given by \(\sigma_u = \sigma + \xi(u - \mu)\). This key result shows that the GPD scale parameter is a linear function of the threshold \(u\), a property used in advanced diagnostics for threshold selection, and it illustrates the close link between the GPD and GEV models for extremes.
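A minimal sketch of fitting the GPD to the declustered excesses from the POT sketch above. Unlike genextreme, the shape parameter of scipy.stats.genpareto matches \(\xi\) directly; the location is pinned to zero because the data are already excesses \(Y = X - u\).

```python
from scipy import stats

# Fit the excesses with the location fixed at zero (floc=0), since the
# GPD here models Y = X - u rather than the raw values.
xi_hat, _, sigma_u_hat = stats.genpareto.fit(excesses, floc=0)

print(f"xi = {xi_hat:.2f}, sigma_u = {sigma_u_hat:.1f}")
```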
6.3.3 The shape parameter
The shape parameter, \(\xi\), is the most critical parameter in extreme value analysis and has the same interpretation in both the GEV and GPD. It governs the tail behavior of the distribution; the short sketch after the following list makes the three regimes concrete.
- \(\xi = 0\) (Gumbel-type tail): The tail decays exponentially (light tail).
- \(\xi > 0\) (Fréchet-type tail): The tail decays as a polynomial (heavy tail), with no upper bound.
- \(\xi < 0\) (Weibull-type tail): The distribution has a finite upper bound.
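A small numerical comparison under hypothetical parameter values (\(\mu = 0\), \(\sigma = 1\)): upper quantiles grow much faster when \(\xi > 0\), while for \(\xi < 0\) they are capped by the finite upper bound \(\mu - \sigma/\xi\).

```python
import numpy as np
from scipy import stats

# Compare upper quantiles of GEV(mu=0, sigma=1) across the three tail
# types. Recall scipy's shape convention c = -xi.
for xi in (-0.2, 0.0, 0.2):
    q99, q999 = stats.genextreme.ppf([0.99, 0.999], c=-xi)
    upper = -1.0 / xi if xi < 0 else np.inf  # upper bound mu - sigma/xi
    print(f"xi = {xi:+.1f}: 99% = {q99:.2f}, 99.9% = {q999:.2f}, bound = {upper}")
```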
6.3.4 Connection between GEV and GPD
The GEV and GPD models are intrinsically linked. If the block maxima of a process follow a GEV distribution with parameters \((\mu, \sigma, \xi)\), then for a high threshold \(u\), the threshold excesses follow a GPD with the same shape parameter \(\xi\). This provides a theoretical consistency between the two primary modeling approaches.
6.4 Return periods and return levels
6.4.1 Definitions and Calculation
The output of an extreme value analysis is typically expressed in terms of return periods and return levels. The N-year return level, \(z_N\), is the level expected to be exceeded on average once every \(N\) years. It corresponds to the quantile of the distribution with an annual exceedance probability of \(p = 1/N\).
For the GEV distribution, it is calculated by inverting the CDF:
\[ z_N = \mu - \frac{\sigma}{\xi} \left[ 1 - (-\ln(1-p))^{-\xi} \right] \]
Calculating return levels from a GPD model requires an additional step. Let \(\zeta_u\) be the rate of threshold exceedances (e.g., the average number of exceedances per year). The \(N\)-year return level is then the value \(z_N\) that is exceeded, conditional on the threshold being exceeded, with probability \(1/(N \zeta_u)\), so that \(z_N\) is exceeded on average once every \(N\) years. It is calculated by adding the threshold back to the corresponding quantile of the GPD:
\[ z_N = u + \frac{\sigma_u}{\xi} \left[ (N \zeta_u)^\xi - 1 \right] \]
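A minimal sketch of both return level formulas as Python helpers (valid for \(\xi \neq 0\)); the parameter values in the example calls are hypothetical, not fitted:

```python
import numpy as np

def gev_return_level(N, mu, sigma, xi):
    """N-year return level from GEV parameters (requires xi != 0)."""
    p = 1.0 / N  # annual exceedance probability
    return mu - (sigma / xi) * (1.0 - (-np.log(1.0 - p)) ** (-xi))

def gpd_return_level(N, u, sigma_u, xi, zeta_u):
    """N-year return level from GPD parameters, where zeta_u is the
    mean number of threshold exceedances per year (requires xi != 0)."""
    return u + (sigma_u / xi) * ((N * zeta_u) ** xi - 1.0)

# Hypothetical parameter values, for illustration only:
print(gev_return_level(100, mu=80.0, sigma=25.0, xi=0.15))
print(gpd_return_level(100, u=50.0, sigma_u=20.0, xi=0.15, zeta_u=2.5))
```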
6.4.2 Return level plots
A standard diagnostic and visualization tool is the return level plot. This plot graphs the estimated return level \(z_N\) against the return period \(N\), with \(N\) typically plotted on a logarithmic scale. The curvature of the fitted line is a direct visualization of the shape parameter, \(\xi\): a straight line implies \(\xi = 0\); a convex curve, bending upward without bound, implies \(\xi > 0\); and a concave curve, flattening toward the finite upper bound \(\mu - \sigma/\xi\), implies \(\xi < 0\).
[Placeholder for a Return Level Plot showing a fitted GEV or GPD curve with confidence intervals.]
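A minimal plotting sketch using the gev_return_level helper defined above, again with hypothetical parameters:

```python
import matplotlib.pyplot as plt
import numpy as np

# Return periods from 2 to 500 years, plotted on a log scale.
N = np.logspace(np.log10(2), np.log10(500), 200)
z = gev_return_level(N, mu=80.0, sigma=25.0, xi=0.15)

fig, ax = plt.subplots()
ax.semilogx(N, z)
ax.set_xlabel("Return period $N$ (years)")
ax.set_ylabel("Return level $z_N$ (mm)")
plt.show()
```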
6.5 Inference
6.5.1 Plotting Positions
To visually assess model fit, the fitted model is plotted against the observed maxima. For this purpose, we require a method to assign a non-exceedance probability (and thus a return period) to each observed maximum. For a sample of \(n\) block maxima, let \(z_{(1)} < z_{(2)} < \dots < z_{(n)}\) be the ordered values. We estimate the probability \(P(Z \le z_{(k)})\) using a plotting position formula. These formulas generally take the form:
\[ p_k = \frac{k - a}{n + 1 - 2a} \]
where \(k\) is the rank of the observation and \(a\) is a parameter. Common choices include:
- Weibull: \(a=0\), giving \(p_k = k/(n+1)\).
- Gringorten: \(a=0.44\), giving \(p_k = (k-0.44)/(n+0.12)\). This is often recommended as an unbiased choice for Gumbel-type distributions.
The empirical return period for the \(k\)-th observation is then estimated as \(1/(1-p_k)\).
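A minimal sketch of both plotting-position formulas, applied to the annual maxima from the block-maxima sketch:

```python
import numpy as np

def plotting_positions(n, a=0.44):
    """p_k = (k - a) / (n + 1 - 2a) for ranks k = 1..n.

    a = 0.44 gives the Gringorten positions; a = 0 gives Weibull.
    """
    k = np.arange(1, n + 1)
    return (k - a) / (n + 1 - 2 * a)

z_sorted = np.sort(annual_maxima)      # ordered block maxima
p = plotting_positions(len(z_sorted))  # non-exceedance probabilities
empirical_T = 1.0 / (1.0 - p)          # empirical return periods
```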
6.5.2 Moments
6.5.3 MLE
6.5.4 Bayesian
6.6 Sampling variability
6.7 Regionalization
6.8 Nonstationarity
Further reading
- Coles, S. (2001). *An Introduction to Statistical Modeling of Extreme Values*. Springer. The canonical extreme value textbook, with mathematical rigor and practical examples.