Bayesian Statistics and Probability Models


Lecture 04

January 31, 2024

Last Class(es)

Probability Models

Goal: Write down a probability model for the data-generating process for \(\mathbf{y}\).

  • Direct statistical model, \[\mathbf{y} \sim \mathcal{D}(\theta).\]
  • Model for the residuals of a numerical model, \[\mathbf{r} = \mathbf{y} - F(\mathbf{x}) \sim \mathcal{D}(\theta).\]

Model Fitting as Maximum Likelihood Estimation

We can interpret fitting a model (minimizing error according to some loss or error metric) as maximizing the probability of observing our data under this data-generating process.
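As a quick illustration of this equivalence (a toy sketch, not part of the EBM analysis; the data y_toy and the fixed σ = 0.3 are made up), maximizing a normal likelihood for a constant-mean model recovers the least-squares estimate, i.e. the sample mean:

using Distributions, Optim, Statistics

y_toy = [1.2, 0.8, 1.5, 1.1, 0.9]             # hypothetical data
nll(μ) = -sum(logpdf.(Normal(μ, 0.3), y_toy)) # negative log-likelihood with σ fixed
res = Optim.optimize(nll, -10.0, 10.0)        # univariate minimization on an interval
isapprox(Optim.minimizer(res), mean(y_toy); atol=1e-6) # true: the MLE is the least-squares fit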

Accounting for Model Residuals

The Energy Balance Model Revisited

Last class, we introduced the Energy Balance Model (EBM):

\[T_{i+1} = T_i + \frac{F_i - \lambda T_i}{cd} \Delta t\]

Let’s write this as \(\mathbf{T} = \Lambda(\mathbf{F})\).
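A minimal sketch of stepping this recurrence forward in Julia; the signature and default values below (the feedback λ, an effective heat capacity cd, and a one-year time step in seconds) are illustrative assumptions only and are not the course's ebm function, which takes aerosol and non-aerosol forcing series and a parameter vector p:

# illustrative discretization of T_{i+1} = T_i + (F_i - λ T_i)/(c d) * Δt
# default values are rough, order-of-magnitude choices for illustration only
function ebm_sketch(F; λ=1.3, c=4.184e6, d=100.0, Δt=3.156e7, T₁=0.0)
    T = zeros(length(F))
    T[1] = T₁
    for i in 1:length(F)-1
        T[i+1] = T[i] + (F[i] - λ * T[i]) / (c * d) * Δt
    end
    return T
end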

EBM Probability Model

Assume independent and identically distributed (i.i.d.) normal residuals:

\[y_i \sim \mathcal{N}(\Lambda(\mathbf{F})_i, \sigma).\]

i.i.d. Log-Likelihood

\[ \log \mathcal{L}(\theta | \mathbf{y}, F) = \sum_{i=1}^n \left[\log \frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}(y_i - F(x_i)) ^2 \right] \]

For fixed \(\sigma\), ignoring the constant terms, this is proportional to the negative mean-squared error.
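Expanding the sum (treating \(\sigma\) as fixed) makes this explicit:

\[ \log \mathcal{L}(\theta | \mathbf{y}, F) = -n \log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - F(x_i))^2 = \text{const} - \frac{n}{2\sigma^2}\,\text{MSE}, \]

so maximizing the likelihood over the model parameters is equivalent to minimizing the mean-squared error.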

Maximum Likelihood Estimate

# p are the model parameters, σ the standard deviation of the normal errors, y is the data, m the model function
function log_likelihood(p, σ, y, m)
    y_pred = m(p) # model simulation for parameters p
    ll = sum(logpdf.(Normal.(y_pred, σ), y)) # sum of pointwise normal log-densities
    return ll
end

ebm_wrap(params) = ebm(forcing_non_aerosol_85[idx], forcing_aerosol_85[idx], p = params)

# maximize log-likelihood within some range
# important to make everything a Float instead of an Int 
lower = [1.0, 50.0, 0.0, 0.0]
upper = [4.0, 150.0, 2.0, 10.0]
p0 = [2.0, 100.0, 1.0, 1.0]
result = Optim.optimize(params -> -log_likelihood(params[1:end-1], params[end], temp_obs, ebm_wrap), lower, upper, p0)
θ = result.minimizer

Maximum Likelihood Estimate

4-element Vector{Float64}:
  1.943746862749113
 86.39077197089858
  0.789333067313891
  0.12104289634011653

Generating Simulations

# set number of sampled simulations
n_samples = 1000
residuals = rand(Normal(0, θ[end]), (n_samples, length(temp_obs)))
model_out = ebm_wrap(θ[1:end-1])
# this uses broadcasting to "sweep" the model simulation across the sampled residual matrix
model_sim = residuals .+ model_out' # need to transpose the model output vector due to how Julia treats vector dimensions
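In notation, each row of model_sim is one synthetic realization of the observations,

\[ \tilde{y}^{(k)}_i = \Lambda(\mathbf{F})_i + \varepsilon^{(k)}_i, \qquad \varepsilon^{(k)}_i \sim \mathcal{N}(0, \hat{\sigma}), \qquad k = 1, \ldots, n_{\text{samples}}, \]

where \(\hat{\sigma}\) is the fitted standard deviation θ[end].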

“Best Fit”

Code
plot(hind_years, model_out, color=:red, linewidth=3, label="Model Simulation", guidefontsize=18, tickfontsize=16, legendfontsize=16, xlabel="Year", ylabel="Temperature anomaly (°C)", bottom_margin=5mm, left_margin=5mm)
ylims!(-0.5, 1.2)
scatter!(hind_years, temp_obs, color=:black, label="Data")
Figure 1: Best fit for the EBM with normal residuals.

Adding In Uncertain Residuals…

Code
plot(hind_years, model_out, color=:red, linewidth=3, label="Model Simulation", xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm)
plot!(hind_years, model_sim[1, :], color=:grey, linewidth=1, label="Model Simulation With Noise", alpha=0.5)
plot!(hind_years, model_sim[2:10, :]', color=:grey, linewidth=1, label=false, alpha=0.5)
scatter!(hind_years, temp_obs, color=:black, label="Data")
ylims!(-0.5, 1.2)
Figure 2: Comparison of best fit with uncertain realization for the EBM with normal residuals.

Visualizing the Uncertainty Spread

Code
q_90 = mapslices(col -> quantile(col, [0.05, 0.95]), model_sim; dims=1) # compute 90% prediction interval

plot(hind_years, model_out, color=:red, linewidth=3, label="Model Simulation", ribbon=(model_out .- q_90[1, :], q_90[2, :] .- model_out), fillalpha=0.3, xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm)
scatter!(hind_years, temp_obs, color=:black, label="Data")
ylims!(-0.5, 1.2)
Figure 3: Best fit with a 90% prediction interval for the EBM with normal residuals.

Checking for Calibration

For an \(\alpha\)% projection interval \(\mathcal{I}_\alpha\), we would expect approximately \(\alpha\)% of the data to be contained in this interval.

This rate is quantified by the surprise index, the fraction of observations falling outside the interval: \(1 - \frac{1}{n} \sum_{i=1}^n \mathbb{I}_{\mathcal{I}_\alpha}(y_i).\) For a well-calibrated model, it should be close to \(1 - \alpha/100\).

Surprise Index

Code
surprises = 0 # initialize surprise counter
# go through the data and check which points are outside of the 90% interval
for i = 1:length(temp_obs)
    ## The || operator is an OR, so returns true if either of the terms are true
    if (temp_obs[i] < q_90[1, i]) || (q_90[2, i] < temp_obs[i])
        surprises += 1
    end
end
surprises / length(temp_obs)
0.11695906432748537
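An equivalent vectorized computation (a sketch; it uses q_90 and temp_obs from above and assumes mean is available from Statistics):

# fraction of observations falling outside the 90% interval
surprise_index = mean((temp_obs .< q_90[1, :]) .| (temp_obs .> q_90[2, :]))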

We used a 90% projection interval, so a well-calibrated model should give a surprise index of about 10%:

  • How does the observed value of roughly 12% compare?

Projections (Best Fit)

Code
model_sim_85 = ebm(forcing_non_aerosol_85[sim_idx], forcing_aerosol_85[sim_idx], p=θ[1:end-1])
model_sim_26 = ebm(forcing_non_aerosol_26[sim_idx], forcing_aerosol_26[sim_idx], p=θ[1:end-1])


plot(sim_years, model_sim_26, color=:blue, linewidth=3, label="RCP 2.6 Projection", fillalpha=0.3, xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm)
plot!(sim_years, model_sim_85, color=:red, linewidth=3, label="RCP 8.5 Projection", fillalpha=0.3)
scatter!(hind_years, temp_obs, color=:black, label="Data")
ylims!(-0.5, 5)
Figure 4: Best-fit projections for RCP 2.6 and RCP 8.5.

Projections (Residual Uncertainty)

Code
resids = rand(Normal(0, θ[end]), (n_samples, length(sim_years)))
q90_26 = mapslices(col -> quantile(col, [0.05, 0.95]), resids .+ model_sim_26'; dims=1) # compute 90% prediction interval
q90_85 = mapslices(col -> quantile(col, [0.05, 0.95]), resids .+ model_sim_85'; dims=1) # compute 90% prediction interval

plot(sim_years, model_sim_26, color=:blue, linewidth=3, label="RCP 2.6 Projection", fillalpha=0.3, ribbon=(model_sim_26 .- q90_26[1, :], q90_26[2, :] .- model_sim_26), xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm)
plot!(sim_years, model_sim_85, color=:red, linewidth=3, ribbon=(model_sim_85 .- q90_85[1, :], q90_85[2, :] .- model_sim_85), label="RCP 8.5 Projection", fillalpha=0.3)
scatter!(hind_years, temp_obs, color=:black, label="Data")
ylims!(-0.5, 5)
Figure 5: Projections with residual uncertainty for RCP 2.6 and RCP 8.5.

More General Probability Models

Checking The Model Residuals

One mantra in this class: check your residuals!

We assumed our model residuals were independent and identically distributed according to a normal distribution.

How can we check these assumptions?

Diagnostics: Normality

Code
residuals = model_out .- temp_obs # calculate residuals
p1 = histogram(residuals, tickfontsize=16, guidefontsize=18, legend=false, xlabel="Residual (°C)", ylabel="Count") # plot histogram to check distributional assumption
p2 = qqplot(Normal, residuals, tickfontsize=16, guidefontsize=18, xlabel="Normal Theoretical Quantile", ylabel="Sample Residual Quantile")

plot!(p1, size=(500, 500))
plot!(p2, size=(500, 500))
display(p1)
display(p2)
(a) Distribution
(b) Q-Q Plot
Figure 6: Checking residual assumptions of normality

Diagnostics: Independence

Code
p1 = plot(pacf(residuals, 1:5), marker=:circle, line=:stem, linewidth=3, markersize=8, tickfontsize=16, guidefontsize=18, legend=false, ylabel="Partial Autocorrelation", xlabel="Time Lag")
p2 = scatter(residuals[1:end-1], residuals[2:end], legend=false, xlabel=L"Residual $t_i$ (°C)", ylabel=L"Residual $t_{i+1}$ (°C)", guidefontsize=18, tickfontsize=16)  
dat = DataFrame(X=residuals[1:end-1], Y=residuals[2:end])
fit = lm(@formula(Y~X), dat)
pred = predict(fit,dat)
plot!(p2, dat.X, pred, linewidth=2)

plot!(p1, size=(500, 500))
plot!(p2, size=(500, 500))
display(p1)
display(p2)
(a) Partial autocorrelation
(b) Scatterplot of lag-1 residuals
Figure 7: Checking residual assumptions for independence and autocorrelation

Diagnostics: Constant Variance

Code
p1 = scatter(residuals,  tickfontsize=16, guidefontsize=18, legend=false, ylabel="Residual (°C)", xlabel="Time")
dat = DataFrame(Y=residuals, X=1:length(residuals))
fit = lm(@formula(Y~X), dat)
pred = predict(fit,dat)
plot!(p1, dat.X, pred, linewidth=2)
p2 = scatter(model_out, residuals, legend=false, xlabel="Modeled Temperature (°C)", ylabel="Residual (°C)", guidefontsize=18, tickfontsize=16)  
dat = DataFrame(Y=residuals, X=model_out)
fit = lm(@formula(Y~X), dat)
pred = predict(fit,dat)
plot!(p2, dat.X, pred, linewidth=2)

plot!(p1, size=(500, 500))
plot!(p2, size=(500, 500))
display(p1)
display(p2)
(a) Time dependence
(b) Temperature dependence
Figure 8: Checking residual assumptions for constant variance

A Model for Residual Autocorrelation

An autoregressive model with lag 1 (AR(1)):

\[ \begin{gather*} r_{i+1} = \rho r_i + \varepsilon_i \\ \varepsilon_i \sim \mathcal{N}(0, \sigma) \end{gather*} \]

Rearranging for Likelihood

\[\begin{gather*} r_{i+1} \sim \mathcal{N}(\rho r_i, \sigma) \\ r_1 \sim \mathcal{N}\left(0, \frac{\sigma}{\sqrt{1-\rho^2}}\right) \end{gather*} \]
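The distribution of \(r_1\) follows from stationarity: taking variances on both sides of the recursion and setting \(\text{Var}(r_{i+1}) = \text{Var}(r_i) = v\) gives

\[ v = \rho^2 v + \sigma^2 \quad \Rightarrow \quad v = \frac{\sigma^2}{1-\rho^2}, \]

so the stationary standard deviation is \(\sigma / \sqrt{1-\rho^2}\) (valid for \(|\rho| < 1\)).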

Fitting AR(1) Model

# p are the model parameters, σ the standard deviation of the AR(1) errors, ρ is the autocorrelation coefficient, y is the data, m the model function
function ar_log_likelihood(p, σ, ρ, y, m)
    y_pred = m(p)
    ll = 0.0 # initialize log-likelihood (as a Float)
    residuals = y_pred .- y
    # first residual: stationary distribution of the AR(1) process
    ll += logpdf(Normal(0, σ/sqrt(1-ρ^2)), residuals[1])
    for i = 1:length(residuals)-1
        # "whiten" by removing the contribution of the previous residual
        residuals_whitened = residuals[i+1] - ρ * residuals[i]
        ll += logpdf(Normal(0, σ), residuals_whitened)
    end
    return ll
end

ebm_wrap(params) = ebm(forcing_non_aerosol_85[idx], forcing_aerosol_85[idx], p = params)

# maximize log-likelihood within some range
# important to make everything a Float instead of an Int 
lower = [1.0, 50.0, 0.0, 0.0, -1.0]
upper = [4.0, 150.0, 2.0, 10.0, 1.0]
p0 = [2.0, 100.0, 1.0, 1.0, 0.0]
result = Optim.optimize(params -> -ar_log_likelihood(params[1:end-2], params[end-1], params[end], temp_obs, ebm_wrap), lower, upper, p0)
θ_ar1 = result.minimizer
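The log-likelihood computed above corresponds to factoring the joint density of the residuals using the Markov property of the AR(1) process:

\[ \log \mathcal{L} = \log p(r_1) + \sum_{i=1}^{n-1} \log p(r_{i+1} | r_i) = \log \phi\!\left(r_1; 0, \tfrac{\sigma}{\sqrt{1-\rho^2}}\right) + \sum_{i=1}^{n-1} \log \phi\!\left(r_{i+1} - \rho r_i; 0, \sigma\right), \]

where \(\phi(\cdot; \mu, s)\) denotes the normal density with mean \(\mu\) and standard deviation \(s\).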

Comparison of IID and AR(1) Fits

Normal IID:

4-element Vector{Float64}:
  1.943746862749113
 86.39077197089858
  0.789333067313891
  0.12104289634011653

AR(1):

5-element Vector{Float64}:
   2.0068089111198297
 109.38026355047445
   0.7915448981319909
   0.10196962494870278
   0.5421594991244063

Comparison of Model Simulation

Code
# iid simulations
residuals_iid = rand(Normal(0, θ[end]), (n_samples, length(hind_years)))
model_iid = ebm_wrap(θ[1:end-1])
model_sim_iid = residuals_iid .+ model_iid'

# ar1 simulations
residuals_ar1 = zeros(n_samples, length(hind_years))
residuals_ar1[:, 1] = rand(Normal(0, θ_ar1[end-1] / sqrt(1-θ_ar1[end]^2)), n_samples)
for i = 2:size(residuals_ar1)[2]
    residuals_ar1[:, i] .= rand.(Normal.(θ_ar1[end] * residuals_ar1[:, i-1], θ_ar1[end-1]))
end
model_ar1 = ebm_wrap(θ_ar1[1:end-2])
model_sim_ar1 = residuals_ar1 .+ model_ar1'

q90_iid = mapslices(col -> quantile(col, [0.05, 0.95]), model_sim_iid; dims=1) # compute 90% prediction interval
q90_ar1 = mapslices(col -> quantile(col, [0.05, 0.95]), model_sim_ar1; dims=1) # compute 90% prediction interval

plot(hind_years, model_iid, color=:red, linewidth=3, label="IID Model Simulation", ribbon=(model_iid .- q90_iid[1, :], q90_iid[2, :] .- model_iid), fillalpha=0.3, xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm)
plot!(hind_years, model_ar1, color=:blue, linewidth=3, label="AR(1) Model Simulation", ribbon=(model_ar1 .- q90_ar1[1, :], q90_ar1[2, :] .- model_ar1), fillalpha=0.3)
scatter!(hind_years, temp_obs, color=:black, label="Data")
Figure 9: Comparison of hindcast 90% prediction intervals for the EBM with IID and AR(1) residuals.

AR(1) Surprise Index

Code
surprises = 0 # initialize surprise counter
# go through the data and check which points are outside of the 90% interval
for i = 1:length(temp_obs)
    ## The || operator is an OR, so returns true if either of the terms are true
    if (temp_obs[i] < q90_ar1[1, i]) || (q90_ar1[2, i] < temp_obs[i])
        surprises += 1
    end
end
surprises / length(temp_obs)
0.14035087719298245

Impacts on Projections (RCP 8.5)

Code
# iid simulations
residuals_iid = rand(Normal(0, θ[end]), (n_samples, length(sim_years)))
model_iid = ebm(forcing_non_aerosol_85[sim_idx], forcing_aerosol_85[sim_idx], p=θ[1:end-1])
model_sim_iid = residuals_iid .+ model_iid'

# ar1 simulations
residuals_ar1 = zeros(n_samples, length(sim_years))
residuals_ar1[:, 1] = rand(Normal(0, θ_ar1[end-1] / sqrt(1-θ_ar1[end]^2)), n_samples)
for i = 2:size(residuals_ar1)[2]
    residuals_ar1[:, i] .= rand.(Normal.(θ_ar1[end] * residuals_ar1[:, i-1], θ_ar1[end-1]))
end
model_ar1 = ebm(forcing_non_aerosol_85[sim_idx], forcing_aerosol_85[sim_idx], p=θ_ar1[1:end-2])
model_sim_ar1 = residuals_ar1 .+ model_ar1'

q90_iid = mapslices(col -> quantile(col, [0.05, 0.95]), model_sim_iid; dims=1) # compute 90% prediction interval
q90_ar1 = mapslices(col -> quantile(col, [0.05, 0.95]), model_sim_ar1; dims=1) # compute 90% prediction interval

plot(sim_years, model_iid, color=:red, linewidth=3, label="IID Model Simulation", ribbon=(model_iid .- q90_iid[1, :], q90_iid[2, :] .- model_iid), fillalpha=0.3, xlabel="Year", ylabel="Temperature anomaly (°C)", guidefontsize=18, tickfontsize=16, legendfontsize=16, bottom_margin=5mm, left_margin=5mm, right_margin=8mm)
plot!(sim_years, model_ar1, color=:blue, linewidth=3, label="AR(1) Model Simulation", ribbon=(model_ar1 .- q90_ar1[1, :], q90_ar1[2, :] .- model_ar1), fillalpha=0.3)
xlims!(2000, 2100)
ylims!(0, 5)
Figure 10: Comparisons of different residual structures for RCP 8.5 projections.

Ok, But What About Parameter Uncertainty?

The frameworks we’ve used so far include no parametric uncertainty and make minimal use of prior information.

This can sometimes lead to odd outcomes, e.g., low climate sensitivities.

In a few weeks, we will look at approaches to incorporate parametric uncertainty and prior information.

Key Points

Key Points

  • Probability models for data let us generate alternate datasets/realizations.
  • Different probability models may have a larger or smaller impact on simulated outcomes, but they can yield noticeably different parameter estimates.

Upcoming Schedule

Next Class(es)

Next Week: Exploratory data analysis

Assessments

Friday: HW1 and Exercise 1 due by 9pm.