Lecture 02
January 24, 2024



…A departure from the (unachievable) ideal of complete determinism…
— Walker et al. (2003)
| Uncertainty Type | Associated Uncertainties | Examples |
|---|---|---|
| Structural | Included physical processes, mathematical form | Model inadequacy, (epistemic) residual uncertainty |
| Parametric | Parameter uncertainty | Choice of parameters, strength of coupling between models |
| Sampling | Natural variability, (aleatoric) residual uncertainty | Internal variability, uncertain boundary conditions |
Probability distributions are often used to quantify uncertainty.
\[x \to \mathbb{P}_{\nu}[x] = p_{\nu}\left(x \mid \theta\right)\]
Here \(\nu\) denotes the distribution and \(\theta\) its parameters.
To denote that \(x\) is sampled from \(p(x \mid \theta)\), we write: \[x \sim f(\theta)\]
For example, for a normal distribution with mean \(\mu\) and standard deviation \(\sigma\): \[x \sim \mathcal{N}(\mu, \sigma)\]
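In Julia, this notation maps directly onto Distributions.jl. A minimal sketch (the parameter values are arbitrary choices for illustration):

using Distributions, Random

Random.seed!(1) # for reproducibility
dist = Normal(0.5, 1.0) # μ = 0.5; note that σ is the standard deviation
x = rand(dist) # a single draw: x ~ N(0.5, 1)
xs = rand(dist, 1000) # 1000 independent draws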
A continuous distribution \(\mathcal{D}\) has a probability density function (PDF) \(f_\mathcal{D}(x) = p(x | \theta)\).
The probability of \(x\) occurring in an interval \((a, b)\) is \[\mathbb{P}[a \leq x \leq b] = \int_a^b f_\mathcal{D}(x)dx.\]
Important: The probability that \(x\) takes any specific value \(x^*\), \(\mathbb{P}(x = x^*)\), is zero!
If \(\mathcal{D}\) is a distribution with PDF \(f_\mathcal{D}(x)\), the cumulative distribution function (CDF) of \(\mathcal{D}\) is \(F_\mathcal{D}(x)\):
\[F_\mathcal{D}(x) = \int_{-\infty}^x f_\mathcal{D}(u)du.\]
If \(f_\mathcal{D}\) is continuous at \(x\): \[f_\mathcal{D}(x) = \frac{d}{dx}F_\mathcal{D}(x).\]
Discrete distributions have probability mass functions (PMFs) which are defined at point values, e.g. \(p(x = x^*) \neq 0\).
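Both cases are easy to compute with Distributions.jl. A short sketch (the distributions here are illustrative choices):

using Distributions

d = Normal(0, 1)
# continuous: P(a ≤ x ≤ b) = F(b) - F(a)
p_interval = cdf(d, 1.0) - cdf(d, -1.0) # ≈ 0.683
# pdf(d, x) is a density, not a probability; P(x = x*) = 0

b = Binomial(10, 0.5)
# discrete: point values carry nonzero mass
p_point = pdf(b, 5) # P(x = 5) ≈ 0.246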
For example, the normal distribution \(\mathcal{N}(\mu, \sigma)\) has PDF: \[f_\mathcal{D}(x) = p(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)\]
The sum or mean of a random sample is itself a random variable:
\[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \sim \mathcal{D}_n\]
\(\mathcal{D}_n\): The sampling distribution of the mean (or sum, or other estimate of interest).
If the \(X_i\) are independent and identically distributed with mean \(\mu\) and finite variance \(\sigma^2\), then by the Central Limit Theorem (CLT):
\[\sqrt{n}\left(\bar{X}_n - \mu\right) \xrightarrow{d} \mathcal{N}(0, \sigma^2) \quad \Rightarrow \quad \bar{X}_n \overset{\text{approx}}{\sim} \mathcal{N}(\mu, \sigma^2/n)\]
For a large enough set of samples, the sampling distribution of a sum or mean of random variables is approximately a normal distribution, even if the random variables themselves are not.
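We can check this by simulation. A minimal sketch (the exponential distribution and sample sizes are arbitrary choices): even though the underlying draws are strongly skewed, the sample means follow the CLT's normal approximation.

using Distributions, Statistics, Random

Random.seed!(1)
d = Exponential(1.0) # skewed, non-normal; μ = 1, σ² = 1
n, n_reps = 100, 10_000
# compute n_reps sample means, each over n draws
means = [mean(rand(d, n)) for _ in 1:n_reps]
# the CLT predicts approximately N(μ, σ²/n) = N(1, 0.01)
mean(means), std(means) # ≈ (1.0, 0.1)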
Can we think about when this might break down?
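One case worth noting (an illustration, using the same Monte Carlo approach as above): the CLT requires a finite variance, so it breaks down for heavy-tailed distributions such as the Cauchy.

using Distributions, Statistics, Random

Random.seed!(1)
d = Cauchy(0, 1) # heavy tails: no finite mean or variance
# the mean of n Cauchy(0, 1) draws is itself Cauchy(0, 1),
# so sample means never concentrate, no matter how large n is
means = [mean(rand(d, 10_000)) for _ in 1:1_000]
extrema(means) # wildly dispersed, unlike the exponential example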
How we communicate/capture uncertainty depends on how we interpret probability:
Frequentist: probability is the long-run frequency of an event over many (hypothetical) repeated trials.
Bayesian: probability is a degree of belief in an outcome, which is updated as new information is obtained.
Frequentist estimates come with confidence intervals (CIs): under repeated sampling, an \(\alpha\)% confidence interval will contain the “true” parameter value for \(\alpha\)% of data samples.
There is no guarantee that an individual CI contains the true value (with any “probability”)!
using Distributions, Plots, Statistics

# set up distribution
mean_true = 0.4
n_cis = 100 # number of CIs to compute
dist = Normal(mean_true, 2) # true standard deviation is 2
# use sample size of 100
samples = rand(dist, (100, n_cis))
# mapslices applies a function along a matrix dimension; a loop would also work
sample_means = mapslices(mean, samples; dims=1)
sample_sd = mapslices(std, samples; dims=1)
mc_sd = 1.96 * sample_sd / sqrt(100) # 95% CI half-width (normal approximation)
mc_ci = zeros(n_cis, 2) # preallocate
for i = 1:n_cis
    mc_ci[i, 1] = sample_means[i] - mc_sd[i]
    mc_ci[i, 2] = sample_means[i] + mc_sd[i]
end
# find which CIs contain the true value
ci_true = (mc_ci[:, 1] .< mean_true) .&& (mc_ci[:, 2] .> mean_true)
# compute percentage of CIs which contain the true value
ci_frac1 = 100 * sum(ci_true) / n_cis
# plot CIs
p1 = plot([mc_ci[1, :]], [1, 1], linewidth=3, color=:blue, label="95% Confidence Interval", title="Sample Size 100", yticks=:false, tickfontsize=14, titlefontsize=20, legend=:false, guidefontsize=16)
for i = 2:n_cis
    if ci_true[i]
        plot!(p1, [mc_ci[i, :]], [i, i], linewidth=2, color=:blue, label=:false)
    else
        plot!(p1, [mc_ci[i, :]], [i, i], linewidth=2, color=:red, label=:false)
    end
end
vline!(p1, [mean_true], color=:black, linewidth=2, linestyle=:dash, label="True Value") # plot true value as a vertical line
xaxis!(p1, "Estimate")
plot!(p1, size=(500, 400)) # resize to fit slide
# use sample size of 1000
samples = rand(dist, (1000, n_cis))
# mapslices applies a function along a matrix dimension; a loop would also work
sample_means = mapslices(mean, samples; dims=1)
sample_sd = mapslices(std, samples; dims=1)
mc_sd = 1.96 * sample_sd / sqrt(1000) # 95% CI half-width (normal approximation)
mc_ci = zeros(n_cis, 2) # preallocate
for i = 1:n_cis
    mc_ci[i, 1] = sample_means[i] - mc_sd[i]
    mc_ci[i, 2] = sample_means[i] + mc_sd[i]
end
# find which CIs contain the true value
ci_true = (mc_ci[:, 1] .< mean_true) .&& (mc_ci[:, 2] .> mean_true)
# compute percentage of CIs which contain the true value
ci_frac2 = 100 * sum(ci_true) / n_cis
# plot CIs
p2 = plot([mc_ci[1, :]], [1, 1], linewidth=3, color=:blue, label="95% Confidence Interval", title="Sample Size 1,000", yticks=:false, tickfontsize=14, titlefontsize=20, legend=:false, guidefontsize=16)
for i = 2:n_cis
    if ci_true[i]
        plot!(p2, [mc_ci[i, :]], [i, i], linewidth=2, color=:blue, label=:false)
    else
        plot!(p2, [mc_ci[i, :]], [i, i], linewidth=2, color=:red, label=:false)
    end
end
vline!(p2, [mean_true], color=:black, linewidth=2, linestyle=:dash, label="True Value") # plot true value as a vertical line
xaxis!(p2, "Estimate")
plot!(p2, size=(500, 400)) # resize to fit slide
display(p1)
display(p2)
90% of the CIs contain the true value (left) vs. 94% (right).
Correlation refers to the tendency of two variables to increase or decrease together.
Typically measured with Pearson’s coefficient:
\[r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]\]
using Distributions, Plots, Statistics, LaTeXStrings

# sample 1000 pairs of independent standard normal variables
sample_independent = rand(Normal(0, 1), (2, 1000))
p1 = scatter(sample_independent[1, :], sample_independent[2, :], label=:false, title="Independent Variables", tickfontsize=14, titlefontsize=18, guidefontsize=18)
xlabel!(p1, L"$x_1$")
ylabel!(p1, L"$x_2$")
plot!(p1, size=(400, 500))
# sample 1000 correlated variables, with r=0.7
sample_correlated = rand(MvNormal([0; 0], [1 0.7; 0.7 1]), 1000)
p2 = scatter(sample_correlated[1, :], sample_correlated[2, :], label=:false, title=L"Correlated ($r=0.7$)", tickfontsize=14, titlefontsize=18, guidefontsize=18)
xlabel!(p2, L"$x_1$")
ylabel!(p2, L"$x_2$")
plot!(p2, size=(400, 500))
# sample 1000 anti-correlated variables, with r=-0.7
sample_anticorrelated = rand(MvNormal([0; 0], [1 -0.7; -0.7 1]), 1000)
p3 = scatter(sample_anticorrelated[1, :], sample_anticorrelated[2, :], label=:false, title=L"Anticorrelated ($r=-0.7$)", tickfontsize=14, titlefontsize=18, guidefontsize=18)
xlabel!(p3, L"$x_1$")
ylabel!(p3, L"$x_2$")
plot!(p3, size=(400, 500))
display(p1)
display(p2)
display(p3)
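We can verify the realized correlations with Statistics.cor (a quick check reusing the samples above; cor expects observations in rows, hence the transposes):

using Statistics

cor(sample_independent') # off-diagonal entries ≈ 0
cor(sample_correlated') # off-diagonal entries ≈ 0.7
cor(sample_anticorrelated') # off-diagonal entries ≈ -0.7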

Source: Errickson et al. (2021)
Time series can also be auto-correlated: each value depends on previous values. A standard model for this is the autoregressive model:
\[y_t = \sum_{i=1}^{t-1} \rho_i y_{t-i} + \varepsilon_t\]
Example: A time series is autocorrelated with lag 1 (called an AR(1) model) if \(y_t = \rho y_{t-1} + \varepsilon_t\).
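A minimal AR(1) simulation sketch (\(\rho\) and the noise distribution are arbitrary choices for illustration):

using Distributions, Statistics, Random

Random.seed!(1)
ρ, T = 0.8, 1_000
y = zeros(T)
for t in 2:T
    y[t] = ρ * y[t-1] + rand(Normal(0, 1)) # y_t = ρ y_{t-1} + ε_t
end
cor(y[1:end-1], y[2:end]) # lag-1 sample autocorrelation ≈ ρ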