# p are the model parameters, σ the standard deviation of the AR(1) errors, ρ is the autocorrelation coefficient, y is the data, m the model function
function ar_covariance_mat(σ, ρ, y_err)
    H = abs.((1:length(y_err)) .- (1:length(y_err))') # matrix of lags |i - j| between time steps
    ζ_var = σ^2 / (1 - ρ^2) # stationary variance of the AR(1) process
    Σ = ρ.^H * ζ_var
    for i in 1:length(y_err)
        Σ[i, i] += y_err[i]^2 # add observation error variance on the diagonal
    end
    return Σ
end
ar_covariance_mat (generic function with 1 method)
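As a quick check of the covariance structure, this can be called on a short toy series (the values below are illustrative, not fitted quantities):

σ_toy, ρ_toy = 0.1, 0.5            # illustrative AR(1) parameters
y_err_toy = fill(0.05, 3)          # illustrative observation error standard deviations
Σ_toy = ar_covariance_mat(σ_toy, ρ_toy, y_err_toy)
# Σ_toy[i, j] = ρ^|i - j| * σ² / (1 - ρ²), plus y_err[i]² on the diagonal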
Maximize Likelihood
function ar_discrep_log_likelihood(p, σ, ρ, y, m, y_err)
    y_pred = m(p)
    residuals = y_pred .- y
    Σ = ar_covariance_mat(σ, ρ, y_err)
    ll = logpdf(MvNormal(zeros(length(y)), Σ), residuals)
    return ll
end

ebm_wrap(params) = ebm(forcing_non_aerosol_85[idx], forcing_aerosol_85[idx], p = params)

# maximize log-likelihood within some range
# important to make everything a Float instead of an Int
lower = [1.0, 50.0, 0.0, 0.0, -1.0]
upper = [4.0, 150.0, 2.0, 10.0, 1.0]
p0 = [2.0, 100.0, 1.0, 1.0, 0.0]
result = Optim.optimize(params -> -ar_discrep_log_likelihood(params[1:end-2], params[end-1], params[end], temp_obs, ebm_wrap, temp_sd), lower, upper, p0)
θ_discrep = result.minimizer
The data-generating process here is straightforward: we can represent a coin flip with a heads-probability of \(\theta\) as a sample from a Bernoulli distribution, \(y_i \sim \text{Bernoulli}(\theta)\).
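For concreteness, this data-generating process can be simulated directly (a minimal sketch; the heads-probability and number of flips below are arbitrary illustrative choices):

using Distributions, Random

Random.seed!(1)
θ_true = 0.6                              # hypothetical heads-probability
n_flips = 20
flips = rand(Bernoulli(θ_true), n_flips)  # simulated coin flips (true = heads)
n_heads = sum(flips)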
Suppose that we spoke to a friend who knows something about coins, and she tells us that it is extremely difficult to make a passable weighted coin which comes up heads more than 75% of the time.
Coin Flipping Prior
Since \(\theta\) is bounded between 0 and 1, we’ll use a Beta distribution for our prior, specifically \(\text{Beta}(4,4)\).
Figure 4: Posterior distribution for the coin-flipping example
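Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior shown in Figure 4 has a closed form. A minimal sketch (the flip counts here are placeholders, not the data behind the figure):

using Distributions

prior = Beta(4, 4)
α0, β0 = params(prior)
n_heads, n_tails = 9, 3                      # hypothetical observed counts
posterior = Beta(α0 + n_heads, β0 + n_tails) # conjugate update
mean(posterior)                              # posterior mean = (α0 + heads) / (α0 + β0 + flips)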
Bayes and Parametric Uncertainty
Frequentist: Parametric uncertainty is purely the result of sampling variability.
Bayesian: Parameters have probabilities based on consistency with data and priors.
Bayesian Updating
The posterior is a “compromise” between the prior and the data.
The posterior mean is a weighted combination of the data and the prior mean.
The weights depend on the prior and the likelihood variances (see the normal-normal example below).
More data usually makes the posterior more confident.
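To make this concrete, recall the standard conjugate normal-normal result (stated here for illustration): with prior \(\mu \sim \mathcal{N}(\mu_0, \tau^2)\) and \(n\) observations \(y_i \sim \mathcal{N}(\mu, \sigma^2)\) with sample mean \(\bar{y}\), the posterior mean is
\[
\mathbb{E}[\mu \mid y] = \frac{\frac{1}{\tau^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},
\]
a precision-weighted average of the prior mean and the sample mean. As \(n\) grows, the data term dominates and the posterior variance \(\left(1/\tau^2 + n/\sigma^2\right)^{-1}\) shrinks.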
Example: Local Sea Level Extremes
San Francisco Tide Gauge Data
Code
# read in data and get annual maxima
function load_data(fname)
    date_format = DateFormat("yyyy-mm-dd HH:MM:SS")
    # This uses the DataFramesMeta.jl package, which makes it easy to string together commands to load and process data
    df = @chain fname begin
        CSV.read(DataFrame; header=false)
        rename("Column1" => "year", "Column2" => "month", "Column3" => "day", "Column4" => "hour", "Column5" => "gauge")
        # need to reformat the decimal date in the data file
        @transform :datetime = DateTime.(:year, :month, :day, :hour)
        # replace -99999 with missing
        @transform :gauge = ifelse.(abs.(:gauge) .>= 9999, missing, :gauge)
        select(:datetime, :gauge)
    end
    return df
end

dat = load_data("data/surge/h551.csv")

# detrend the data to remove the effects of sea-level rise and seasonal dynamics
ma_length = 366
ma_offset = Int(floor(ma_length/2))
moving_average(series, n) = [mean(@view series[i-n:i+n]) for i in n+1:length(series)-n]
dat_ma = DataFrame(datetime=dat.datetime[ma_offset+1:end-ma_offset], residual=dat.gauge[ma_offset+1:end-ma_offset] .- moving_average(dat.gauge, ma_offset))

# group data by year and compute the annual maxima
dat_ma = dropmissing(dat_ma) # drop missing data
dat_annmax = combine(dat_ma -> dat_ma[argmax(dat_ma.residual), :], groupby(transform(dat_ma, :datetime => x -> year.(x)), :datetime_function))
delete!(dat_annmax, nrow(dat_annmax)) # delete 2023; haven't seen much of that year yet
rename!(dat_annmax, :datetime_function => :Year)
select!(dat_annmax, [:Year, :residual])
dat_annmax.residual = dat_annmax.residual / 1000 # convert to m

# make plots
p1 = plot(
    dat_annmax.Year, dat_annmax.residual;
    xlabel="Year", ylabel="Annual Max Tide Level (m)",
    label=false, marker=:circle, markersize=5,
    tickfontsize=16, guidefontsize=18
)
p2 = histogram(
    dat_annmax.residual,
    normalize=:pdf,
    orientation=:horizontal,
    label=:false,
    xlabel="PDF",
    ylabel="",
    yticks=[],
    tickfontsize=16,
    guidefontsize=18
)
l = @layout [a{0.7w} b{0.3w}]
plot(p1, p2; layout=l, link=:y, ylims=(1, 1.7), bottom_margin=5mm, left_margin=5mm)
plot!(size=(1000, 450))
Figure 5: Annual maxima surge data from the San Francisco, CA tide gauge.
Conjugate priors are often convenient, but may be poor choices.
We will talk about how to sample more generally with Monte Carlo later.
Key Points and Upcoming Schedule
Key Points: Discrepancy
Modeling discrepancy can separate system state-relevant errors from observation errors.
Further decomposition of the data-generating process.
When hindcasting, include observation errors in the predictive distribution; when projecting, do not (see the sketch below).
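A minimal sketch of this distinction, reusing ar_covariance_mat from earlier (the AR(1) parameters and observation errors below are illustrative placeholders):

using Distributions

σ_ar, ρ_ar = 0.1, 0.5                                     # illustrative AR(1) discrepancy parameters
y_err = fill(0.05, 10)                                    # illustrative observation error standard deviations
Σ_hindcast = ar_covariance_mat(σ_ar, ρ_ar, y_err)         # discrepancy + observation error
Σ_projection = ar_covariance_mat(σ_ar, ρ_ar, zeros(10))   # discrepancy only
hindcast_draw = rand(MvNormal(zeros(10), Σ_hindcast))     # for comparison against observations
projection_draw = rand(MvNormal(zeros(10), Σ_projection)) # for projecting the underlying state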
Key Points: Bayesian Statistics
Bayesian probability: parameters have probabilities conditional on data.
Need to specify prior distribution (think generatively!).
Be transparent and principled about prior choices (sensitivity analyses?).
Maximum a posteriori (MAP) estimation gives the “most probable” parameter values (see the sketch below).
Will talk more about general sampling later.
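As a minimal sketch of MAP estimation in the same style as the maximum-likelihood code earlier (the model, prior, and data below are hypothetical, for illustration only):

using Distributions, Optim

# hypothetical log-prior: independent normal priors on a mean and a log-standard-deviation
log_prior(p) = logpdf(Normal(2.0, 1.0), p[1]) + logpdf(Normal(0.0, 0.5), p[2])
# hypothetical log-likelihood: iid normal data with mean p[1] and standard deviation exp(p[2])
log_lik(p, y) = sum(logpdf.(Normal(p[1], exp(p[2])), y))
neg_log_post(p, y) = -(log_lik(p, y) + log_prior(p))

y = randn(50) .+ 2.0                                  # synthetic data for illustration
result = Optim.optimize(p -> neg_log_post(p, y), [1.0, 0.0])
θ_map = Optim.minimizer(result)                       # “most probable” parameter values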
Next Class(es)
Monday: Extreme Value Theory and Models
Wednesday: No class! Develop project proposals.
Assessments
Friday: Exercise 1 due by 9pm.
HW2 available! Due 2/23 by 9pm.