generating proposals for new samples from a conditional proposal distribution \(q(y | x)\);
accepting or rejecting those proposals.
The Metropolis-Hastings Algorithm
Given \(X_t = x_t\):
Generate \(Y_t \sim q(y | x_t)\);
Set \(X_{t+1} = Y_t\) with probability \(\rho(x_t, Y_t)\), where \[\rho(x, y) = \min \left\{\frac{\pi(y)}{\pi(x)}\frac{q(x | y)}{q(y | x)}, 1\right\},\] else set \(X_{t+1} = x_t\).
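To see why this choice of \(\rho\) works, note that it enforces detailed balance with respect to \(\pi\): for \(x \neq y\), \[\pi(x)\, q(y | x)\, \rho(x, y) = \min \left\{\pi(x)\, q(y | x),\ \pi(y)\, q(x | y)\right\} = \pi(y)\, q(x | y)\, \rho(y, x),\] so the chain is reversible and \(\pi\) is its stationary distribution.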
How Simple Is That?
The devil is in the details: performance and efficiency are highly dependent on the choice of \(q\).
Key: There is a tradeoff between exploration and acceptance.
Wide proposal: Can make bigger jumps, but proposals are more likely to be rejected.
Narrow proposal: Proposals are more likely to be accepted, but the chain may not “mix” efficiently.
Proposal Distribution Choice
The original Metropolis et al. (1953) algorithm used symmetric distributions (\(q(y | x) = q(x | y)\)).
Then the acceptance probability reduces to \[\rho = \min \left\{\frac{\pi(y)}{\pi(x)}, 1\right\}.\]
A common choice: \(y \sim \text{Normal}(x_t, \sigma)\), centered at the current point \(x_t\).
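This proposal is symmetric, since the normal density depends on its arguments only through \((y - x)^2\): \[q(y | x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y - x)^2}{2\sigma^2}\right) = q(x | y),\] so the simplified Metropolis acceptance probability applies.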
Julia Implementation
using Distributions

# target_density(x) should return the (possibly unnormalized) target density π(x)
function mh_transition(x_current, σ)
    # generate new proposal
    x_proposal = rand(Normal(x_current, σ))
    u = rand()
    # transition log-probability (log of the acceptance ratio)
    ρ = log(target_density(x_proposal)) - log(target_density(x_current))
    # accept with probability min(exp(ρ), 1); on the log scale the cap is
    # min(ρ, 0), since log(1) = 0
    if log(u) < min(ρ, 0)
        y = x_proposal
    else
        y = x_current
    end
    return y, log(target_density(y))
end
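A minimal usage sketch, assuming a standard normal target so that target_density stands in for the unnormalized posterior (run_chain is a hypothetical helper):

using Distributions, Statistics

# assumed target for illustration; replace with your unnormalized posterior
target_density(x) = pdf(Normal(0, 1), x)

# run the chain for n_iter steps from initial value x0
function run_chain(x0, σ, n_iter)
    samples = zeros(n_iter)
    x = x0
    for t in 1:n_iter
        x, _ = mh_transition(x, σ)
        samples[t] = x
    end
    return samples
end

samples = run_chain(0.0, 1.0, 10_000)
mean(samples), std(samples)  # should be close to (0, 1)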
Example: Linear Regression
To see this in practice, consider a simple linear regression model:
\[
\begin{gather}
y = a + bx + \varepsilon \\
\varepsilon \sim \text{Normal}(0, \sigma).
\end{gather}
\]
This makes the likelihood: \[
y \sim \text{Normal}(a+bx, \sigma).
\]
Prior Selection
\[
\begin{gather}
a \sim \text{Normal}(0, 2) \\
b \sim \text{Normal}(0, 2) \\
\sigma \sim \text{Half-Normal}(0, 1)
\end{gather}
\]
Code
using Distributions, Plots, StatsPlots, LaTeXStrings
using Plots.PlotMeasures  # for mm margins

# generate data for samples
function gen_data(a, b, σ)
    x = collect(0:20)
    y = a .+ b * x
    # sample and add noise
    ε = rand(Normal(0, σ), length(x))
    y .+= ε
    return y
end

# sample and plot
n_samples = 1000
a = rand(Normal(0, 2), n_samples)
b = rand(Normal(0, 2), n_samples)
σ = rand(truncated(Normal(0, 1); lower=0), n_samples)
y_prior = [gen_data(a[i], b[i], σ[i]) for i in 1:n_samples]
# convert y_prior to a Matrix by vcatting each vector
y_prior = mapreduce(permutedims, vcat, y_prior)

plt_prior_1 = plot(;
    ylabel=L"$y$", xlabel=L"$x$",
    tickfontsize=16, legendfontsize=16, guidefontsize=18,
    bottom_margin=5mm, left_margin=5mm, legend=false,
)
for x ∈ [0, 5, 10, 15, 20]
    boxplot!(plt_prior_1, [x], y_prior[:, x + 1], color=:blue)
end
plot!(plt_prior_1; size=(600, 550))
plt_prior_1
Figure 2: Prior predictive plot for regression example.
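Before sampling, Metropolis-Hastings needs the (unnormalized) log-posterior as its target. Here is a sketch under the model and priors above; the name log_posterior and the data vectors x and y are assumptions for illustration:

using Distributions

# unnormalized log-posterior for θ = (a, b, σ) given data vectors x, y
function log_posterior(θ, x, y)
    a, b, σ = θ
    σ > 0 || return -Inf  # zero prior density outside the support of σ
    # log-prior: a, b ~ Normal(0, 2); σ ~ Half-Normal(0, 1)
    lp = logpdf(Normal(0, 2), a) + logpdf(Normal(0, 2), b) +
         logpdf(truncated(Normal(0, 1); lower=0), σ)
    # log-likelihood: y ~ Normal(a + b * x, σ)
    ll = sum(logpdf.(Normal.(a .+ b .* x, σ), y))
    return lp + ll
end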
Proposal Distribution and Initial Value
To illustrate how the M-H algorithm works, let's use the proposal distribution \[\mathcal{N}(x_t, 0.01 I_3)\] and start at \[x_0 = \begin{pmatrix}1 \\ 1 \\ 1\end{pmatrix}.\] (The three components correspond to the parameters \(a\), \(b\), and \(\sigma\).)
Figure 7: (a) Example target density for Metropolis-Hastings.
Sampling Efficiency
Two common measures of sampling efficiency:
Acceptance Rate: Rate at which proposals are accepted
“Optimally” 30-45% (depending on number of parameters)
Effective Sample Size (ESS): Accounts for autocorrelation \(\rho_t\) across samples (see the sketch below): \[N_\text{eff} = \frac{N}{1+2\sum_{t=1}^\infty \rho_t}\]
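A minimal sketch of computing both diagnostics from a scalar chain; truncating the autocorrelation sum at the first non-positive term is one common heuristic, and packages such as MCMCChains.jl provide more careful ESS estimators:

using Statistics

# fraction of iterations where the chain moved (equals the acceptance
# rate whenever accepted proposals never exactly repeat the current value)
acceptance_rate(chain) = mean(chain[2:end] .!= chain[1:end-1])

# rough effective sample size: N / (1 + 2 Σ ρ_t), truncating the sum
# at the first non-positive sample autocorrelation
function ess_estimate(chain)
    N = length(chain)
    c = chain .- mean(chain)
    v = sum(abs2, c) / N
    τ = 1.0
    for t in 1:(N - 1)
        ρ_t = sum(c[1:N-t] .* c[1+t:N]) / (N * v)
        ρ_t ≤ 0 && break
        τ += 2ρ_t
    end
    return N / τ
end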
Sampling Efficiency Example
Figures (MCMC Sampling for Various Proposals): chain samples, autocorrelation of chains, and ESS by proposal variance for the example.
Key Points, Upcoming Schedule, and References
Key Points (Metropolis-Hastings)
Construct ergodic and reversible Markov chains with posterior as stationary distribution.
Metropolis-Hastings: conceptually simple algorithm, but implementation plays a major role.
Proposal distribution plays a large role in acceptance rate and effective sample size.
Next Classes
Wednesday: MCMC Examples
Monday: MCMC Lab (No exercises these weeks)
Next Wednesday: Literature Presentations (email slides by 9pm Tuesday night).
Assessments
Homework 3: Due 3/22
References
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. J. Chem. Phys., 21, 1087–1092. https://doi.org/10.1063/1.1699114