Inference Example: Flipping a Coin
This notebook demonstrates the fundamental principles of statistical inference using the pedagogically clean example of coin flipping. While simple, this example illuminates the core concepts that underlie all statistical analysis: likelihood functions, maximum likelihood estimation, and Bayesian inference.
The same principles apply whether we’re estimating the probability of a fair coin or the frequency of extreme climate events. Understanding these methods enables principled learning from data under uncertainty.
Problem
Consider flipping a coin of unknown success rate \(\theta\) (the probability of heads) multiple times. After observing the outcomes, we want to estimate \(\theta\). This mirrors many climate problems where we observe limited data and seek to understand underlying probabilities.
Generating example data
We start with a known scenario to validate our inference methods.
# Set parameters for reproducible example
Random.seed!(43)
true_θ = 0.55 # Unknown to our "inference" methods
n_flips = 12
# Simulate the experiment (for illustration)
coin_flips = rand(n_flips) .< true_θ
observed_heads = sum(coin_flips)
# display as H or T
flip_results = [flip ? "H" : "T" for flip in coin_flips]
This setup allows us to evaluate how well different inference methods recover the true parameter value. In real applications, we never know the true value and must rely on the methods demonstrated here.
Likelihood-based inference
The likelihood function bridges observed data and unknown parameters. It quantifies how probable our observed data would be under different parameter values.
Understanding the likelihood function
For coin flips, the likelihood follows the binomial distribution: \[p(\text{data} \mid \theta) = \binom{n}{k} \theta^k (1-\theta)^{n-k}\]
where \(k\) is the number of observed heads and \(n\) is the total number of flips.
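This is exactly what Distributions.jl evaluates when we call pdf(Binomial(n, θ), k) in the plotting code below. As a quick sanity check (a sketch assuming Distributions.jl is already loaded, with illustrative values of \(n\), \(k\), and \(\theta\) rather than the observed data), the explicit formula and the library agree:
# Quick check: the binomial pmf from Distributions.jl matches the formula above
n_demo, k_demo, θ_demo = 12, 7, 0.6
manual = binomial(n_demo, k_demo) * θ_demo^k_demo * (1 - θ_demo)^(n_demo - k_demo)
via_pdf = pdf(Binomial(n_demo, θ_demo), k_demo)
isapprox(manual, via_pdf)  # true, up to floating-point rounding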
function plot_likelihood(y, n)
    θ_grid = 0.01:0.01:0.99
    # Calculate likelihood for each θ value
    likelihood_vals = [pdf(Binomial(n, θ), y) for θ in θ_grid]
    fig = Figure(size = (700, 500))
    ax = Axis(fig[1, 1],
        xlabel = L"$\theta$ \text{ (probability of heads)}",
        ylabel = "Likelihood",
        title = "Likelihood function for $(y) heads in $(n) flips")
    lines!(ax, θ_grid, likelihood_vals,
        label = L"p(\vec{y} \mid \theta)", linewidth = 2, color = :blue)
    # Mark the MLE
    mle = y / n
    vlines!(ax, [mle], color = :red, linestyle = :dash, linewidth = 2,
        label = L"$\hat{\theta}_\text{MLE} = %$(round(mle, digits=2))$")
    # Mark the true value for comparison
    vlines!(ax, [true_θ], color = :green, linestyle = :dot, linewidth = 2,
        label = L"$\theta_\text{true}$ = %$(true_θ)")
    axislegend(ax, position = :lt)
    return fig
end
fig_likelihood = plot_likelihood(observed_heads, n_flips)
fig_likelihood
The likelihood function shows how plausible different parameter values are given our observed data. It peaks at the observed proportion, which becomes our maximum likelihood estimate. Notice how the curve’s width reflects uncertainty—more data would create a sharper peak.
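To illustrate that last point, here is a small sketch (assuming Distributions.jl and the Makie setup used above) comparing the likelihood for the same proportion of heads with ten times as much data; the 7/12 and 70/120 counts are illustrative, not a new experiment:
# Sketch: same observed proportion, ten times the data → a visibly sharper likelihood
θs = 0.01:0.01:0.99
lik_12 = [pdf(Binomial(12, θ), 7) for θ in θs]      # 7 heads in 12 flips
lik_120 = [pdf(Binomial(120, θ), 70) for θ in θs]   # 70 heads in 120 flips
fig_width = Figure(size = (700, 400))
ax_width = Axis(fig_width[1, 1], xlabel = L"$\theta$", ylabel = "Likelihood (rescaled to max = 1)")
lines!(ax_width, θs, lik_12 ./ maximum(lik_12), linewidth = 2, label = "7/12 heads")
lines!(ax_width, θs, lik_120 ./ maximum(lik_120), linewidth = 2, label = "70/120 heads")
axislegend(ax_width, position = :lt)
fig_width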
Maximum likelihood estimation
Maximum likelihood estimation (MLE) finds the parameter value that makes the observed data most probable. For the binomial model, this has a simple analytical solution.
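Setting the derivative of the log-likelihood to zero makes this explicit (the binomial coefficient does not depend on \(\theta\) and drops out): \[\frac{\partial}{\partial \theta} \log p(\text{data} \mid \theta) = \frac{\partial}{\partial \theta}\left[k \log \theta + (n-k)\log(1-\theta)\right] = \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \quad\Rightarrow\quad \hat{\theta}_\text{MLE} = \frac{k}{n},\] so the MLE is simply the observed proportion of heads.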
function analyze_mle_behavior()
    n = 10
    possible_heads = [3, 5, 7, 9]
    fig = Figure(size = (800, 600))
    θ_grid = 0.01:0.01:0.99
    for (i, y) in enumerate(possible_heads)
        row = div(i - 1, 2) + 1
        col = ((i - 1) % 2) + 1
        ax = Axis(fig[row, col],
            xlabel = L"$\theta$",
            ylabel = "likelihood",
            title = "$(y)/$(n) heads")
        likelihood = pdf.(Binomial.(n, θ_grid), y)
        lines!(ax, θ_grid, likelihood, linewidth = 2, color = :blue, label = L"$p(\vec{y} \mid \theta)$")
        mle = y / n
        mle = round(mle, digits = 2)
        vlines!(ax, [mle], color = :red, linestyle = :dash, linewidth = 2, label = L"$\hat{\theta}_{MLE}$")
        if (row == 1) && (col == 1)
            axislegend(ax, position = :rt)
        end
    end
    return fig
end
fig_mle = analyze_mle_behavior()
fig_mle
Each panel shows how the likelihood function changes with different observed outcomes. The maximum always occurs at the observed proportion, confirming that MLE equals the sample proportion for binomial data.
This relationship holds broadly: MLE provides intuitive estimates that often match simple summary statistics for well-behaved problems.
Bayesian inference
First, we define our probabilistic model in Turing: a Beta prior on \(\theta\) and a Binomial likelihood for the observed number of heads.
@model function coin_flip_model(y, n, prior_α, prior_β)
    θ ~ Beta(prior_α, prior_β)
    y ~ Binomial(n, θ)
    return θ
end
coin_flip_model (generic function with 2 methods)
Then, we compare the analytical posterior (which is available due to conjugacy) to the posterior obtained via MCMC sampling.
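Concretely, with a \(\text{Beta}(\alpha, \beta)\) prior and a binomial likelihood, \[p(\theta \mid k, n) \propto \theta^{\alpha - 1}(1-\theta)^{\beta - 1} \cdot \theta^{k}(1-\theta)^{n-k} \quad\Rightarrow\quad \theta \mid k, n \sim \text{Beta}(\alpha + k,\; \beta + n - k),\] which is exactly the update computed by post_α and post_β below.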
function compare_analytical_mcmc(y, n; prior_α = 3, prior_β = 3, n_samples = 5000, n_threads = 4)
    # Analytical posterior
    post_α = prior_α + y
    post_β = prior_β + n - y
    analytical_posterior = Beta(post_α, post_β)
    # MCMC sampling
    model = coin_flip_model(y, n, prior_α, prior_β)
    chain = sample(
        model, # the object with model + data
        NUTS(), # our sampler
        MCMCThreads(), # to use multiple threads
        Int(ceil(n_samples / n_threads)), # number of draws per chain (≈ n_samples in total)
        n_threads, # number of chains
        discard_initial = 1000, # burn-in samples per chain (discarded)
        progress = false, # disable progress meter
    )
    mcmc_samples = vec(chain[:θ]) # flatten the draws from all chains into a single Vector
    # Return data for plotting instead of creating plots
    return (analytical_posterior = analytical_posterior, mcmc_samples = mcmc_samples, n_samples = n_samples)
end
# Generate data by calling the same function twice with different sample sizes
n_low = 500
n_high = 50_000
data_low = compare_analytical_mcmc(7, 10, n_samples = n_low)
data_high = compare_analytical_mcmc(7, 10, n_samples = n_high)
We can then compare how well the MCMC samples approximate the analytical posterior for different numbers of samples.
# Combine plots
fig_comparison = Figure(size = (1200, 500))
ax1 = Axis(fig_comparison[1, 1],
xlabel = L"$\theta$ (probability of heads)",
ylabel = "Density",
title = "MCMC with $n_low samples")
ax2 = Axis(fig_comparison[1, 2],
xlabel = L"$\theta$ (probability of heads)",
title = "MCMC with $n_high samples")
# Link y-axes for comparison
linkaxes!(ax1, ax2)
# Plot using the same plotting logic for both panels
for (ax, data) in [(ax1, data_low), (ax2, data_high)]
    # Plot analytical posterior
    θ_grid = 0.01:0.01:0.99
    analytical_density = pdf.(data.analytical_posterior, θ_grid)
    lines!(ax, θ_grid, analytical_density, label = "Analytical",
        linewidth = 3, color = :blue)
    # Plot MCMC histogram
    hist!(ax, vec(data.mcmc_samples), bins = 40, normalization = :pdf,
        color = (:red, 0.6), label = "MCMC samples")
    # Add summary statistics
    analytical_mean = mean(data.analytical_posterior)
    mcmc_mean = mean(data.mcmc_samples)
    # Vertical lines for means
    vlines!(ax, [analytical_mean], color = :blue, linestyle = :dash, alpha = 0.8)
    vlines!(ax, [mcmc_mean], color = :red, linestyle = :dot, alpha = 0.8)
    # Add text annotations with statistics
    text!(ax, 0.35, maximum(analytical_density) * 0.95,
        text = L"\text{Analytical: Mean} = %$(round(analytical_mean, digits=3))",
        fontsize = 10, color = :blue)
    text!(ax, 0.35, maximum(analytical_density) * 0.85,
        text = L"\text{MCMC: Mean} = %$(round(mcmc_mean, digits=3))",
        fontsize = 10, color = :red)
    # Calculate P(theta < 0.25) - tail probability
    tail = 0.25
    vlines!(ax, [tail], color = :gray, linestyle = :solid, alpha = 0.6, label = L"$\theta = %$(tail)$")
    analytical_tail_prob = cdf(data.analytical_posterior, tail)
    analytical_tail_prob = @sprintf("%.2E", analytical_tail_prob)
    mcmc_tail_prob = mean(data.mcmc_samples .< tail)
    mcmc_tail_prob = @sprintf("%.2E", mcmc_tail_prob)
    text!(ax, 0.05, maximum(analytical_density) * 0.55,
        text = L"\text{Analytical}: $P(\theta < %$(tail))$ = %$(analytical_tail_prob)",
        fontsize = 10, color = :blue)
    text!(ax, 0.05, maximum(analytical_density) * 0.45,
        text = L"\text{MCMC}: $P(\theta < %$(tail))$ = %$(mcmc_tail_prob)",
        fontsize = 10, color = :red)
    axislegend(ax, position = :rt)
end
fig_comparison
We can see that with a small number of samples, we do a reasonable job of approximating the mean, but the tails are poorly estimated. This is a general principle in Monte Carlo methods: estimating tail probabilities requires many more samples than estimating central tendencies like the mean or median. Sometimes, specialized techniques like importance sampling are needed to accurately capture tail behavior.
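To see this without rerunning MCMC, here is a rough sketch that draws directly from the analytical posterior and compares the Monte Carlo error of the mean against that of the tail probability \(P(\theta < 0.25)\) as the number of draws grows (the variable names and sample sizes below are illustrative):
# Sketch: Monte Carlo error for the mean vs. a tail probability
post_demo = Beta(3 + 7, 3 + 10 - 7)   # analytical posterior for 7/10 heads with a Beta(3,3) prior
exact_mean = mean(post_demo)
exact_tail = cdf(post_demo, 0.25)
for n_mc in (500, 5_000, 50_000)
    draws = rand(post_demo, n_mc)
    err_mean = abs(mean(draws) - exact_mean) / exact_mean
    err_tail = abs(mean(draws .< 0.25) - exact_tail) / exact_tail
    println("n = $n_mc: relative error of mean ≈ $(round(err_mean, sigdigits = 2)), ",
        "of tail probability ≈ $(round(err_tail, sigdigits = 2))")
end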
Sequential Bayesian updating
The following example demonstrates how posterior distributions evolve as we observe more coin flips:
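Because the Beta prior is conjugate, updating flip by flip is equivalent to a single batch update: after the first \(t\) flips containing \(k_t\) heads, the posterior is \[\theta \mid \text{first } t \text{ flips} \sim \text{Beta}(\alpha + k_t,\; \beta + t - k_t),\] which is what each panel below plots.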
function demonstrate_bayesian_updating()
    # True parameter and prior
    true_θ = 0.55
    prior_α, prior_β = 3, 3
    # Generate sequence of coin flips
    Random.seed!(42)
    n_total = 12
    flips = rand(n_total) .< true_θ # true = heads, false = tails
    # Convert to H/T string for display
    flip_chars = [f ? "H" : "T" for f in flips]
    # Stages to show (number of flips observed)
    stages = [1, 3, 5, 8, 12]
    # Create figure with subplots
    fig = Figure(size = (1000, 600))
    θ_range = 0:0.01:1
    for (i, stage) in enumerate(stages)
        # Calculate posterior parameters
        y_obs = sum(flips[1:stage]) # heads observed so far
        n_obs = stage # total flips observed
        post_α = prior_α + y_obs
        post_β = prior_β + n_obs - y_obs
        # Create subplot
        row = div(i - 1, 3) + 1
        col = ((i - 1) % 3) + 1
        ax = Axis(fig[row, col],
            xlabel = "θ (probability of heads)",
            ylabel = "Density",
            title = "$(join(flip_chars[1:stage])) ($(y_obs)/$(n_obs) heads)")
        # Plot prior (only for first subplot)
        if i == 1
            prior_density = pdf.(Beta(prior_α, prior_β), θ_range)
            lines!(ax, θ_range, prior_density,
                color = :gray, linewidth = 2, linestyle = :dash, label = "Prior Beta(3,3)")
        end
        # Plot posterior
        posterior = Beta(post_α, post_β)
        posterior_density = pdf.(posterior, θ_range)
        lines!(ax, θ_range, posterior_density,
            color = :blue, linewidth = 3, label = "Posterior")
        # Add vertical lines
        vlines!(ax, [true_θ], color = :green, linewidth = 2, linestyle = :dot,
            label = i == 1 ? "True θ = $(true_θ)" : "")
        # MLE estimate
        mle_θ = stage == 0 ? 0.5 : y_obs / n_obs
        vlines!(ax, [mle_θ], color = :red, linewidth = 2, linestyle = :dashdot,
            label = i == 1 ? "MLE" : "")
        # Posterior mean
        post_mean = post_α / (post_α + post_β)
        vlines!(ax, [post_mean], color = :blue, linewidth = 2, alpha = 0.7,
            label = i == 1 ? "Posterior mean" : "")
        # Add text with posterior parameters
        text!(ax, 0.05, maximum(posterior_density) * 0.9,
            text = "Beta($(post_α), $(post_β))",
            fontsize = 11, color = :blue)
        # Add 95% credible interval
        ci_lower = quantile(posterior, 0.025)
        ci_upper = quantile(posterior, 0.975)
        text!(ax, 0.05, maximum(posterior_density) * 0.75,
            text = "95% CI: [$(round(ci_lower, digits=2)), $(round(ci_upper, digits=2))]",
            fontsize = 10, color = :black)
        # Set consistent x-axis limits
        xlims!(ax, 0, 1)
    end
    # Add legend in the sixth panel (bottom right)
    ax_legend = Axis(fig[2, 3],
        xlabel = "θ (probability of heads)",
        ylabel = "Density")
    # Create dummy plots for legend
    lines!(ax_legend, [0, 0], [0, 0], color = :gray, linewidth = 2, linestyle = :dash, label = "Prior Beta(3,3)")
    lines!(ax_legend, [0, 0], [0, 0], color = :blue, linewidth = 3, label = "Posterior")
    vlines!(ax_legend, [0], color = :green, linewidth = 2, linestyle = :dot, label = "True θ = $(true_θ)")
    vlines!(ax_legend, [0], color = :red, linewidth = 2, linestyle = :dashdot, label = "MLE")
    vlines!(ax_legend, [0], color = :blue, linewidth = 2, alpha = 0.7, label = "Posterior mean")
    axislegend(ax_legend, position = :cc, framevisible = true)
    hidespines!(ax_legend)
    hidedecorations!(ax_legend)
    # Add overall title
    Label(fig[0, :], "Bayesian Updating: How Posterior Evolves with New Evidence",
        fontsize = 16, font = "TeX Gyre Heros Bold")
    return fig
end
fig_updating = demonstrate_bayesian_updating()
fig_updating