Mosquito bites and beer consumption: simulation-based inference
This notebook demonstrates computational approaches to hypothesis testing using a practical example: whether drinking beer affects mosquito bite frequency. The analysis illustrates how direct simulation makes statistical concepts intuitive without relying on mathematical assumptions that are hard to verify in practice. The same principles apply to climate data analysis, where traditional parametric assumptions may not hold.
The research question
We investigate a practical question with broader implications for experimental design and causal inference:
Does drinking beer increase the likelihood of being bitten by mosquitoes?
This question exemplifies common challenges in environmental and health research where controlled experiments must account for multiple confounding factors. The statistical methods demonstrated here apply broadly to climate impact studies and policy evaluation.
Learning objectives
After working through this analysis, you should understand:
- The logic underlying permutation testing and simulation-based inference
- How computational methods can replace complex mathematical derivations
- The interpretation of p-values through direct simulation
- When simulation-based tests offer advantages over parametric approaches
Experimental data
The dataset comes from a controlled experiment in which participants were randomly assigned to consume either beer or water and were then exposed to mosquitoes under standardized conditions. Researchers counted the number of mosquito bites each participant received.
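The code in this notebook assumes a handful of standard Julia packages. A minimal setup block is shown below; the choice of CairoMakie as the plotting backend is an assumption, and any Makie backend works for the calls used here.
# Packages used throughout the notebook
using DataFrames        # summary tables (DataFrame)
using Statistics        # mean, std, var, median, quantile
using Random            # shuffle, Random.seed!
using Printf            # @printf, @sprintf
using HypothesisTests   # EqualVarianceTTest, UnequalVarianceTTest, pvalue
using CairoMakie        # Figure, Axis, hist!, vlines!, text!, axislegend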
# Beer drinkers group
beer_bites = [
27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27
]
# Water drinkers group (control)
water_bites = [
21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23,
13, 22, 20, 24, 18, 20
]
# Create summary table
data_summary = DataFrame(
Group = ["Beer drinkers", "Water drinkers"],
N_participants = [length(beer_bites), length(water_bites)],
Mean_bites = [mean(beer_bites), mean(water_bites)],
Std_bites = [std(beer_bites), std(water_bites)],
Min_bites = [minimum(beer_bites), minimum(water_bites)],
Max_bites = [maximum(beer_bites), maximum(water_bites)]
)
data_summary
Row | Group | N_participants | Mean_bites | Std_bites | Min_bites | Max_bites
---|---|---|---|---|---|---
1 | Beer drinkers | 25 | 23.6 | 4.1332 | 17 | 31
2 | Water drinkers | 18 | 19.2222 | 3.67112 | 12 | 24
The descriptive statistics reveal that beer drinkers received more mosquito bites on average. However, we need statistical analysis to determine whether this difference could reasonably be attributed to random variation.
Initial analysis: observed difference
The most direct approach compares the average number of bites between groups. This test statistic captures the core research question in a single number.
# Calculate the observed difference in means
observed_diff = mean(beer_bites) - mean(water_bites)
# Create results summary
initial_analysis = DataFrame(
Statistic = ["Beer group mean", "Water group mean", "Difference (Beer - Water)"],
Value = [mean(beer_bites), mean(water_bites), observed_diff]
)
initial_analysis
Row | Statistic | Value
---|---|---
1 | Beer group mean | 23.6
2 | Water group mean | 19.2222
3 | Difference (Beer - Water) | 4.37778
Beer drinkers received approximately 4.4 more bites on average than water drinkers. But is this difference statistically meaningful, or could it arise from random variation alone?
The null hypothesis and permutation testing
The skeptic’s position forms our null hypothesis: drinking beer has no effect on mosquito bite frequency. Under this assumption, the group assignment (beer vs. water) is irrelevant, and the observed difference arose by chance.
We can test this hypothesis through permutation testing, which simulates what would happen if the null hypothesis were true.
Two approaches to hypothesis testing
Parametric approach (t-test):
- Assumes data follow specific distributions (typically normal)
- Uses mathematical theory for p-value calculation
- Fast, but relies on assumptions that may not hold

Simulation approach (permutation test):
- Makes minimal distributional assumptions
- Uses computational power instead of mathematical theory
- More intuitive and flexible
Implementing permutation testing
The logic is straightforward: if treatment assignment doesn’t matter, we can randomly shuffle group labels and recalculate the difference.
"""
    calculate_permuted_difference(group1_data, group2_data)

Calculate the difference in group means after randomly permuting the group labels.
"""
function calculate_permuted_difference(group1_data, group2_data)
    # Combine all data
    all_data = vcat(group1_data, group2_data)
    # Randomly shuffle all observations
    shuffled_data = shuffle(all_data)
    # Reassign to groups with original group sizes
    n1 = length(group1_data)
    new_group1 = shuffled_data[1:n1]
    new_group2 = shuffled_data[(n1+1):end]
    # Calculate difference in means
    return mean(new_group1) - mean(new_group2)
end
# Test the function
example_permuted_diff = calculate_permuted_difference(beer_bites, water_bites)
@printf("Example permuted difference: %.2f\n", example_permuted_diff)
Example permuted difference: 0.46
This function simulates one possible outcome under the null hypothesis. By repeating this process thousands of times, we can characterize the full distribution of differences we’d expect by chance alone.
Generating the null distribution
# Set seed for reproducibility
Random.seed!(42)
# Generate many permuted differences
n_permutations = 50_000
permuted_differences = [
calculate_permuted_difference(beer_bites, water_bites)
for _ in 1:n_permutations
]
# Summarize the null distribution
null_summary = DataFrame(
Statistic = ["Mean", "Std Dev", "Min", "Max", "2.5th percentile", "97.5th percentile"],
Value = [
mean(permuted_differences),
std(permuted_differences),
minimum(permuted_differences),
maximum(permuted_differences),
quantile(permuted_differences, 0.025),
quantile(permuted_differences, 0.975)
]
)
null_summary
Row | Statistic | Value
---|---|---
1 | Mean | 0.00119
2 | Std Dev | 1.38007
3 | Min | -5.36889
4 | Max | 5.14222
5 | 2.5th percentile | -2.69333
6 | 97.5th percentile | 2.65778
The null distribution shows the range of group differences we would expect if treatment assignment were irrelevant. Most permuted differences cluster around zero, and 95% of them fall between roughly -2.69 and 2.66, well short of the observed difference of 4.38.
Visualizing the results
The key insight comes from comparing our observed difference to the null distribution. How extreme is our observed value relative to what we’d expect by chance?
"""
    create_permutation_plot(null_distribution, observed_value)

Create a visualization comparing the observed value to the null distribution.
"""
function create_permutation_plot(null_distribution, observed_value)
    fig = Figure(size = (800, 500))
    ax = Axis(fig[1, 1],
        xlabel = "Difference in means (Beer - Water)",
        ylabel = "Density",
        title = "Permutation test results")
    # Plot null distribution
    hist!(ax, null_distribution, bins = 40, normalization = :pdf,
        color = (:blue, 0.6), label = "Null distribution")
    # Mark observed value
    vlines!(ax, [observed_value], color = :red, linewidth = 3,
        label = "Observed difference")
    # Add text annotation
    text!(ax, observed_value + 0.5, 0.15,
        text = @sprintf("Observed\n%.2f", observed_value),
        color = :red, fontsize = 12)
    axislegend(ax, position = :rt)
    return fig
end
fig_permutation = create_permutation_plot(permuted_differences, observed_diff)
fig_permutation
The visualization clearly shows that our observed difference lies in the extreme tail of the null distribution. This suggests that the observed difference is unlikely to have occurred by chance alone.
Statistical significance assessment
# Calculate p-value: proportion of permuted differences ≥ observed
p_value_one_sided = mean(permuted_differences .>= observed_diff)
p_value_two_sided = mean(abs.(permuted_differences) .>= abs(observed_diff))
# Effect size (Cohen's d)
pooled_std = sqrt(((length(beer_bites) - 1) * var(beer_bites) +
(length(water_bites) - 1) * var(water_bites)) /
(length(beer_bites) + length(water_bites) - 2))
cohens_d = observed_diff / pooled_std
# Summary results
test_results = DataFrame(
Measure = [
"Observed difference",
"One-sided p-value",
"Two-sided p-value",
"Effect size (Cohen's d)",
"Interpretation"
],
Value = [
@sprintf("%.2f bites", observed_diff),
@sprintf("%.4f", p_value_one_sided),
@sprintf("%.4f", p_value_two_sided),
@sprintf("%.2f", cohens_d),
p_value_two_sided < 0.05 ? "Statistically significant" : "Not significant"
]
)
test_results
Row | Measure | Value
---|---|---
1 | Observed difference | 4.38 bites
2 | One-sided p-value | 0.0003
3 | Two-sided p-value | 0.0008
4 | Effect size (Cohen's d) | 1.11
5 | Interpretation | Statistically significant
Comparison with parametric testing
For completeness, we can compare our simulation-based results with traditional parametric tests.
# Equal variance t-test
t_test_equal = EqualVarianceTTest(beer_bites, water_bites)
# Unequal variance t-test (Welch's t-test)
t_test_unequal = UnequalVarianceTTest(beer_bites, water_bites)
# Comparison summary
comparison_results = DataFrame(
Method = ["Permutation test", "Equal variance t-test", "Unequal variance t-test"],
P_value = [
p_value_two_sided,
pvalue(t_test_equal),
pvalue(t_test_unequal)
],
Test_statistic = [
"Difference in means",
@sprintf("t = %.3f", t_test_equal.t),
@sprintf("t = %.3f", t_test_unequal.t)
]
)
comparison_results
Row | Method | P_value | Test_statistic
---|---|---|---
1 | Permutation test | 0.00082 | Difference in means
2 | Equal variance t-test | 0.000883127 | t = 3.587
3 | Unequal variance t-test | 0.000747402 | t = 3.658
All three methods reach similar conclusions, but the permutation test requires fewer assumptions about the underlying data distribution.
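Because the shuffling logic never refers to a particular summary statistic, the same machinery extends to statistics that have no convenient parametric test. The sketch below is an illustration, not part of the original analysis (the function and variable names are hypothetical); it repeats the permutation test using the difference in medians.
# Hypothetical variant: permutation test on the difference in medians
function permuted_median_difference(group1_data, group2_data)
    all_data = vcat(group1_data, group2_data)   # pool both groups
    shuffled_data = shuffle(all_data)           # shuffle under the null hypothesis
    n1 = length(group1_data)
    return median(shuffled_data[1:n1]) - median(shuffled_data[(n1+1):end])
end

observed_median_diff = median(beer_bites) - median(water_bites)
null_median_diffs = [permuted_median_difference(beer_bites, water_bites) for _ in 1:50_000]
p_value_median = mean(abs.(null_median_diffs) .>= abs(observed_median_diff))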
Key insights and broader applications
This analysis demonstrates several important statistical concepts:
Simulation-based inference: Complex statistical concepts become intuitive when approached through simulation rather than mathematical theory.
Minimal-assumption testing: Permutation tests require only that observations be exchangeable under the null hypothesis, not that they follow a specific distribution, which makes them robust for real-world data.
P-value interpretation: Simulation makes the p-value concrete: it is simply the proportion of shuffled datasets that produce a difference at least as extreme as the one observed.
Effect size matters: Statistical significance alone doesn’t indicate practical importance—effect size measures like Cohen’s d provide additional context.
Climate science applications
These simulation-based methods prove particularly valuable in climate research where:
- Non-normal data: Climate variables often have skewed or heavy-tailed distributions
- Small sample sizes: Paleoclimate records or extreme event counts may have limited observations
- Complex dependencies: Traditional parametric assumptions may not hold for climate time series
- Policy decisions: Robust statistical inference supports high-stakes climate adaptation decisions
The computational approach demonstrated here scales naturally to more complex problems while maintaining the same intuitive logic: simulate what would happen under different assumptions and compare with observed data.
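As a closing sketch of that idea (an illustration under the same package assumptions as above, not code from the analysis itself; the name permutation_test and its keyword arguments are hypothetical), the whole procedure can be wrapped in a single reusable function that accepts any summary statistic.
# Generic two-sample permutation test for any summary statistic
function permutation_test(x, y; statistic = mean, n_permutations = 10_000)
    observed = statistic(x) - statistic(y)
    pooled = vcat(x, y)
    n1 = length(x)
    null_diffs = map(1:n_permutations) do _
        shuffled = shuffle(pooled)
        statistic(shuffled[1:n1]) - statistic(shuffled[(n1+1):end])
    end
    # Two-sided p-value: proportion of null differences at least as extreme as observed
    p_two_sided = mean(abs.(null_diffs) .>= abs(observed))
    return (observed = observed, p_value = p_two_sided, null = null_diffs)
end

# Example: the beer/water comparison with the generic function
result = permutation_test(beer_bites, water_bites; n_permutations = 50_000)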