Mosquito bites and beer consumption: simulation-based inference

This notebook demonstrates computational approaches to hypothesis testing using a practical example: whether drinking beer affects mosquito bite frequency. The analysis illustrates how simulation can provide intuitive understanding of statistical concepts without relying on complex mathematical assumptions.

This example showcases the power of computational thinking in statistics, where complex theoretical concepts become accessible through direct simulation. The same principles apply to climate data analysis where traditional parametric assumptions may not hold.

using CairoMakie
using StatsBase
using HypothesisTests
using DataFrames
using Printf
using Random

The research question

We investigate a practical question with broader implications for experimental design and causal inference:

Does drinking beer affect the likelihood of being bitten by mosquitoes?

This question exemplifies common challenges in environmental and health research where controlled experiments must account for multiple confounding factors. The statistical methods demonstrated here apply broadly to climate impact studies and policy evaluation.

Learning objectives

After working through this analysis, you should understand:

  • The logic underlying permutation testing and simulation-based inference
  • How computational methods can replace complex mathematical derivations
  • The interpretation of p-values through direct simulation
  • When simulation-based tests offer advantages over parametric approaches

Experimental data

The dataset comes from a controlled experiment where participants were randomly assigned to consume either beer or water, then exposed to mosquitoes in a controlled environment. Researchers counted the number of mosquito bites received by each participant.

# Beer drinkers group
beer_bites = [
    27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
    19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27
]

# Water drinkers group (control)
water_bites = [
    21, 22, 15, 12, 21, 16, 19, 15, 22, 24, 19, 23,
    13, 22, 20, 24, 18, 20
]

# Create summary table
data_summary = DataFrame(
    Group = ["Beer drinkers", "Water drinkers"],
    N_participants = [length(beer_bites), length(water_bites)],
    Mean_bites = [mean(beer_bites), mean(water_bites)],
    Std_bites = [std(beer_bites), std(water_bites)],
    Min_bites = [minimum(beer_bites), minimum(water_bites)],
    Max_bites = [maximum(beer_bites), maximum(water_bites)]
)

data_summary
2×6 DataFrame
 Row │ Group           N_participants  Mean_bites  Std_bites  Min_bites  Max_bites
     │ String          Int64           Float64     Float64    Int64      Int64
─────┼─────────────────────────────────────────────────────────────────────────────
   1 │ Beer drinkers               25     23.6       4.1332          17         31
   2 │ Water drinkers              18     19.2222    3.67112         12         24

The descriptive statistics reveal that beer drinkers received more mosquito bites on average. However, we need statistical analysis to determine whether this difference could reasonably be attributed to random variation.

Initial analysis: observed difference

The most direct approach compares the average number of bites between groups. This test statistic captures the core research question in a single number.

# Calculate the observed difference in means
observed_diff = mean(beer_bites) - mean(water_bites)

# Create results summary
initial_analysis = DataFrame(
    Statistic = ["Beer group mean", "Water group mean", "Difference (Beer - Water)"],
    Value = [mean(beer_bites), mean(water_bites), observed_diff]
)

initial_analysis
3×2 DataFrame
 Row │ Statistic                  Value
     │ String                     Float64
─────┼─────────────────────────────────────
   1 │ Beer group mean            23.6
   2 │ Water group mean           19.2222
   3 │ Difference (Beer - Water)   4.37778

Beer drinkers received approximately 4.4 more bites on average than water drinkers. But is this difference statistically meaningful, or could it arise from random variation alone?

The null hypothesis and permutation testing

The skeptic’s position forms our null hypothesis: drinking beer has no effect on mosquito bite frequency. Under this assumption, the group assignment (beer vs. water) is irrelevant, and the observed difference arose by chance.

We can test this hypothesis through permutation testing, which simulates what would happen if the null hypothesis were true.

Two approaches to hypothesis testing

Parametric approach (t-test):

  • Assumes data follow specific distributions (normal)
  • Uses mathematical theory for p-value calculation
  • Fast, but relies on assumptions that may not hold

Simulation approach (permutation test):

  • Makes minimal distributional assumptions
  • Uses computational power instead of mathematical theory
  • More intuitive and flexible

Implementing permutation testing

The logic is straightforward: if treatment assignment doesn’t matter, we can randomly shuffle group labels and recalculate the difference.

"""
    calculate_permuted_difference(group1_data, group2_data)

Calculate the difference in group means after a random permutation of group labels.
"""
function calculate_permuted_difference(group1_data, group2_data)
    # Combine all data
    all_data = vcat(group1_data, group2_data)

    # Randomly shuffle all observations
    shuffled_data = shuffle(all_data)

    # Reassign to groups with original group sizes
    n1 = length(group1_data)
    new_group1 = shuffled_data[1:n1]
    new_group2 = shuffled_data[(n1+1):end]

    # Calculate difference in means
    return mean(new_group1) - mean(new_group2)
end

# Test the function
example_permuted_diff = calculate_permuted_difference(beer_bites, water_bites)
@printf("Example permuted difference: %.2f\n", example_permuted_diff)
Example permuted difference: 0.46

This function simulates one possible outcome under the null hypothesis. By repeating this process thousands of times, we can characterize the full distribution of differences we’d expect by chance alone.

Generating the null distribution

# Set seed for reproducibility
Random.seed!(42)

# Generate many permuted differences
n_permutations = 50_000
permuted_differences = [
    calculate_permuted_difference(beer_bites, water_bites)
    for _ in 1:n_permutations
]

# Summarize the null distribution
null_summary = DataFrame(
    Statistic = ["Mean", "Std Dev", "Min", "Max", "2.5th percentile", "97.5th percentile"],
    Value = [
        mean(permuted_differences),
        std(permuted_differences),
        minimum(permuted_differences),
        maximum(permuted_differences),
        quantile(permuted_differences, 0.025),
        quantile(permuted_differences, 0.975)
    ]
)

null_summary
6×2 DataFrame
 Row │ Statistic          Value
     │ String             Float64
─────┼──────────────────────────────
   1 │ Mean                0.00119
   2 │ Std Dev             1.38007
   3 │ Min                -5.36889
   4 │ Max                 5.14222
   5 │ 2.5th percentile   -2.69333
   6 │ 97.5th percentile   2.65778

The null distribution shows what group differences we’d expect if treatment assignment were random. Most permuted differences cluster around zero, as expected when there’s no true effect.
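Before computing a formal p-value, a quick way to gauge how unusual the observed difference is is to express it in units of the null distribution's spread. The short check below reuses permuted_differences and observed_diff from above; the variable name z_score is just an illustrative label.

# Standardized distance of the observed difference from the center of the null distribution
z_score = (observed_diff - mean(permuted_differences)) / std(permuted_differences)
@printf("Observed difference lies about %.1f null standard deviations above the null mean\n", z_score)

Using the summary values above (mean ≈ 0.00, standard deviation ≈ 1.38), this works out to roughly 3.2 standard deviations, which already suggests the observed difference sits far out in the tail.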

Visualizing the results

The key insight comes from comparing our observed difference to the null distribution. How extreme is our observed value relative to what we’d expect by chance?

"""
    create_permutation_plot(null_distribution, observed_value)

Create a visualization comparing an observed value to a permutation null distribution.
"""
function create_permutation_plot(null_distribution, observed_value)
    fig = Figure(size = (800, 500))
    ax = Axis(fig[1, 1],
        xlabel = "Difference in means (Beer - Water)",
        ylabel = "Density",
        title = "Permutation test results")

    # Plot null distribution
    hist!(ax, null_distribution, bins = 40, normalization = :pdf,
        color = (:blue, 0.6), label = "Null distribution")

    # Mark observed value
    vlines!(ax, [observed_value], color = :red, linewidth = 3,
        label = "Observed difference")

    # Add text annotation
    text!(ax, observed_value + 0.5, 0.15,
        text = @sprintf("Observed\n%.2f", observed_value),
        color = :red, fontsize = 12)

    axislegend(ax, position = :rt)
    return fig
end

fig_permutation = create_permutation_plot(permuted_differences, observed_diff)
fig_permutation

The visualization clearly shows that our observed difference lies in the extreme tail of the null distribution. This suggests that the observed difference is unlikely to have occurred by chance alone.

Statistical significance assessment

# Calculate p-value: proportion of permuted differences ≥ observed
p_value_one_sided = mean(permuted_differences .>= observed_diff)
p_value_two_sided = mean(abs.(permuted_differences) .>= abs(observed_diff))

# Effect size (Cohen's d)
pooled_std = sqrt(((length(beer_bites) - 1) * var(beer_bites) +
                   (length(water_bites) - 1) * var(water_bites)) /
                  (length(beer_bites) + length(water_bites) - 2))
cohens_d = observed_diff / pooled_std

# Summary results
test_results = DataFrame(
    Measure = [
        "Observed difference",
        "One-sided p-value",
        "Two-sided p-value",
        "Effect size (Cohen's d)",
        "Interpretation"
    ],
    Value = [
        @sprintf("%.2f bites", observed_diff),
        @sprintf("%.4f", p_value_one_sided),
        @sprintf("%.4f", p_value_two_sided),
        @sprintf("%.2f", cohens_d),
        p_value_two_sided < 0.05 ? "Statistically significant" : "Not significant"
    ]
)

test_results
5×2 DataFrame
 Row │ Measure                  Value
     │ String                   String
─────┼────────────────────────────────────────────────────
   1 │ Observed difference      4.38 bites
   2 │ One-sided p-value        0.0003
   3 │ Two-sided p-value        0.0008
   4 │ Effect size (Cohen's d)  1.11
   5 │ Interpretation           Statistically significant
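A small refinement, not used above, guards against reporting a Monte Carlo p-value of exactly zero when no permuted difference reaches the observed one: count the observed arrangement itself as one of the permutations. The sketch below applies this standard add-one correction to the quantities already computed; the variable name p_value_corrected is illustrative.

# Add-one (Monte Carlo) correction: include the observed arrangement as one permutation
p_value_corrected = (1 + sum(abs.(permuted_differences) .>= abs(observed_diff))) /
                    (1 + n_permutations)
@printf("Two-sided p-value with add-one correction: %.5f\n", p_value_corrected)

With 50,000 permutations the correction barely changes the estimate here, but it matters when the observed statistic is more extreme than every permuted value.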

Comparison with parametric testing

For completeness, we can compare our simulation-based results with traditional parametric tests.

# Equal variance t-test
t_test_equal = EqualVarianceTTest(beer_bites, water_bites)
# Unequal variance t-test (Welch's t-test)
t_test_unequal = UnequalVarianceTTest(beer_bites, water_bites)

# Comparison summary
comparison_results = DataFrame(
    Method = ["Permutation test", "Equal variance t-test", "Unequal variance t-test"],
    P_value = [
        p_value_two_sided,
        pvalue(t_test_equal),
        pvalue(t_test_unequal)
    ],
    Test_statistic = [
        "Difference in means",
        @sprintf("t = %.3f", t_test_equal.t),
        @sprintf("t = %.3f", t_test_unequal.t)
    ]
)

comparison_results
3×3 DataFrame
 Row │ Method                   P_value      Test_statistic
     │ String                   Float64      String
─────┼───────────────────────────────────────────────────────────
   1 │ Permutation test         0.00082      Difference in means
   2 │ Equal variance t-test    0.000883127  t = 3.587
   3 │ Unequal variance t-test  0.000747402  t = 3.658

All three methods reach similar conclusions, but the permutation test requires fewer assumptions about the underlying data distribution.
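HypothesisTests.jl also provides its own randomization test, which can serve as a cross-check on the hand-rolled implementation. The sketch below assumes the ApproximatePermutationTest constructor available in recent versions of the package, with arguments in the order: the two data vectors, a summary function, and the number of random permutations (check your installed version before relying on this).

# Cross-check with the package's built-in permutation test (assumes a recent HypothesisTests.jl)
perm_test = ApproximatePermutationTest(beer_bites, water_bites, mean, 50_000)
pvalue(perm_test)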

Key insights and broader applications

This analysis demonstrates several important statistical concepts:

Simulation-based inference: Complex statistical concepts become intuitive when approached through simulation rather than mathematical theory.

Minimal-assumption testing: Permutation tests require only that observations be exchangeable between groups under the null hypothesis, not that they follow a specific distribution, which makes them robust for real-world data.

P-value interpretation: The p-value is the probability of observing a difference at least as extreme as the measured one if the null hypothesis were true; the simulation makes this concrete as the fraction of permuted differences that reach or exceed the observed value.

Effect size matters: Statistical significance alone doesn’t indicate practical importance—effect size measures like Cohen’s d provide additional context.

Climate science applications

These simulation-based methods prove particularly valuable in climate research where:

  • Non-normal data: Climate variables often have skewed or heavy-tailed distributions
  • Small sample sizes: Paleoclimate records or extreme event counts may have limited observations
  • Complex dependencies: Traditional parametric assumptions may not hold for climate time series
  • Policy decisions: Robust statistical inference supports high-stakes climate adaptation decisions

The computational approach demonstrated here scales naturally to more complex problems while maintaining the same intuitive logic: simulate what would happen under different assumptions and compare with observed data.
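As a small illustration of that flexibility, the sketch below generalizes the permutation machinery so that any summary statistic can be plugged in; switching from a difference in means to a difference in medians changes only one argument. The function name permutation_pvalue and the choice of median are illustrative, not part of the analysis above; the add-one correction discussed earlier is included.

# Generic two-sided permutation test for any statistic of the form f(group1) - f(group2)
function permutation_pvalue(group1, group2, statistic; n_permutations = 50_000)
    observed = statistic(group1) - statistic(group2)
    all_data = vcat(group1, group2)
    n1 = length(group1)

    # Count permutations at least as extreme as the observed statistic
    extreme_count = 0
    for _ in 1:n_permutations
        shuffled = shuffle(all_data)
        permuted = statistic(shuffled[1:n1]) - statistic(shuffled[(n1+1):end])
        extreme_count += abs(permuted) >= abs(observed)
    end

    # Add-one correction counts the observed arrangement as one permutation
    return (1 + extreme_count) / (1 + n_permutations)
end

# Same machinery, different statistic: difference in medians instead of means
permutation_pvalue(beer_bites, water_bites, median)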