Mediation analysis tests whether the effect of an independent variable X on an outcome Y operates through an intermediate variable M (the mediator). Rather than simply asking "does X affect Y?", mediation asks "how does X affect Y, and what is the mechanism?" The total effect of X on Y is decomposed into the indirect effect (X → M → Y, the part that flows through the mediator) and the direct effect (X → Y after accounting for M). The indirect effect is quantified as the product of path a (X → M) and path b (M → Y, controlling for X), written as a×b. Complete mediation occurs when the direct effect (c') is zero after including the mediator, so that all of X's effect on Y goes through M. Partial mediation occurs when both the indirect and direct effects are significant.
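The decomposition above can be checked numerically. The following is a minimal sketch, assuming a single continuous mediator and ordinary least squares for every path. The simulated data, the true coefficients (0.5, 0.4, 0.2), and the helper names `cov` and `mediation_paths` are illustrative assumptions, and only the Python standard library is used.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def mediation_paths(x, m, y):
    """OLS slopes for a single-mediator model: a, b, c' (direct), c (total)."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx                              # M ~ X
    c = sxy / sxx                              # Y ~ X (mediator omitted)
    det = sxx * smm - sxm ** 2                 # from the normal equations of Y ~ X + M
    b = (sxx * smy - sxm * sxy) / det          # slope of M, controlling for X
    c_prime = (smm * sxy - sxm * smy) / det    # slope of X, controlling for M
    return a, b, c_prime, c

# Simulated data (assumed true values: a = 0.5, b = 0.4, c' = 0.2).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(500)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime, c = mediation_paths(x, m, y)
indirect = a * b
print(f"a={a:.3f}  b={b:.3f}  c'={c_prime:.3f}  c={c:.3f}  a*b={indirect:.3f}")
print("c equals c' + a*b:", abs(c - (c_prime + indirect)) < 1e-9)
```

With OLS on the same rows, the total effect equals the direct plus the indirect effect up to floating-point error, which is why the last line compares against a small tolerance.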
The classic example from social psychology: socioeconomic status (X) predicts depression (Y). A mediation analysis reveals that SES affects depression largely through perceived stress (M): lower SES → higher perceived stress → worse depression. The path coefficients show: a = −0.41 (lower SES → more stress), b = 0.52 (more stress → more depression), c' = −0.12 (small direct effect of SES on depression after controlling for stress). The indirect effect a×b = −0.21 accounts for 64% of the total effect (c = c' + a×b = −0.33), so stress mediates most of the SES-depression relationship.
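The arithmetic in this example can be reproduced in a few lines. This sketch only restates the three coefficients quoted above; the variable names are illustrative.

```python
# Path coefficients quoted in the SES / stress / depression example.
a, b, c_prime = -0.41, 0.52, -0.12

indirect = a * b                 # effect transmitted through perceived stress
total = c_prime + indirect       # total effect c = c' + a*b (linear models)
proportion = indirect / total    # share of the total effect that is mediated

print(f"indirect = {indirect:.2f}, total = {total:.2f}, "
      f"proportion mediated = {proportion:.0%}")
# prints: indirect = -0.21, total = -0.33, proportion mediated = 64%
```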
Mediation analysis requires careful attention to causal inference. The path model implicitly assumes a causal ordering: X causes M, which causes Y, and there are no unmeasured confounders of the X → M, X → Y, or M → Y relationships (the M → Y path is the hardest to defend, because the mediator is usually not randomized even when X is). These are assumptions about the data-generating process that cannot be verified from the data alone; they must be justified by theory, temporal ordering, or experimental manipulation. Bootstrap resampling (rather than the older Sobel test) is the gold standard for testing the indirect effect because the sampling distribution of a×b is typically non-normal, and bootstrap confidence intervals can capture its asymmetry. The percentile bootstrap CI (5000+ resamples) is the recommended default; if the CI excludes zero, the indirect effect is statistically significant.
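The percentile bootstrap described above can be sketched as follows. This is a minimal illustration, not the calculator's implementation: it assumes a single continuous mediator, OLS paths, and simulated data, and the function names `indirect_effect` and `percentile_bootstrap_ci` are hypothetical.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """a*b from two OLS fits: M ~ X and Y ~ X + M."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=5000, alpha=0.05, seed=0):
    """Percentile CI for a*b, resampling whole rows with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # one bootstrap sample of rows
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    # Approximate percentile positions in the sorted bootstrap estimates.
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Simulated data (assumed true values: a = 0.6, b = 0.5, c' = 0.2).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
ci_low, ci_high = percentile_bootstrap_ci(x, m, y)
print(f"a*b = {point:.3f}, 95% percentile CI = [{ci_low:.3f}, {ci_high:.3f}]")
print("CI excludes zero:", ci_low > 0 or ci_high < 0)
```

Rows are resampled together so that each bootstrap sample keeps the X, M, Y values of a case aligned; resampling the columns independently would destroy the relationships being estimated.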
| Column | Description | Example |
|---|---|---|
| X | Predictor / independent variable | stress_score, dose, SES |
| M | Mediator variable | sleep_hours, cortisol, self_efficacy |
| Y | Outcome / dependent variable | health_index, performance, depression |
| covariate | Optional: control variables | age, sex, baseline |
Any column names work; describe them in your prompt. M and Y should be numeric and continuous (X may also be a 0/1 indicator); for binary mediators or outcomes, mention that logistic regression should be used for the relevant path. Covariates, when supplied, are included as predictors in both the M and Y regressions.
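To illustrate what including a covariate in both regressions means, here is a minimal sketch under stated assumptions: the data are simulated, `age` is a hypothetical covariate that influences both M and Y, and `ols` is an illustrative helper that solves the normal equations directly rather than a library routine.

```python
import random

def ols(predictors, y):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(rows, y))] for i in range(k)]
    for i in range(k):                         # Gauss-Jordan with partial pivoting
        pivot = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(k):
            if r != i:
                f = aug[r][i] / aug[i][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Simulated data: age raises both the mediator and the outcome
# (assumed true values: a = 0.5, b = 0.4, c' = 0.2).
rng = random.Random(7)
n = 1000
age = [rng.gauss(40, 10) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + 0.05 * ag + rng.gauss(0, 1) for xi, ag in zip(x, age)]
y = [0.4 * mi + 0.2 * xi + 0.05 * ag + rng.gauss(0, 1)
     for xi, mi, ag in zip(x, m, age)]

a_adj = ols([x, age], m)[1]                    # M ~ X + age
_, c_prime, b_adj, _ = ols([x, m, age], y)     # Y ~ X + M + age
c_adj = ols([x, age], y)[1]                    # Y ~ X + age (total effect)
b_unadj = ols([x, m], y)[2]                    # Y ~ X + M, age omitted

print(f"adjusted:   a={a_adj:.3f}  b={b_adj:.3f}  c'={c_prime:.3f}  a*b={a_adj * b_adj:.3f}")
print(f"unadjusted: b={b_unadj:.3f}  (absorbs the age effect shared by M and Y)")
```

In this simulation, leaving age out of the Y regression inflates path b, because age is a shared cause of the mediator and the outcome.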
| Output | What it means |
|---|---|
| Path a | X → M coefficient — how strongly X predicts the mediator |
| Path b | M → Y coefficient (controlling X) — how strongly mediator predicts outcome |
| Direct effect (c') | X → Y coefficient after including M — X's effect not explained by M |
| Total effect (c) | X → Y without M in the model — c = c' + a×b |
| Indirect effect (a×b) | Effect of X on Y transmitted through M — the mediated portion |
| 95% Bootstrap CI | Percentile CI for indirect effect — excludes zero = significant mediation |
| Proportion mediated | a×b / c — fraction of total effect explained by the mediator; only meaningful when c and a×b have the same sign |
| Path diagram | Visual summary of all paths with coefficients; arrows show hypothesized causal flow |
| Scenario | What to type |
|---|---|
| Basic mediation | X = stress, M = sleep, Y = health; bootstrap indirect effect 5000 reps; 95% CI; path diagram |
| With covariates | mediation with covariates age and sex; include in both M and Y regressions; report adjusted path coefficients |
| Multiple mediators | parallel mediation with 3 mediators (M1=sleep, M2=exercise, M3=diet); indirect effect for each; compare specific indirect effects |
| Serial mediation | serial mediation: X → M1 → M2 → Y; indirect effect through sequential chain; bootstrap CI |
| Proportion mediated | mediation analysis; report proportion of total effect mediated; test if direct effect becomes non-significant |
| Binary outcome | Y is binary (0/1); use logistic regression for Y path; odds-ratio scale indirect effect; bootstrap CI |
| Moderated mediation | test whether mediation effect varies by gender (moderator W); index of moderated mediation; conditional indirect effects |
| Sensitivity analysis | report E-value for indirect effect — minimum confounding strength needed to explain away the mediation |
Use the Multiple Regression calculator to estimate the individual regression paths (a, b, c, c') separately — mediation analysis is a structured combination of regression equations. Use the Partial Correlation Calculator to examine whether the X-Y correlation shrinks when M is controlled — a large drop in partial correlation is informal evidence of mediation (though not a substitute for bootstrapped indirect effects). Use the Linear Mixed Effects Model Calculator when data are clustered or longitudinal and mediation is tested in a multilevel context — standard mediation assumes independent observations. Use the Factor Analysis Calculator if the mediator is a latent construct measured by multiple items — latent variable mediation (SEM) handles measurement error in M that inflates direct effects and deflates indirect effects in manifest-variable mediation.
Why should I use bootstrap confidence intervals instead of the Sobel test?
The Sobel test treats the sampling distribution of a×b as normal, an approximation that holds only in large samples. In practice the distribution of a product of two coefficients is often skewed rather than symmetric, especially in small samples, so the Sobel test tends to be too conservative: it has low power and misses real indirect effects. The percentile bootstrap does not assume normality: it resamples the data (whole rows, with replacement) thousands of times, recomputes a×b each time, and uses the 2.5th and 97.5th percentiles of the resulting empirical distribution as the CI. Simulations consistently show that bootstrap CIs have better coverage (closer to the nominal 95%) than the Sobel test, especially with n < 200. Use 5000+ bootstrap resamples for stable CI estimates.
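For comparison, the Sobel test itself is short to compute. The sketch below assumes OLS paths, a single continuous mediator, and the first-order standard error sqrt(a²·SE_b² + b²·SE_a²); the data are simulated and the function name `sobel_test` is illustrative.

```python
import math
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def sobel_test(x, m, y):
    """Sobel z-test for a*b; returns (indirect, se, z, p, ci_low, ci_high)."""
    n = len(x)
    sxx, smm, syy = cov(x, x), cov(m, m), cov(y, y)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx                                   # M ~ X
    b = (sxx * smy - sxm * sxy) / det               # Y ~ X + M, slope of M
    c_prime = (smm * sxy - sxm * smy) / det         # Y ~ X + M, slope of X
    # OLS standard errors of a and b (residual variance over predictor variation).
    se_a = math.sqrt((smm - a * sxm) / ((n - 2) * sxx))
    se_b = math.sqrt((syy - c_prime * sxy - b * smy) * sxx / ((n - 3) * det))
    indirect = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = indirect / se
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided normal p-value
    return indirect, se, z, p, indirect - 1.96 * se, indirect + 1.96 * se

# Small simulated sample (assumed true values: a = 0.4, b = 0.4, c' = 0.1).
rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(60)]
m = [0.4 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

indirect, se, z, p, lo, hi = sobel_test(x, m, y)
print(f"a*b = {indirect:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
print(f"Sobel 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The Sobel interval is symmetric around a×b by construction, so it cannot reflect a skewed sampling distribution the way a percentile bootstrap interval can.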
What is the difference between full and partial mediation?
Full (complete) mediation occurs when the direct effect c' is not significantly different from zero after including M in the model, so that all of X's effect on Y is explained by the mediator. Partial mediation occurs when both the indirect effect (a×b) and the direct effect (c') are significant: M explains some but not all of X's effect on Y. Note that Baron and Kenny's (1986) requirement of a significant total effect c as a prerequisite for mediation is outdated. The modern approach (Preacher & Hayes, 2008) tests the indirect effect directly via bootstrap, and mediation can exist even when c is non-significant. This can happen when the indirect and direct effects have opposite signs and cancel each other out, which is called inconsistent or suppressor mediation.
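Inconsistent mediation is easy to reproduce in a simulation. The sketch below assumes OLS paths and simulated data in which the indirect effect (0.5 × 0.6 = 0.3) and the direct effect (−0.3) cancel; the helper names are illustrative.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def mediation_paths(x, m, y):
    """OLS slopes for a single-mediator model: a, b, c' (direct), c (total)."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    c = sxy / sxx
    return a, b, c_prime, c

# Assumed true values: a = 0.5, b = 0.6, c' = -0.3, so the total effect is 0.
rng = random.Random(11)
x = [rng.gauss(0, 1) for _ in range(2000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.6 * mi - 0.3 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime, c = mediation_paths(x, m, y)
print(f"indirect a*b = {a * b:.3f}")
print(f"direct c'    = {c_prime:.3f}")
print(f"total c      = {c:.3f}  (near zero although mediation is present)")
```

A screening rule that requires a significant total effect would stop at the last line and never examine the indirect effect.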
How many observations do I need for reliable mediation analysis?
Simulation studies suggest n ≥ 200 for stable bootstrap estimates of the indirect effect with typical effect sizes (a ≈ 0.3–0.5, b ≈ 0.3–0.5). With n < 100, bootstrap CIs are wide and often include zero even when true mediation exists — insufficient power to detect the indirect effect. The power to detect mediation depends on the product a×b, not just the individual paths: two weak paths (a = 0.20, b = 0.20) require n ≈ 500+ to reliably detect the indirect effect of 0.04. Use a power calculator for mediation (e.g., the pwr2ppl R package or Monte Carlo power analysis) when planning sample sizes. When n < 50 and mediation is the primary hypothesis, collect more data rather than relying on the bootstrap.
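A Monte Carlo power analysis can be sketched in a few lines. To keep the run time short, this illustration swaps in the joint-significance test (both a and b significant, using a normal critical value of 1.96) as a cheap stand-in for a bootstrap test of a×b, so its numbers are only a rough guide. The function names, the unit-variance error terms, and the number of replications are assumptions.

```python
import math
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def joint_significance(x, m, y, z_crit=1.96):
    """True when both path a and path b differ significantly from zero."""
    n = len(x)
    sxx, smm, syy = cov(x, x), cov(m, m), cov(y, y)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    se_a = math.sqrt((smm - a * sxm) / ((n - 2) * sxx))
    se_b = math.sqrt((syy - c_prime * sxy - b * smy) * sxx / ((n - 3) * det))
    return abs(a / se_a) > z_crit and abs(b / se_b) > z_crit

def mediation_power(a, b, n, c_prime=0.0, reps=300, seed=0):
    """Share of simulated datasets in which the indirect effect is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [b * mi + c_prime * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
        hits += joint_significance(x, m, y)
    return hits / reps

# Two weak paths (a = b = 0.20) at two candidate sample sizes.
power_by_n = {n: mediation_power(0.20, 0.20, n) for n in (100, 500)}
for n, power in power_by_n.items():
    print(f"n = {n}: estimated power = {power:.2f}")
```

For planning a real study, the detection rule inside the loop would be replaced by the test that will actually be reported (for example a percentile bootstrap), and `reps` would be raised until the power estimate is stable.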
Can I run mediation with a binary X or Y?
Yes, with modifications. If X is binary (e.g., treatment vs control), the standard OLS approach works fine: X is simply treated as a 0/1 dummy variable. If Y is binary, use logistic regression for the Y path (b and c') and interpret the indirect effect on the log-odds scale; bootstrap the indirect effect as usual. Note that with a logistic Y model the identity c = c' + a×b no longer holds exactly, because the coefficients of the models with and without M are on different scales. If M is binary, use logistic regression for the M path (a) and express the indirect effect as the product of the log-odds coefficient from path a and the OLS coefficient from path b; this is an approximation, and a fully causal interpretation requires additional assumptions. For binary M or Y, report the indirect effect on the standardized scale (using probit regression) or use the product of coefficients method with heteroscedasticity-robust standard errors.
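For the binary-Y case, the point estimate can be sketched as follows: path a comes from OLS, paths b and c' come from a logistic regression, and their product is the indirect effect on the log-odds scale. This is an illustration only. The Newton-Raphson fitter `logistic_fit`, the helper `solve`, the simulated data, and the true coefficients are all assumptions, and no bootstrap CI is computed here.

```python
import math
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(k):
            if r != i:
                f = aug[r][i] / aug[i][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def logistic_fit(predictors, y, iters=25):
    """Logistic regression coefficients [intercept, ...] by Newton-Raphson."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        prob = [1 / (1 + math.exp(-sum(bj * v for bj, v in zip(beta, row))))
                for row in rows]
        grad = [sum((yi - pi) * row[j] for yi, pi, row in zip(y, prob, rows))
                for j in range(k)]
        hess = [[sum(pi * (1 - pi) * row[i] * row[j] for pi, row in zip(prob, rows))
                 for j in range(k)] for i in range(k)]
        beta = [bj + sj for bj, sj in zip(beta, solve(hess, grad))]
    return beta

# Simulated data (assumed true values: a = 0.5; log-odds b = 0.8, c' = 0.3).
rng = random.Random(5)
x = [rng.gauss(0, 1) for _ in range(1000)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [1 if rng.random() < 1 / (1 + math.exp(-(-0.5 + 0.8 * mi + 0.3 * xi))) else 0
     for xi, mi in zip(x, m)]

a = cov(x, m) / cov(x, x)                      # OLS path a (M ~ X)
_, c_prime, b = logistic_fit([x, m], y)        # logistic Y ~ X + M
indirect_log_odds = a * b
print(f"a = {a:.3f}, b = {b:.3f} (log-odds), c' = {c_prime:.3f} (log-odds)")
print(f"indirect effect on the log-odds scale = {indirect_log_odds:.3f}")
```

To obtain a CI, the whole two-model fit would be repeated on each bootstrap resample of the rows, exactly as for the linear case.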