Your t-test came back significant (p = .003), and now a reviewer is asking for Cohen's d. You calculate it, get 0.42, and stare at the number wondering what to actually say about it in your discussion section. This Cohen's d interpretation guide will walk you through exactly what those numbers mean, when Cohen's classic benchmarks fail you, and how to report effect sizes in APA format without sounding like you're guessing.
Effect sizes matter because a p-value only tells you how surprising your data would be if there were no true difference, not how large that difference is. A massive sample can make a trivial difference statistically significant, and a small sample can miss a meaningful one. Cohen's d complements the p-value by quantifying the magnitude of a difference in standardized units.
What Cohen's d Actually Measures
Cohen's d expresses the difference between two means in units of standard deviation. The formula is straightforward:
d = (M₁ − M₂) / SD_pooled
If d = 0.5, the two group means differ by half a standard deviation. That's the entire concept. The standardization is what makes it comparable across studies, scales, and instruments — a key reason meta-analyses rely on it.
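If you want to see the arithmetic, here is a minimal Python sketch of the formula working from summary statistics. It assumes the usual pooled SD, which weights each group's variance by its degrees of freedom. The function names and the example numbers are illustrative, not taken from any particular package.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD: each group's variance weighted by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (M1 - M2) / SD_pooled."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

# Hypothetical groups: means 105 vs. 100, both SD = 10, n = 50 each
print(cohens_d(105, 10, 50, 100, 10, 50))  # prints 0.5
```

The sign of d simply follows the order in which you subtract the means, so state which group is M₁ when you report it.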
The Classic Cohen's d Benchmarks
Jacob Cohen (1988) proposed the small, medium, and large thresholds as rough conventions, not laws. The "very large" and "huge" rows are later extensions (Sawilowsky, 2009):
| Cohen's d | Interpretation | % of non-overlap |
|---|---|---|
| 0.20 | Small effect | ~15% |
| 0.50 | Medium effect | ~33% |
| 0.80 | Large effect | ~47% |
| 1.20 | Very large | ~62% |
| 2.00 | Huge | ~81% |
The "% of non-overlap" column (Cohen's U₁) is useful: a d of 0.80 means roughly 47% of the combined area of the two groups' score distributions is not shared, assuming both are normal with equal variances. That's a tangible way to communicate effect size to non-statisticians.
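The non-overlap percentages can be reproduced from d alone. The sketch below uses Cohen's U₁ and assumes two normal distributions with equal variances and equal group sizes; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cohens_u1(d):
    """Cohen's U1: share of the combined area of two equal-variance
    normal distributions that is not shared between them."""
    p = normal_cdf(abs(d) / 2.0)
    return (2.0 * p - 1.0) / p

for d in (0.2, 0.5, 0.8, 1.2, 2.0):
    print(f"d = {d:.2f}: about {100 * cohens_u1(d):.0f}% non-overlap")
```

The printed values should land close to the table's third column. If your data are clearly skewed or the variances differ a lot, treat these percentages as rough.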
Why Cohen's Benchmarks Can Mislead You
Here's what most guides won't tell you: Cohen himself warned that these cutoffs should only be used when no better field-specific benchmarks exist. In some areas, a d of 0.20 is huge. In others, 0.80 is unimpressive.
Field-specific norms matter
- Education interventions: Hattie's meta-analyses suggest d = 0.40 is the "hinge point" for meaningful classroom effects
- Clinical psychology: Therapy outcome studies often report d = 0.50–0.80 as clinically meaningful
- Medical trials: A d of 0.20 for a life-saving drug can be enormously important
- Personality psychology: Sex differences with d = 0.20 are often framed as "small but reliable"
Before labeling your effect, check 2–3 recent meta-analyses in your specific subfield. Reviewers respect context-specific interpretation far more than blanket "medium effect" claims.
A Worked Example: Interpreting d Step by Step
Let's say you ran an independent-samples t-test comparing mindfulness training to a control condition on anxiety scores in 60 undergraduates (n = 30 per group):
- Mindfulness group: M = 18.4, SD = 5.2
- Control group: M = 22.1, SD = 5.6
- t(58) = 2.65, p = .010
- Cohen's d = 0.68
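As a sanity check, the t and d values above can be re-derived from the means, SDs, and group sizes alone. This is a quick sketch in plain Python; it assumes equal variances (the standard pooled-SD t-test), and the variable names are made up for illustration.

```python
import math

# Summary statistics from the example
m_mind, sd_mind, n_mind = 18.4, 5.2, 30
m_ctrl, sd_ctrl, n_ctrl = 22.1, 5.6, 30

df = n_mind + n_ctrl - 2
sd_pooled = math.sqrt(
    ((n_mind - 1) * sd_mind**2 + (n_ctrl - 1) * sd_ctrl**2) / df
)

# Control minus mindfulness, so a positive d means lower anxiety after training
d = (m_ctrl - m_mind) / sd_pooled
t = d / math.sqrt(1 / n_mind + 1 / n_ctrl)

print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # prints t(58) = 2.65, d = 0.68
```

If the numbers you recompute don't match what your software reported, check whether it used a different SD (for example, Welch's unpooled version) before writing anything up.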
How do you write this up? Here's the APA-style sentence:
"Participants in the mindfulness condition reported significantly lower anxiety (M = 18.4, SD = 5.2) than controls (M = 22.1, SD = 5.6), t(58) = 2.65, p = .010, d = 0.68, 95% CI [0.16, 1.20]."
What d = 0.68 actually tells you:
- The groups differ by about two-thirds of a standard deviation
- Roughly 42% of the combined area of the two groups' anxiety-score distributions is not shared (Cohen's U₁)
- By Cohen's conventions, this is between "medium" and "large"
- The confidence interval [0.16, 1.20] is wide: the true effect could plausibly be smaller than "small" or as big as "very large." The estimate is imprecise, so a replication could land far from 0.68.
That last point matters. A point estimate of d = 0.68 sounds impressive until you notice the CI runs from "barely there" to "very large." Always report the confidence interval; StatRyx generates these automatically alongside your t-tests so you don't have to compute them manually.
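If you're curious where an interval like that comes from, here is a rough sketch using the common large-sample standard error for d. This is an approximation and an assumption on my part: software may instead use the noncentral t distribution, which can give slightly different bounds. The function name is illustrative.

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for Cohen's d using a normal approximation.
    z = 1.96 gives a 95% interval."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

low, high = d_confidence_interval(0.68, 30, 30)
print(f"95% CI [{low:.2f}, {high:.2f}]")  # prints 95% CI [0.16, 1.20]
```

Notice that the interval's width is driven mostly by sample size: with n = 30 per group, the standard error is about 0.27 no matter how clean the data look.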
Cohen's d vs. Hedges' g vs. Glass's Δ
These three are often confused. Use this quick reference:
| Effect size | When to use | Key feature |
|---|---|---|
| Cohen's d | Equal or similar group sizes, similar variances | Uses pooled SD; standard default |
| Hedges' g | Small samples (n < 20 per group) | Corrects Cohen's d for small-sample bias |
| Glass's Δ | Unequal variances; one group is a clear control | Uses only the control group's SD |
For most thesis-level work with n ≥ 20 per group, Cohen's d is fine. If your samples are smaller, report Hedges' g instead — it's nearly identical for large samples but more accurate when n is small. StatRyx will flag when Hedges' g is more appropriate and calculate both.
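To make the differences concrete, here is a small sketch of the other two measures. It assumes the widely used approximate correction factor for Hedges' g, 1 − 3 / (4·df − 1), rather than the exact gamma-function version. The function names are illustrative.

```python
def hedges_g(d, n1, n2):
    """Cohen's d shrunk by the approximate small-sample correction."""
    df = n1 + n2 - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction

def glass_delta(m_treatment, m_control, sd_control):
    """Mean difference standardized by the control group's SD only."""
    return (m_treatment - m_control) / sd_control

# The correction barely matters at n = 30 per group...
print(round(hedges_g(0.68, 30, 30), 2))   # prints 0.67
# ...but is noticeable at n = 10 per group
print(round(hedges_g(0.68, 10, 10), 2))   # prints 0.65
```

Because the correction factor is always below 1, g is always slightly smaller in magnitude than d, and the gap closes as the sample grows.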
Paired vs. Independent d
A critical mistake: Cohen's d for paired samples is not the same as for independent samples. For repeated-measures designs, you should