When to Use Mann-Whitney Instead of T-Test: A Practical Guide for Researchers


You ran a Shapiro-Wilk test, your data failed normality (p = .003), and now your advisor is asking why you're still running a t-test. Or maybe your sample is 18 participants per group and you have no idea whether the Central Limit Theorem saves you. These are the moments when knowing when to use Mann-Whitney instead of a t-test stops being academic and starts being urgent.

This guide walks through the decision rules, shows a worked example, and explains how to report results in APA 7 format.

The Short Answer: When to Use Mann-Whitney Instead of T-Test

Use the Mann-Whitney U test (also called the Wilcoxon rank-sum test) when you're comparing two independent groups and at least one of these is true:

Your dependent variable is ordinal (e.g., Likert ratings, pain scales)
Your data are continuous but not normally distributed, especially with small samples (n < 30 per group)
You have outliers that you cannot justify removing
Your data show strong skew or heavy tails

Use the independent samples t-test when:

Your dependent variable is continuous (interval or ratio)
Each group is approximately normally distributed (or n ≥ 30 per group)
Variances are roughly equal (or you use Welch's correction)

The Mann-Whitney test compares ranks rather than means, which is why it's robust to violations of normality. The trade-off: when assumptions for the t-test are met, the t-test has slightly more statistical power.
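As a rough sketch, the rules of thumb above can be written as a small function. The name choose_test and its parameters are hypothetical, chosen for illustration, and the n < 30 cutoff is the heuristic from this guide rather than a hard statistical boundary.

```python
def choose_test(ordinal: bool, n_per_group: int, approx_normal: bool,
                unjustified_outliers: bool = False,
                strong_skew: bool = False) -> str:
    """Hypothetical helper encoding this guide's rules of thumb."""
    # Ordinal outcomes, outliers you must keep, or strong skew / heavy tails
    # all point to the rank-based test.
    if ordinal or unjustified_outliers or strong_skew:
        return "mann-whitney"
    # Non-normal data in small samples: the Central Limit Theorem
    # cannot be relied on, so use ranks.
    if not approx_normal and n_per_group < 30:
        return "mann-whitney"
    # Otherwise the t-test (ideally with Welch's correction) is reasonable.
    return "t-test"


print(choose_test(ordinal=False, n_per_group=18, approx_normal=False))
print(choose_test(ordinal=False, n_per_group=50, approx_normal=False))
```

Treat the output as a starting point for the decision, not a substitute for looking at the data.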

The Normality Question: Don't Just Run a Test

Many researchers default to Shapiro-Wilk to decide. That's not wrong, but it's incomplete. With n = 200, Shapiro-Wilk can flag departures from normality that are too small to matter. With n = 15, it often lacks the power to detect real problems.

A better approach

Visualize first. Look at histograms and Q-Q plots for each group.
Check skewness and kurtosis. Values between -1 and +1 are generally fine.
Run Shapiro-Wilk as a supplement, not as the sole decision.
Consider sample size. With n ≥ 30 per group, the t-test is reasonably robust to mild non-normality thanks to the Central Limit Theorem.
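For the skewness and kurtosis check, here is a minimal sketch using only the Python standard library. It uses simple population-moment formulas; statistical packages often apply small-sample corrections, so their values may differ slightly. The scores are made up for illustration, and skew_kurtosis is a hypothetical helper name.

```python
from statistics import fmean


def skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up scores with a long right tail (illustration only).
scores = [8, 9, 10, 10, 11, 12, 12, 13, 15, 18, 24, 35]
skew, kurt = skew_kurtosis(scores)

# Apply the -1 to +1 rule of thumb from the list above.
outside_range = abs(skew) > 1 or abs(kurt) > 1
print(f"skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}, flag = {outside_range}")
```

Run it once per group, since the normality assumption applies to each group separately.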

In StatRyx, you can run all three checks — Shapiro-Wilk, Q-Q plot, and descriptives — in one click before deciding which test to use.

Mann-Whitney vs T-Test: Key Differences

Feature	Independent t-test	Mann-Whitney U
What it compares	Means	Distributions (ranks)
Data type	Continuous	Ordinal or continuous
Normality required	Yes (or n ≥ 30)	No
Sensitive to outliers	Yes	Much less (ranks limit their influence)
Reports	M, SD, t, df, p	Mdn, U, Z, p, r
Effect size	Cohen's d	r or rank-biserial correlation
Power when assumptions met	Higher	~95% of t-test

Worked Example: Comparing Anxiety Scores Across Two Therapy Conditions

Suppose you're comparing post-treatment anxiety scores between Group A (CBT, n = 22) and Group B (waitlist, n = 20). Anxiety is measured on a 0–40 scale.

Step 1: Check assumptions. Histograms show Group B is heavily right-skewed (skewness = 1.84). Shapiro-Wilk for Group B: W = 0.86, p = .008. The t-test is off the table.

Step 2: Run Mann-Whitney U. Results:

Group A: Mdn = 14.5
Group B: Mdn = 22.0
U = 118.5, Z = -2.47, p = .013
Effect size r = |Z| / √N = 2.47 / √42 = .38
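The effect size arithmetic is easy to check in a few lines of Python. This is a sketch of the r = |Z| / √N formula used above, where N is the total sample size across both groups; effect_size_r is a hypothetical helper name.

```python
import math


def effect_size_r(z: float, n_total: int) -> float:
    """Effect size r = |Z| / sqrt(N), with N the total sample size."""
    return abs(z) / math.sqrt(n_total)


r = effect_size_r(z=-2.47, n_total=22 + 20)
print(f"r = {r:.2f}")  # r = 0.38
```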

Step 3: Interpret.

U = 118.5 is the test statistic based on rank sums. Its raw value isn't intuitive, but the associated Z and p tell the story.
p = .013 means that, if the null hypothesis were true, a rank difference at least this extreme would occur about 1.3% of the time. Significant at α = .05.
r = .38 is a medium effect size by Cohen's conventions (.10 small, .30 medium, .50 large).
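To make "based on rank sums" concrete, here is a minimal sketch that computes U from scratch on two tiny made-up samples (not the anxiety data from this example). Tied values get the average of the ranks they span. The function names are hypothetical; in practice you would rely on a statistics package rather than this hand-rolled version.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the rank sum of group a."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Made-up scores for illustration only.
group_a = [10, 12, 14, 15, 17]
group_b = [14, 18, 20, 22, 25]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 2.5 22.5
```

Here U_a counts the pairs in which a score from group_a exceeds a score from group_b, with ties counted as one half. The two U values always sum to n1 × n2, and the smaller one is typically what gets reported.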

Step 4: Report in APA 7 format.

A Mann-Whitney U test indicated that post-treatment anxiety scores were significantly lower for the CBT group (Mdn = 14.5) than the waitlist group (Mdn = 22.0), U = 118.5, z = -2.47, p = .013, r = .38.

Note: report medians, not means, for Mann-Whitney. Reporting means here would imply the test compares means, which it doesn't. Strictly, it tests whether values from one group tend to exceed values from the other (stochastic superiority); it is a test of medians only when the two distributions have the same shape.
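The "tend to exceed" idea can be read straight off U. Dividing U by the number of possible pairs (n1 × n2) estimates the probability that a randomly chosen score from one group exceeds a randomly chosen score from the other, with ties counting as one half. The sketch below assumes the reported U = 118.5 is the U for the CBT group (the smaller of the two U values); the variable names are illustrative.

```python
# Values from the worked example; assumes U = 118.5 belongs to the CBT group.
u, n_cbt, n_waitlist = 118.5, 22, 20
n_pairs = n_cbt * n_waitlist

# Estimated probability that a CBT score exceeds a waitlist score.
p_cbt_higher = u / n_pairs

# Magnitude of the rank-biserial correlation derived from the same U.
rank_biserial = 1 - 2 * u / n_pairs

print(f"P(CBT > waitlist) = {p_cbt_higher:.2f}")  # 0.27
print(f"rank-biserial = {rank_biserial:.2f}")     # 0.46
```

Under these assumptions, a CBT participant's anxiety score would be higher than a waitlist participant's in roughly 27% of pairings, which is another way of saying CBT scores tend to be lower.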

Common Mistakes to Avoid

Reporting means with Mann-Whitney

Reviewers will catch this. The test ranks values; report medians and IQRs.

Using Mann-Whitney to "rescue" a nonsignificant t-test

Switching tests after seeing results is p-hacking. Decide based on assumptions, not outcomes.

Ignoring effect size

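With a large sample (say, n = 500), a significant p-value might reflect a trivial difference. Always report an effect size, such as r or the rank-biserial correlation, alongside p.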
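To see how sample size alone drives significance, the sketch below computes the smallest effect size r that would just reach p < .05 (two-sided) at several total sample sizes. It uses r = |Z| / √N with the critical Z from the normal approximation; min_significant_r is a hypothetical helper name, and the sample sizes are arbitrary.

```python
import math
from statistics import NormalDist

# Critical Z for a two-sided test at alpha = .05 (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)


def min_significant_r(n_total: int) -> float:
    """Smallest r = |Z| / sqrt(N) that reaches significance at alpha = .05."""
    return z_crit / math.sqrt(n_total)


for n_total in (42, 500, 5000):
    print(f"N = {n_total}: r >= {min_significant_r(n_total):.3f}")
```

At N = 500, an r below .10 (smaller than Cohen's "small" benchmark) can already be statistically significant.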
