P-Value Calculator

Compute statistical significance and visualize rejection regions on the distribution

The p-value is the probability of obtaining a test statistic as extreme as the one observed — or more extreme — assuming the null hypothesis is true. A small p-value means the data are unlikely under H₀, giving you reason to reject it.
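
As a minimal sketch of this definition, the snippet below computes a two-tailed p-value from a z statistic, assuming the null distribution is standard normal. The function name `two_tailed_p_from_z` is hypothetical and is not part of the calculator; only the Python standard library is used.

```python
from statistics import NormalDist


def two_tailed_p_from_z(z: float) -> float:
    """Probability, under a standard normal null, of a z at least as extreme as |z|."""
    # Area in the upper tail beyond |z| ...
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    # ... doubled, because "extreme" counts both directions in a two-tailed test.
    return 2.0 * upper_tail


p = two_tailed_p_from_z(2.1)
print(f"z = 2.1 -> p = {p:.4f}")  # about 0.036, below alpha = 0.05
```

For very large |z| the subtraction `1.0 - cdf` loses precision, so this form is best suited to everyday values of z.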

The most common threshold is α = 0.05: if p < 0.05, the result is called statistically significant at the 5% level. Stricter fields use α = 0.01 or even 0.001. The p-value does not tell you the probability that H₀ is true; it only measures how surprising the data are under that assumption.

This calculator supports z-tests, t-tests, chi-square tests, and F-tests. Enter your test statistic and degrees of freedom (if needed) and the AI will compute the p-value, shade the rejection region on the distribution curve, and explain whether to reject H₀.
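
For a test that needs degrees of freedom, the sketch below computes an upper-tail chi-square p-value using only the standard library. It is an illustration under stated assumptions, not the calculator's actual implementation: it assumes integer degrees of freedom, and `chi2_upper_tail_p` is a hypothetical name. In practice a statistics library such as SciPy would usually be used for chi-square, t, and F distributions.

```python
import math


def chi2_upper_tail_p(stat: float, df: int) -> float:
    """Upper-tail p-value P(X >= stat) for a chi-square variable with integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0  # every possible outcome is at least this extreme
    y = stat / 2.0
    # Starting values of the regularized upper incomplete gamma Q(a, y):
    #   Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # evaluated in log space so that large df does not overflow.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


# 7.815 is close to the tabulated 5% critical value for 3 degrees of freedom.
print(chi2_upper_tail_p(7.815, 3))
```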

FAQ

What is a p-value?
The p-value is the probability of observing a test statistic at least as extreme as the one calculated from your sample, assuming the null hypothesis (H₀) is true. A very small p-value means the observed result would be very unlikely if H₀ were true, which is evidence against H₀. It is not the probability that H₀ is true or false.
What does p < 0.05 mean?
When p < 0.05, a result at least as extreme as the one observed would occur less than 5% of the time if H₀ were true. It does not mean there is a less than 5% chance that the result is due to chance. By convention this is called statistically significant. However, significance does not imply practical importance: a large sample can make a tiny effect statistically significant. Always report effect size alongside the p-value.
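
To see how sample size alone can drive significance, here is a small sketch assuming a one-sample z-test in which the true effect is a fixed 0.02 standard deviations. The effect size, the sample sizes, and the function name `z_test_p` are illustrative assumptions.

```python
from math import erfc, sqrt


def z_test_p(effect_in_sd: float, n: int) -> float:
    """Two-tailed p-value when the observed mean differs from mu0 by effect_in_sd standard deviations."""
    z = effect_in_sd * sqrt(n)       # z = (mean - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Same tiny effect, growing sample: the p-value shrinks while the effect does not.
for n in (100, 10_000, 40_000):
    print(f"n = {n:>6}: p = {z_test_p(0.02, n):.5f}")
```

With these assumed numbers the result is far from significant at n = 100 and drops below 0.05 at n = 10,000, even though the effect is equally small in every row.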
What is the difference between a one-tailed and two-tailed test?
A two-tailed test checks for a difference in either direction (H₁: μ ≠ μ₀) and splits α across both tails. A one-tailed test checks for a specific direction (H₁: μ > μ₀ or μ < μ₀) and places all of α in one tail. Use one-tailed only when you had a directional hypothesis before collecting data; choosing the direction after seeing the data inflates the Type I error rate, so otherwise use two-tailed.
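
The difference can be sketched for a z statistic as follows, assuming a standard normal null distribution. The function name `z_p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist


def z_p_value(z: float, tail: str = "two") -> float:
    """p-value for a z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf(z)
    if tail == "upper":   # H1: mu > mu0, all of alpha in the right tail
        return 1.0 - cdf
    if tail == "lower":   # H1: mu < mu0, all of alpha in the left tail
        return cdf
    if tail == "two":     # H1: mu != mu0, both tails count as extreme
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


z = 1.8
print(f"upper-tailed: {z_p_value(z, 'upper'):.4f}")  # roughly 0.036
print(f"two-tailed:   {z_p_value(z, 'two'):.4f}")    # roughly 0.072
```

In this example the same statistic is significant at α = 0.05 in the upper-tailed test but not in the two-tailed test, which is why the direction has to be fixed in advance.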
What are common significance levels?
The most widely used level is α = 0.05 (5%). Stricter standards include α = 0.01 (1%), sometimes used in medical research, and α = 0.001 (0.1%). Particle physics uses a far stricter "five sigma" discovery standard, which corresponds to a one-sided p-value of about 3 × 10⁻⁷. The choice of α should be made before data collection based on the cost of Type I errors (false positives) in your field.
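
To relate a "sigma" threshold to a p-value, one can convert a number of standard deviations into a one-sided tail area of the standard normal distribution. The sketch below assumes the one-sided convention, and `one_sided_p_from_sigma` is a hypothetical name.

```python
from math import erfc, sqrt


def one_sided_p_from_sigma(n_sigma: float) -> float:
    """Upper-tail area of a standard normal beyond n_sigma standard deviations."""
    # erfc keeps precision in the far tail, where 1 - cdf would round to zero.
    return 0.5 * erfc(n_sigma / sqrt(2.0))


for n_sigma in (2, 3, 5):
    print(f"{n_sigma} sigma -> one-sided p = {one_sided_p_from_sigma(n_sigma):.3g}")
```

Five sigma comes out near 2.9 × 10⁻⁷, several orders of magnitude stricter than α = 0.001.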
How do I interpret the rejection region?
The rejection region is the set of test statistic values that would lead you to reject H₀. It corresponds to the most extreme values under the null distribution, shown as the shaded tail or tails: both tails for a two-tailed z- or t-test, one tail for a one-tailed test, and typically the upper tail for chi-square and F-tests. If your observed test statistic falls inside the rejection region (equivalently, if p < α), you reject H₀. The AI plots this region on the distribution so you can see exactly where your statistic lands.
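
As a sketch of how a rejection region can be located for a z-test, the code below finds the critical value for a given α and checks whether a statistic falls beyond it. It assumes a standard normal null distribution; `critical_value` and `in_rejection_region` are hypothetical names, not the calculator's API.

```python
from statistics import NormalDist


def critical_value(alpha: float, tail: str = "two") -> float:
    """Positive critical z for the given significance level and tail choice."""
    if tail not in ("two", "upper", "lower"):
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    # A two-tailed test splits alpha, leaving alpha / 2 in each tail.
    tail_area = alpha / 2.0 if tail == "two" else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


def in_rejection_region(z: float, alpha: float, tail: str = "two") -> bool:
    """True when z lies in the shaded tail(s), i.e. when H0 would be rejected."""
    c = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > c
    if tail == "upper":
        return z > c
    return z < -c  # lower tail


print(critical_value(0.05))             # close to 1.96
print(in_rejection_region(2.1, 0.05))   # True: 2.1 lies beyond the critical value
print(in_rejection_region(1.8, 0.05))   # False: 1.8 lies inside it
```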
What is the difference between a p-value and a confidence interval?
A p-value measures how compatible the data are with a single null value and is typically used to decide whether to reject or fail to reject H₀. A confidence interval (CI) gives a range of plausible values for the parameter, carrying more information. They are mathematically linked: when both come from the same test procedure, a 95% CI for a parameter excludes the null value if and only if the two-tailed p-value < 0.05. Reporting both is widely recommended.
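
The link between the two can be sketched for a one-sample z-test with known σ. All numbers below are made up for illustration, and `z_test_and_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def z_test_and_ci(sample_mean: float, mu0: float, sigma: float, n: int,
                  alpha: float = 0.05) -> tuple[float, tuple[float, float]]:
    """Two-tailed p-value and the matching (1 - alpha) confidence interval for the mean."""
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (sample_mean - mu0) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * se
    return p, (sample_mean - half_width, sample_mean + half_width)


# Illustrative inputs: mean 103 from n = 100, known sigma = 15, null value 100.
p, (low, high) = z_test_and_ci(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"p = {p:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
print("CI excludes mu0:", not (low <= 100.0 <= high), "| p < 0.05:", p < 0.05)
```

Because the interval and the test use the same standard error and the same normal quantile, the two printed checks always agree.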