Statistics Calculator

Compute mean, median, standard deviation, and quartiles — clean your data first

Descriptive statistics summarize a dataset with a handful of numbers: the center (mean, median), the spread (standard deviation, IQR), and the extremes (min, max). But real-world data is rarely clean. It often contains missing values encoded as -999, N/A, or other sentinel codes, and a numeric code like -999 that is treated as a real value can distort nearly every statistic you compute.
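A minimal Python sketch of these summary numbers, using only the standard library `statistics` module. The function name `describe` is hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1) and the "inclusive" quartile method; the calculator itself may use different conventions.

```python
import statistics

def describe(values):
    """Summarize a list of numbers: center, spread, and extremes (hypothetical helper)."""
    # Quartiles via the "inclusive" method (linear interpolation between sorted values).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "iqr": q3 - q1,                     # interquartile range
        "min": min(values),
        "max": max(values),
    }

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this small list the mean is 5.0, the median 4.5, and the IQR 1.5.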

This tool loads a sample dataset of 30 employees with Name, Age, Salary, and Department — including a few -999 sentinel values in the Salary column. Click Link Data to see how those bad values affect the mean, then ask the AI to clean them and recompute. The difference will be dramatic.

Paste your own CSV to compute statistics on your data.
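If you prepare a CSV outside the tool, a sketch like the following extracts one numeric column and drops missing-value codes before any statistics are computed. The header mirrors the sample dataset's columns, but the rows, the helper name `numeric_column`, and the sets `MISSING_TEXT` and `SENTINELS` are illustrative assumptions.

```python
import csv
import io

# Assumed missing-value codes; adjust them per column for your own data.
MISSING_TEXT = {"", "N/A", "NA"}
SENTINELS = {-999.0}

def numeric_column(csv_text, column):
    """Return the numeric values of one CSV column, skipping missing-value codes."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        if raw.upper() in MISSING_TEXT:
            continue  # blank cell or textual missing marker
        value = float(raw)  # raises ValueError on any other non-numeric text
        if value in SENTINELS:
            continue  # numeric sentinel such as -999
        values.append(value)
    return values

# Hypothetical rows in the same shape as the sample dataset.
sample = """Name,Age,Salary,Department
Ana,34,52000,Sales
Ben,29,-999,Engineering
Cy,41,N/A,Sales
Di,38,61000,Engineering
"""
print(numeric_column(sample, "Salary"))  # [52000.0, 61000.0]
```

Checking textual markers before calling `float` avoids a parse error on "N/A"; numeric sentinels are compared after parsing so that "-999" and "-999.0" are both caught.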

FAQ

What is a sentinel value and why is it a problem?
A sentinel value is a number used to indicate missing data, commonly -999, -1, or 9999. If you compute the mean of a salary column that includes -999, each sentinel is averaged in as if it were a real salary, dragging the mean down and inflating the standard deviation. Always clean sentinel values before running statistics.
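A small illustration of the effect, using hypothetical salary values in which -999 marks a missing entry:

```python
import statistics

# Hypothetical salary column in which -999 marks a missing value.
salaries = [48000, 52000, -999, 61000, 55000, -999]
SENTINELS = {-999}

raw_mean = statistics.fmean(salaries)  # sentinels counted as real salaries
clean = [s for s in salaries if s not in SENTINELS]
clean_mean = statistics.fmean(clean)   # mean of the four real salaries

print(raw_mean)    # 35667.0
print(clean_mean)  # 54000.0
```

Two missing entries pull the mean down by more than 18,000, both because -999 enters the sum and because the missing rows are counted in the denominator. Codes like -1 or 9999 can be legitimate values in some columns, so choose the sentinel set per column.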
When should I use median instead of mean?
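Use the median when your data has outliers or is skewed. The median is the middle value of the sorted data, so it depends only on the order of the values, not on how far out the largest or smallest ones lie; a single extreme value barely moves it. For salaries, house prices, or income data, which are typically right-skewed, the median usually gives a better picture of the "typical" value than the mean.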
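To see the difference, add one extreme value to a short list of hypothetical incomes and compare how far each measure moves:

```python
import statistics

incomes = [42000, 45000, 47000, 50000, 52000]  # hypothetical incomes
with_outlier = incomes + [2_000_000]            # add one extreme value

print(statistics.fmean(incomes), statistics.median(incomes))
# 47200.0 47000
print(round(statistics.fmean(with_outlier)), statistics.median(with_outlier))
# 372667 48500.0
```

The mean jumps almost eightfold, while the median shifts only from 47,000 to 48,500, and that shift comes from the extra data point changing which values sit in the middle, not from the size of the outlier.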
What does standard deviation tell me?
Standard deviation measures how spread out the data is around the mean. In a normal distribution, about 68% of values fall within ±1 standard deviation of the mean, and 95% within ±2. A small std dev means the data clusters tightly; a large one means high variability.
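One way to check the 68% / 95% rule empirically is to simulate normally distributed data and count how many values fall within 1 and 2 standard deviations of the mean. The mean of 100, standard deviation of 15, sample size, and seed below are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)
data = [random.gauss(100, 15) for _ in range(100_000)]  # simulated normal data

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # population standard deviation of the sample

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

print(f"mean={mu:.1f} sd={sigma:.1f}")
print(f"within 1 sd: {share_within(1):.3f}")  # close to 0.68
print(f"within 2 sd: {share_within(2):.3f}")  # close to 0.95
```

The rule applies to roughly normal data; for skewed data such as salaries, the shares within ±1 and ±2 standard deviations can differ noticeably.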
What is the five-number summary?
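The five-number summary is: minimum, Q1 (25th percentile), median (50th), Q3 (75th), and maximum. Together these five values summarize the center, spread, and range of a dataset, though they do not capture its full shape: two quite different distributions can share the same five numbers. They are also the building blocks of a box plot.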
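A minimal sketch of the five-number summary in Python. Quartile conventions vary between tools, so Q1 and Q3 may differ slightly from what the calculator reports; this sketch assumes the "inclusive" method (linear interpolation between sorted values), and the helper name `five_number_summary` and the ages are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

ages = [23, 25, 29, 31, 34, 38, 41, 45, 52, 60]  # hypothetical ages
print(five_number_summary(ages))  # (23, 29.5, 36.0, 44.0, 60)
```

Switching to `method="exclusive"` (the standard library default) gives different Q1 and Q3 for the same list, which is why quartiles from different tools do not always agree.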