Answer: The standard deviation is the square root of the variance; it measures how spread out the values are around their mean. Compute it by finding the mean, squaring each deviation from the mean, averaging those squares (divide by $N$ for a full population or by $n-1$ for a sample), then taking the square root.
Explanation
- Use the population formulas when you have data for the entire population.
- Use the sample formulas (divide by $n-1$) when your data is a sample and you want an unbiased estimator of the population variance.
Formulas
Population variance and standard deviation:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i-\mu)^2$$
$$\sigma = \sqrt{\sigma^2}$$
Sample variance and standard deviation:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$$
$$s = \sqrt{s^2}$$
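Because both formulas divide the same kind of sum of squared deviations, applying them to the same data gives values that differ only by a fixed factor. A short derivation, writing $SS$ for that sum (a symbol introduced here for convenience only):
$$SS = \sum_{i=1}^n (x_i-\bar{x})^2, \qquad s^2 = \frac{SS}{n-1} = \frac{n}{n-1}\cdot\frac{SS}{n}$$
So the sample variance is the divide-by-$n$ variance scaled by $\frac{n}{n-1}$, and $s$ is the corresponding standard deviation scaled by $\sqrt{n/(n-1)}$. The factor is 1.25 for $n=5$ and approaches 1 as $n$ grows, which is why the choice of divisor matters most for small data sets.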
Steps (practical)
- Compute the mean: $\mu = \frac{1}{N}\sum_{i=1}^N x_i$ (or $\bar{x}$ for a sample).
- For each data point compute the deviation: $x_i - \mu$.
- Square each deviation: $(x_i-\mu)^2$.
- Sum the squared deviations: $\sum (x_i-\mu)^2$.
- Divide by $N$ (population) or $n-1$ (sample) to get variance.
- Take the square root to get standard deviation.
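The six steps above can be sketched directly in plain Python. This is a minimal illustration, not a reference implementation: the function name `standard_deviation`, its `sample` flag, and the three-value data set are all made up for this example.

```python
import math

def standard_deviation(values, sample=False):
    """Follow the six steps; sample=True divides by n - 1 instead of N."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("need at least 1 value (population) or 2 values (sample)")
    mean = sum(values) / n                        # step 1: mean
    deviations = [x - mean for x in values]       # step 2: deviations
    squared = [d ** 2 for d in deviations]        # step 3: squares
    total = sum(squared)                          # step 4: sum of squares
    variance = total / (n - 1 if sample else n)   # step 5: variance
    return math.sqrt(variance)                    # step 6: square root

demo_values = [150.0, 160.0, 170.0]  # hypothetical illustration data
pop_sd = standard_deviation(demo_values)
samp_sd = standard_deviation(demo_values, sample=True)
print(pop_sd, samp_sd)
```

For these three values the squared deviations sum to 200, so the sample variance is 200 / 2 = 100 and the population variance is 200 / 3.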
Worked example
Data: 2, 4, 4, 4, 5 (treated here as a population)
- Mean: $\mu = \frac{2+4+4+4+5}{5} = 3.8$
- Squared deviations: $(2-3.8)^2=3.24$, $(4-3.8)^2=0.04$ (three times), $(5-3.8)^2=1.44$
- Sum: $3.24+0.04+0.04+0.04+1.44=4.8$
- Population variance: $\sigma^2 = \frac{4.8}{5}=0.96$
- Population standard deviation: $\sigma=\sqrt{0.96}\approx0.98$
If treated as a sample:
- Sample variance: $s^2=\frac{4.8}{5-1}=1.20$
- Sample standard deviation: $s=\sqrt{1.20}\approx1.10$
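One way to double-check the hand calculation is Python's standard-library `statistics` module, which has separate functions for the population and sample versions. A small sketch, with variable names chosen only for this example:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0]

pop_var = statistics.pvariance(data)   # population variance: divides by N
pop_sd = statistics.pstdev(data)       # population standard deviation
samp_var = statistics.variance(data)   # sample variance: divides by n - 1
samp_sd = statistics.stdev(data)       # sample standard deviation

print(round(pop_var, 2), round(pop_sd, 2))
print(round(samp_var, 2), round(samp_sd, 2))
```

Rounded to two decimals, the printed values should agree with the figures worked out by hand above.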
Quick notes
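- Units: SD has the same units as the original data.
- Use $n-1$ (Bessel’s correction) for samples; it corrects the downward bias that arises in the variance when deviations are measured from the sample mean $\bar{x}$ instead of the true mean $\mu$.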
- Outliers strongly affect SD because of squaring deviations.
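To see the outlier effect concretely, the sketch below compares the example data with a copy in which one value has been replaced by a much larger one. The modified data set is hypothetical and exists only to illustrate the point.

```python
import statistics

base = [2.0, 4.0, 4.0, 4.0, 5.0]
with_outlier = [2.0, 4.0, 4.0, 4.0, 50.0]  # hypothetical: last value replaced by 50

base_sd = statistics.pstdev(base)
outlier_sd = statistics.pstdev(with_outlier)
ratio = outlier_sd / base_sd

print(f"{base_sd:.2f} -> {outlier_sd:.2f} ({ratio:.1f}x)")
```

Changing a single value out of five increases the population SD by more than a factor of ten here, because the large deviation is squared before it is averaged.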
Calculator / software
- Excel: STDEV.P(range) for population, STDEV.S(range) for sample.
- Python (numpy): np.std(data, ddof=0) for population, np.std(data, ddof=1) for sample.
Example Python:
import numpy as np
data = np.array([2, 4, 4, 4, 5])
pop_sd = np.std(data, ddof=0)     # population SD: divide by N
sample_sd = np.std(data, ddof=1)  # sample SD: divide by n - 1
print(pop_sd, sample_sd)
If you want, I can show this with your specific dataset or demonstrate step-by-step on a new example.