We all misuse statistics every day. It’s one of the most useful and most misunderstood branches of mathematics.
Because data is omnipresent in our work, it’s getting more and more dangerous to ignore basic statistical principles.
Last week we discussed tips for great experimentation. Today I want to cover three common statistics mistakes we all make – and don’t worry, no math required.
Statistically Significant
Someone states a data point. You purse your lips, stroke your chin and say skeptically, “…yes yes, but is it – statistically significant?”
This naysayer’s go-to quip has always been a pet peeve of mine. Why? Because “statistical significance” on its own is arbitrary!
What we usually mean is “what was the margin of error?” or “how confident are we that these results are not random?”. Both can be answered concretely (+/- X, or at a stated confidence level such as 95%).
Statistical significance is a nuance of both of these. Whether a result counts as significant depends on the confidence level you picked: an experiment outcome can be significant at the 90% confidence level while also being insignificant at the 95% confidence level.
So, next time someone invokes statistical significance to discredit or substantiate a claim, you can wisely quip – “Ah yes, and what confidence level did you adhere to?”
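For the curious, here is a minimal Python sketch of how the same result can pass at one confidence level and fail at another. The visitor and conversion counts are invented for illustration, and the pooled two-proportion z-test is just one common way to get a p-value; it assumes large, independent random samples.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B test counts, invented purely for illustration.
control_visitors, control_conversions = 1000, 100
variant_visitors, variant_conversions = 1000, 125

rate_control = control_conversions / control_visitors
rate_variant = variant_conversions / variant_visitors

# Pooled two-proportion z-test (assumes large, independent random samples).
pooled = (control_conversions + variant_conversions) / (
    control_visitors + variant_visitors
)
std_err = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
z = (rate_variant - rate_control) / std_err

# Two-sided p-value: chance of a gap at least this large if there were no real difference.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# The verdict depends entirely on the confidence level chosen.
for confidence in (0.90, 0.95):
    alpha = 1 - confidence
    verdict = "significant" if p_value < alpha else "not significant"
    print(f"{confidence:.0%} confidence level: {verdict} (p = {p_value:.3f})")
```

With these made-up counts the p-value lands between 0.05 and 0.10, so the result is "significant" at the 90% level and "not significant" at the 95% level. Same data, different verdict.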
Central Tendency: Mode, Mean, Median or Range?
Honestly, I still mix these guys up sometimes. Especially since “mean” often goes by its alias “average”.
“Hey, diddle diddle, the median’s the middle, You add then divide for the mean. The mode is the one that you see the most, and the range is the difference between.”
-unknown math poet
The biggest failure I see when it comes to central tendency is using the mean (average) incorrectly. Calculating an average is a great way to get a quick feel for a data set: for example, the average age of people in a room, or the average income of people at your dinner table.
But if your data points don’t follow something like a normal distribution (think of a bell curve), then the average can give you very misleading results. Say the average age of people in that room is 4 – that won’t tell you much if the room is a maternity ward filled with adult mothers and their infants.
So, when using a measure of central tendency, take a second to consider whether it makes sense to average the data set at all.
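To make the maternity ward concrete, here is a small Python sketch using the standard `statistics` module. The ages are made up (six infants and six mothers), so the exact mean differs from the number above; the point is only that no single summary number describes a room with two very different groups in it.

```python
from statistics import mean, median, mode

# Hypothetical ages in a maternity ward: six infants and six mothers.
ages = [0, 0, 0, 0, 0, 0, 27, 29, 31, 32, 33, 34]

avg = mean(ages)               # add, then divide
mid = median(ages)             # the middle value (average of the two middle ones here)
most_common = mode(ages)       # the value seen most often
spread = max(ages) - min(ages) # the range: difference between largest and smallest

# How far is the person closest to the mean from the mean itself?
closest_gap = min(abs(age - avg) for age in ages)

print(f"mean = {avg}, median = {mid}, mode = {most_common}, range = {spread}")
print(f"nobody in the room is within {closest_gap} years of the mean")
```

In this invented room the mean is 15.5, yet the nearest actual person is more than a decade away from that age. The median does no better here, which is a hint that the data should be split into groups (or plotted) before being summarized.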
Sample Size
This is in the same vein as the first example – there is no “right” sample size. People will often throw around a mandatory number of samples needed to be statistically significant – but that’s just nonsense. Rather, the sample size you choose should reflect two things.
- The margin of error you’re willing to accept (sample size isn’t the only factor but it’s important), as well as
- The “cost” of the sample size in time and money
You get diminishing returns in accuracy as your sample grows. Often a quick sample of 100 can tell you most of what you were going to find out with a larger sample of 1000, at a tenth of the cost.
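If you want to see the diminishing returns in numbers, here is a rough Python sketch. It uses the textbook margin-of-error formula for a proportion, z * sqrt(p * (1 - p) / n), and assumes a simple random sample from a large population, a 95% confidence level, and the worst case p = 0.5. The function name `margin_of_error` is mine, not a library call.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, confidence=0.95, proportion=0.5):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a large population and the normal approximation;
    proportion=0.5 is the worst case (widest margin).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(proportion * (1 - proportion) / n)

for n in (100, 1000):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

Under these assumptions, 100 samples give roughly +/- 9.8 percentage points and 1000 samples give roughly +/- 3.1. Ten times the cost buys about a threefold tighter margin, because the margin shrinks with the square root of the sample size. Whether that is worth it depends on the margin of error you are willing to accept.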
Also – MAKE SURE YOUR SAMPLE IS RANDOM! I can’t tell you how often folks will grab the top 100 clients or the most recent 1000 widgets to use as their sample and then make sweeping assumptions about ALL their clients or widgets. Remember, inferences can only be made for a population if the sample is a random subset of that population.
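Here is a quick Python sketch of the difference between grabbing the top 100 clients and drawing a random 100. The client revenues are simulated (a skewed, made-up distribution), so treat the numbers as illustration only.

```python
import random
from statistics import mean

# Seeded generator so the sketch is reproducible.
rng = random.Random(42)

# Hypothetical yearly revenue for 10,000 clients (skewed: a few very large accounts).
revenues = [rng.lognormvariate(8, 1) for _ in range(10_000)]

# The tempting shortcut: look only at the biggest clients.
top_100 = sorted(revenues, reverse=True)[:100]

# The defensible approach: every client has the same chance of being picked.
random_100 = rng.sample(revenues, 100)

print(f"all clients:    {mean(revenues):>12,.0f}")
print(f"top 100:        {mean(top_100):>12,.0f}")
print(f"random 100:     {mean(random_100):>12,.0f}")
```

The top-100 average will sit far above the true average for all clients, because that sample was chosen by the very thing being measured. The random sample will not match the full population exactly, but its error is the kind a margin of error can describe.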
And there you have it – keep these three statistical faux pas in mind and you’ll avoid some of the most common and painful analysis mistakes.
Happy sampling!