Avoiding hypothesis test pitfalls: choosing the right test and handling non-normally distributed data

How to correctly conduct t-tests

Luis Garcia Fuentes
6 min read · Aug 14, 2022

Why this article

Plenty of tutorials exist on the basics of hypothesis testing, so we will not cover them here. I have noticed, however, that this basic understanding often does not translate into accurate tests in practice, and for that reason I wanted to provide this quick guide for analysts facing hypothesis tests at work.

Refresher

In essence, a statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. It is the mathematical backbone of statements like “these two groups are different” or “this group behaves as we expected”. We will focus only on tests of the mean.

Z-test or T-test

We can run two types of tests: the z-test or the t-test. The rules that govern when to use which are:

  • If you have a large sample with normally distributed data, use the z-test.
  • If you have a small sample and the population is normal:
    — Use the z-test if you know the population variance.
    — Use the t-test if you don’t know the population variance.

In reality, it is often the case that we do not know the population variance: calculating it requires the population mean, which is exactly what we are trying to infer. For this reason, it is encouraged to always use the t-test, unless someone with a PhD in statistics tells you not to.

  • If you have a small sample and you don’t know if the population is normal (either you don’t know the distribution, or you know that it’s not normal), neither the z nor t is appropriate. A nonparametric test will be needed, discussed at the end of this article.

Use the right hypothesis test

1-sample test:

This test is used to compare the sample mean against a hypothesized value. It is often used in quality assurance exercises, as certain metrics must remain within a given value as determined by some regulation.
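As a minimal sketch of how this looks in practice (using scipy, with made-up measurement data and an assumed hypothesized target of 10.0):

```python
# One-sample t-test: is the sample mean different from a hypothesized value?
# The sample below is simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=10.2, scale=1.5, size=30)  # e.g. 30 measured fill weights

t_stat, p_value = stats.ttest_1samp(sample, popmean=10.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# Reject the null of "mean equals 10.0" only if p is below your alpha (e.g. 0.05)
```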

2-sample (independent) test:

This test is used to find whether the population means of two independent groups are equal or not. It is also known as A/B testing, where two groups are compared on a metric to find if one acts differently than the other. If you are familiar with econometrics, you can think of this as a cross-sectional analysis of two groups.

  • The 2-sample test takes your sample data from two groups and boils it down to a t-score. In layman's terms, we compute the mean of each group and compare the two means against each other.
  • The 2-sample test requires independent groups for each sample, meaning members of each group cannot be part of the other group.
  • The 2-sample test requires the variance of the two groups to be equal; if this does not hold, it can be adjusted for by using the right form of this test (Welch’s t-test). To test whether the variances are equal, or better said, not statistically different, we can use Levene’s test.
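The points above can be sketched together: run Levene's test first, then choose the form of the two-sample test accordingly. The group sizes and effect below are simulated assumptions.

```python
# Two-sample (independent) t-test with a variance check first.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100.0, scale=15.0, size=200)  # simulated group A metric
group_b = rng.normal(loc=104.0, scale=15.0, size=200)  # simulated group B metric

# Levene's test: the null hypothesis is that the variances are equal
_, p_levene = stats.levene(group_a, group_b)

# If the variances look different, fall back to Welch's t-test (equal_var=False)
equal_var = p_levene > 0.05
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"Levene p = {p_levene:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```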

Paired-sample test (dependent samples):

Mathematically speaking this is a 1-sample test, but it carries a completely different use. Confusion arises because the paired-sample test also compares the difference between two groups, just like the 2-sample test. Unlike the 2-sample test, however, the paired test does not have independent groups, as we are comparing individuals before treatment to their outcomes after treatment.

The paired-sample test then calculates, per subject, the difference in the variable of interest. It is over this just-calculated difference that a 1-sample test is run, where the null hypothesis is that there is no difference, i.e. the difference equals zero.

Another point of confusion arises when analysts try to compare the variance of the two groups; the variance used in this test is the variance of the per-subject differences.
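The claim that the paired test is mathematically a 1-sample test on the differences can be verified directly with scipy (the before/after values here are simulated):

```python
# Paired-sample t-test vs. a 1-sample t-test on the per-subject differences.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
before = rng.normal(loc=120.0, scale=10.0, size=25)        # pre-treatment values
after = before - rng.normal(loc=5.0, scale=4.0, size=25)   # assumed treatment effect

t_paired, p_paired = stats.ttest_rel(before, after)
t_diff, p_diff = stats.ttest_1samp(before - after, popmean=0.0)

# Both approaches produce the exact same statistic and p-value
print(f"paired: t = {t_paired:.3f}, 1-sample on diffs: t = {t_diff:.3f}")
```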

If the data you have is panel data, then you may be using the wrong statistical framework altogether. In this case, I would recommend you use difference-in-differences econometric approaches.

https://datatab.net/tutorial/one-sample-t-test

Make sure the distribution of data is normal

So far the above is only valid if the data at hand is normally distributed. There are a couple of statistical tests that can help you identify whether your data is normally distributed (such as the Shapiro-Wilk test); however, with the amount of computing power available to us today, I often find it easier to simply plot the data in a histogram or distribution plot.

Book: Intermediate Statistics for dummies
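Both approaches, a formal test and a quick look at the shape, can be combined in a few lines. The data below is simulated (exponential, so deliberately non-normal):

```python
# A quick normality check: Shapiro-Wilk test plus a crude text histogram.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=300)  # clearly non-normal by construction

stat, p_shapiro = stats.shapiro(sample)
print(f"Shapiro-Wilk p = {p_shapiro:.4f}")  # a small p suggests non-normal data

# Eyeballing the shape is often just as informative
counts, _ = np.histogram(sample, bins=10)
for c in counts:
    print("#" * (c // 5))
```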

If your data is not normally distributed, you cannot perform the tests described above without adjusting for this. We will provide some potential actions.

Option 1️⃣: Doing nothing
This option is only available if you have a large sample and your data is not drastically skewed. Neither of these characteristics is precisely defined, so it is advised to start with this approach but corroborate your results with the other methods described below.

The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution.

Econometrics for Dummies

Technically speaking, according to the CLT, the population mean and the mean of the sampling distribution of the sample mean (μ vs. X̄) should be close to each other. For this reason, some statisticians believe that with large sample sizes and no extreme outliers, the t-statistic is robust to non-normal data.

https://eje.bioscientifica.com/view/journals/eje/182/2/EJE-19-0922.xml#:~:text=The%20t%2Dtest%20is%20not,distributions%20of%20the%20outcome%20variable.

However, “large samples” and “extreme outliers” are not precisely defined, leaving no clear interpretation. As a consequence, running a t-test on non-normal data can yield inappropriate results.
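The CLT claim above can be sketched numerically: draw many samples from a heavily skewed population and observe that the distribution of their means loses almost all of that skew. The population choice and sample sizes are illustrative assumptions.

```python
# CLT sketch: means of repeated samples from a skewed population look near-normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
population = rng.exponential(scale=1.0, size=100_000)  # heavily right-skewed

# 2000 samples of size 100, keeping only each sample's mean
sample_means = np.array(
    [rng.choice(population, size=100).mean() for _ in range(2000)]
)

print(f"population skew  = {stats.skew(population):.2f}")
print(f"sample-mean skew = {stats.skew(sample_means):.2f}")  # much closer to 0
```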

Option 2️⃣: Applying a non-parametric test
It is advised that analysts dealing with both non-normal data and small sample sizes apply non-parametric tests. However, non-parametric tests are also an option if you have large samples but non-normal data.

Each of the statistical tests described earlier has an equivalent non-parametric test. For example, to compare two distributions that are not normal, one uses the Wilcoxon Rank Sum test, also known as the Mann-Whitney test. It checks whether two populations have the same distribution (meaning whether the two histograms look the same) versus one of the populations being shifted to the right or left.
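A minimal sketch of the Mann-Whitney test with scipy, using simulated skewed groups (the scale difference between them is an assumption for illustration):

```python
# Mann-Whitney U (Wilcoxon Rank Sum): non-parametric analogue of the 2-sample test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
control = rng.exponential(scale=1.0, size=80)  # skewed, non-normal group
treated = rng.exponential(scale=1.5, size=80)  # assumed shifted distribution

u_stat, p_value = stats.mannwhitneyu(control, treated, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
# A small p suggests one distribution is shifted relative to the other
```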

However, non-parametric tests come with their own set of requirements that must be met before these methods are used. For the Wilcoxon Rank Sum test, the main requirements are that the two samples are independent random samples and that the outcome variable is at least ordinal (its values can be ranked).

Intermediate statistics for dummies

For more information about non-parametric tests, please watch the following YouTube video:

https://www.youtube.com/watch?v=IcLSKko2tsg&ab_channel=zedstatistics

Option 3️⃣: Applying data transformations

Log transformation of the data:
Log transformations can be appropriate to make a non-normal distribution into a normally distributed one.

It often has the added value that the transformation may reduce the variance of the variable. However, you always have to check that the log transformation indeed shaped the data as you desired, as this is not guaranteed. In the world of econometrics, log transformations have additional benefits (coefficients can be interpreted as elasticities, and log transformations tend to reduce heteroskedasticity).
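The "always check" advice can be made concrete by comparing skewness before and after the transformation. A lognormal sample is used here purely as an example where the transformation is guaranteed to help; with real data it might not.

```python
# Log-transforming right-skewed data and verifying that it actually helped.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # right-skewed by construction

logged = np.log(raw)  # only valid for strictly positive data

# Verify the effect: skewness near 0 indicates a roughly symmetric distribution
print(f"skew before = {stats.skew(raw):.2f}, after = {stats.skew(logged):.2f}")
```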

However, the log transformation is just one of the most common transformations and is not guaranteed to work. For additional options, please reference this article, which briefly discusses the Box-Cox and Johnson transformations.

Summary

Conducting hypothesis tests requires a deeper understanding than t-tests and p-values alone. It is important to understand which methodology is appropriate for your specific business scenario, and what the requirements are for that methodology to be valid.
