A t-test, also known as Student's t-test, is a statistical test used to determine whether the means of two samples differ, and whether that difference is statistically significant (and can therefore be generalized to the whole population) or is only due to random chance.
In other words, a t-test helps to test a hypothesis.
There are different types of t-tests; here, we focus on conducting t-tests on independent samples.
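For independent samples, the test assumes that the two samples are independent of each other, that the data in each group are approximately normally distributed, and that the two groups have equal variances.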
Note: It’s important to conduct tests on the 2 samples to test for normality and equal variance to ensure the above assumptions are met.
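One way to run these checks is sketched below. It assumes SciPy's Shapiro-Wilk test for normality and Levene's test for equal variance, which are common choices but not the only ones, and the ages in group_a and group_b are illustrative values only.

```python
from scipy.stats import shapiro, levene

# Illustrative ages for the two groups (assumed data, for demonstration only)
group_a = [70, 72, 76, 90, 87]
group_b = [60, 62, 56, 70, 50]

alpha = 0.05  # assumed significance level for the checks

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
normality_p = {}
for name, sample in (("A", group_a), ("B", group_b)):
    stat, p = shapiro(sample)
    normality_p[name] = p
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}, normality plausible: {p > alpha}")

# Levene: the null hypothesis is that both groups have equal variances
lev_stat, lev_p = levene(group_a, group_b)
print(f"Levene p = {lev_p:.3f}, equal variance plausible: {lev_p > alpha}")
```

A p-value above the chosen level in each check means there is no evidence against the corresponding assumption. With samples this small, the checks have little power, so they are best read as a rough guide.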
Note: t-tests compare two population means only. To compare more than two population means, other tests such as ANOVA can be applied.
Suppose you are carrying out a clinical trial to find out if taking supplements leads to a longer life expectancy. In this case, you will have two groups: group A, which takes the supplements, and group B, which takes a 'sugar pill' as a placebo.
In such an experiment, carrying out a t-test will help you know if the results are statistically significant or only happen due to random chance.
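The formula to obtain a t-value manually is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

where:
$\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two groups, and $SE$ is the pooled standard error.

The pooled standard error is calculated as:

$$SE = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Here $n_1$ and $n_2$ are the sample sizes, and $s_1^2$ and $s_2^2$ are the sample variances of the two groups.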
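After obtaining the t-value, use a Student's t table and compare it with an expected value, also known as the critical t-value ($t_{crit}$), for the chosen significance level $\alpha$ and degrees of freedom $df = n_1 + n_2 - 2$, where $n_1$ and $n_2$ are the two sample sizes. You then reject the null hypothesis if $|t| > t_{crit}$.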
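The manual procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-sided test at a significance level of 0.05 and using illustrative ages. The critical value is taken from SciPy's t distribution instead of a printed table.

```python
import math
from statistics import mean, variance
from scipy.stats import t as t_dist

# Illustrative ages (assumed data): supplement group and placebo group
group_a = [70, 72, 76, 90, 87]
group_b = [60, 62, 56, 70, 50]

n1, n2 = len(group_a), len(group_b)
mean1, mean2 = mean(group_a), mean(group_b)
var1, var2 = variance(group_a), variance(group_b)  # sample variances (n - 1 denominator)

# Pooled variance and pooled standard error
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-value: difference in means divided by the pooled standard error
t_value = (mean1 - mean2) / std_err

# Two-sided critical value for the chosen significance level
alpha = 0.05
df = n1 + n2 - 2
t_critical = t_dist.ppf(1 - alpha / 2, df)

reject_null = abs(t_value) > t_critical
print(f"t = {t_value:.3f}, critical t = {t_critical:.3f}, reject null: {reject_null}")
```

For this data the t-value is about 3.72 against a critical value of about 2.31 with 8 degrees of freedom, so the null hypothesis is rejected.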
Calculating a t-value manually is not always practical, especially when the number of observations is large. In that case, we use statistical software such as Python.
In Python, the t-value is used to calculate an equivalent p-value, and we then use the p-value to draw a conclusion about the hypothesis.
    import pandas as pd
    from scipy.stats import ttest_ind

    # A = supplement, B = placebo
    data = pd.DataFrame({
        'supplement': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
        'age': [70, 60, 72, 62, 76, 56, 90, 70, 87, 50]
    })
    print(data.head(2))

    # define groups
    group1 = data[data['supplement'] == 'A']
    group2 = data[data['supplement'] == 'B']

    # test (equal_var=True gives the pooled-variance Student's t-test)
    print(ttest_ind(group1['age'], group2['age'], equal_var=True))
In the code above, we create a DataFrame with A as the supplement and B as the placebo. We then define group1 and group2, where group1 is the group that took the supplement and group2 contains the people who took the placebo.
The p-value (about 0.006 for this data) is less than the conventional significance level of 0.05. These results indicate that the difference in means between the two groups is statistically significant, which supports the claim that the supplement increases life expectancy.
In other words, the observed difference is unlikely to be due to random chance, so a similar effect would be expected in the general population.
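To see how the t-value is turned into a p-value, the result of ttest_ind can be cross-checked by hand. The sketch below assumes a two-sided test and uses the same ages as plain lists. The p-value is twice the upper-tail probability of the t distribution at |t|.

```python
from scipy.stats import ttest_ind, t as t_dist

group1 = [70, 72, 76, 90, 87]  # supplement
group2 = [60, 62, 56, 70, 50]  # placebo

# Pooled-variance (Student's) t-test, two-sided by default
result = ttest_ind(group1, group2, equal_var=True)

# Recover the p-value from the t statistic and the degrees of freedom
df = len(group1) + len(group2) - 2
p_manual = 2 * t_dist.sf(abs(result.statistic), df)

print(f"t = {result.statistic:.3f}")
print(f"p (ttest_ind) = {result.pvalue:.4f}, p (manual) = {p_manual:.4f}")
```

Both p-values agree, which confirms that the p-value is simply a different way of expressing how extreme the t-value is for the given degrees of freedom.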