How to implement the T-test for independent samples in Python

A t-test, also known as Student’s t-test, is a statistical test used to determine whether the means of two samples differ, and whether that difference is statistically significant (and can therefore be generalized to the whole population) or is plausibly due to random chance.

In other words, a T-test helps to test a hypothesis.

There are different types of t-tests; here, we focus on the t-test for independent samples.

For independent samples, there are certain assumptions that are made:

Data follows a normal distribution.
The two samples are independent.
Population variance is unknown.
Sample variances for the variable under investigation are equal.

Note: It’s important to conduct tests on the 2 samples to test for normality and equal variance to ensure the above assumptions are met.
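One possible way to run these checks is sketched below. It assumes SciPy is available and uses the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for equal variances; the article does not say which tests to use, so these are only common choices. The sample values and the helper name `check_assumptions` are hypothetical.

```python
from scipy import stats

# Hypothetical life expectancies in years (illustration only)
group_a = [82, 79, 85, 88, 77, 84, 81, 86, 80, 83]
group_b = [78, 75, 80, 79, 74, 82, 76, 77, 81, 73]


def check_assumptions(sample_1, sample_2, alpha=0.05):
    """Report whether each assumption is NOT rejected at level alpha."""
    # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
    _, p_normal_1 = stats.shapiro(sample_1)
    _, p_normal_2 = stats.shapiro(sample_2)
    # Levene: the null hypothesis is that the samples have equal variances
    _, p_equal_var = stats.levene(sample_1, sample_2)
    return {
        "sample_1_normality_not_rejected": bool(p_normal_1 > alpha),
        "sample_2_normality_not_rejected": bool(p_normal_2 > alpha),
        "equal_variance_not_rejected": bool(p_equal_var > alpha),
    }


print(check_assumptions(group_a, group_b))
```

A `True` value only means the test found no evidence against the assumption; it does not prove that the assumption holds.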

Note: T-tests are used when comparing two population means only. If more than two population means need to be compared, other tests such as ANOVA can be applied.

Practical example of when to use a t-test

Suppose you are carrying out a clinical trial to find out if taking supplements leads to a longer life expectancy. In this case, you will have two groups: group A, which takes the supplements, and group B, which takes a ‘sugar pill’ as a placebo.

In such an experiment, carrying out a t-test will help you know if the results are statistically significant or only happen due to random chance.

Calculating the T-test manually

The formula to obtain a t-value manually is:

$$t = \frac{\bar x_1 - \bar x_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where:

$\bar x_1$ : Sample mean for sample $1$ .
$\bar x_2$ : Sample mean for sample $2$ .
$s^2$ : Pooled variance of the two samples.
$n_1$ : The number of observations in sample $1$ .
$n_2$ : The number of observations in sample $2$ .

The pooled variance $s^2$ is calculated as:

$$s^2 = \frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2-2}$$

where $s^2_1$ and $s^2_2$ are the sample variances of sample $1$ and sample $2$ .

After obtaining the t-value, use a Student’s t table with $n_1+n_2-2$ degrees of freedom and compare it with the critical t-value ( $t_\alpha$ ), where $\alpha$ = $0.05$ . You then reject the null hypothesis $H_0$ if $t>t_\alpha$ . For a two-sided test, compare $|t|$ with $t_{\alpha/2}$ instead.
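The manual calculation can be sketched using only Python's standard library. The sketch treats $s^2$ as the pooled variance (the weighted average of the two sample variances), which is the quantity that fits under the square root in the t formula. The sample values and the function name `pooled_t_statistic` are hypothetical, and the critical value $2.101$ is the standard two-sided table entry for $\alpha$ = $0.05$ with $18$ degrees of freedom.

```python
import math
import statistics

# Hypothetical life expectancies in years (illustration only)
group_a = [82, 79, 85, 88, 77, 84, 81, 86, 80, 83]
group_b = [78, 75, 80, 79, 74, 82, 76, 77, 81, 73]


def pooled_t_statistic(sample_1, sample_2):
    """Return (t, degrees_of_freedom) for the equal-variance t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # statistics.variance uses the n - 1 denominator (sample variance)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / dof
    t = (mean_1 - mean_2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, dof


t_value, dof = pooled_t_statistic(group_a, group_b)

# Two-sided critical value from a t table for alpha = 0.05 and 18 degrees of freedom
T_CRITICAL = 2.101
reject_null = abs(t_value) > T_CRITICAL
print(f"t = {t_value:.3f}, df = {dof}, reject H0: {reject_null}")
```

The hard-coded critical value is only valid for this sample size; other sample sizes need the table entry for their own degrees of freedom.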

Implementing the T-test for independent samples in Python

Calculating a t-value by hand is not always practical, especially if the number of observations is large. In this case, we use software such as Python.

In Python, the t-value is converted to an equivalent p-value, and we then use the p-value to draw a conclusion about the hypothesis.
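A minimal sketch of such a test is shown here, assuming pandas and SciPy are installed and using `scipy.stats.ttest_ind` with `equal_var=True` for the pooled-variance test. The values in columns `A` and `B` are made up for illustration, so the output will not match the p-value reported in this article.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    # Hypothetical life expectancies in years (illustration only)
    "A": [82, 79, 85, 88, 77, 84, 81, 86, 80, 83],  # supplement
    "B": [78, 75, 80, 79, 74, 82, 76, 77, 81, 73],  # placebo
})
group1 = df["A"]
group2 = df["B"]

# equal_var=True requests the pooled-variance (Student's) t-test
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```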

Explanation

Lines 1–2: Imports the needed libraries.
Line 4: Creates a DataFrame with A as the supplement while B is the placebo.
Lines 9–10: Defines group1 and group2, where group1 is the group that took the supplement while group2 contains the people who took the placebo.
Line 13: Calculates the t statistic and the p-value. The p-value indicates that the supplement is effective and that the differences are unlikely to be due to random chance.

Conclusion

The p-value is $0.011$ , which is less than the significance level of $0.05$ . These results indicate that the difference in means between the two groups is statistically significant and that the supplement is effective in increasing life expectancy.

This suggests that the supplement can be expected to produce similar results in the general population from which the two samples were drawn.
