A one-sample t-test is a statistical test that compares the mean of a single sample with a known or hypothesized population mean. It determines whether the sample mean deviates significantly from that theoretical value. In the example below, we examine shoe sale data collected from Adidas retailers, with a focus on the operating margin. We’ll go over the basic syntax for using the ttest_1samp() function from SciPy, along with some further information about t-tests. If you want to learn how to run a two-sample t-test in SciPy, check out our other Python code post.
To run a one-sample t-test, we first need to state our null and alternative hypotheses, as with any hypothesis test.
We’re testing whether the population mean underlying the sample is 0.4. In the context of our data, the null hypothesis is that the mean operating margin of sales is 0.4, and the alternative hypothesis is that it differs from 0.4.
The two arguments we need to run ttest_1samp() are:
- a: the sample data
- popmean: the theoretical population mean against which we’re testing
from scipy import stats
stats.ttest_1samp(a = df["Operating Margin"], popmean = 0.40)
TtestResult(statistic=6.0474151057026955, pvalue=1.704807082569083e-09, df=2375)
We can see that the test yielded a t-statistic of 6.047 and a p-value of 1.7e-9. If we assume an alpha value of 0.05, the p-value is far below that threshold, so we reject the null hypothesis that the population mean is equal to 0.4.
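To make the reported statistic less of a black box, here is a minimal sketch that recomputes it by hand and compares the result with ttest_1samp(). The data are synthetic: the `margins` array and its parameters are made up for illustration and are not the Adidas dataset, so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for an operating-margin column (hypothetical values).
rng = np.random.default_rng(0)
margins = rng.normal(loc=0.42, scale=0.10, size=500)
popmean = 0.40

# t = (sample mean - hypothesized mean) / (sample std / sqrt(n))
n = margins.size
t_manual = (margins.mean() - popmean) / (margins.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: probability in both tails beyond |t|,
# using a t-distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(a=margins, popmean=popmean)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The two printed pairs should agree up to floating-point error, which shows that the test statistic is simply the distance between the sample mean and the hypothesized mean, measured in standard errors.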
More technical information about t-tests
In the above example, we ran a two-sided test, which is one of the two types of one-sample t-tests:
- One-sided (or one-tailed) one-sample t-test
- Two-sided (or two-tailed) one-sample t-test
One-sample two-tailed t-test
The naming refers to how many “sides” or “tails” of the distribution we care about. In a two-sided or two-tailed one-sample t-test, like the one we just ran above, we divide the 5% significance level between both tails of the distribution, with 2.5% in each.
One-sample one-tailed t-test
But in the case of a one-sided or one-tailed one-sample t-test, we place the entire 5% significance level in one tail of the distribution.
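The difference between the two cases shows up in the rejection thresholds. The sketch below, which assumes an alpha of 0.05 and borrows the 2375 degrees of freedom from the earlier output, uses the t-distribution's percent point function to compute the critical value for each case. The variable names are illustrative only.

```python
from scipy import stats

alpha = 0.05
dof = 2375  # degrees of freedom, taken from the TtestResult shown earlier

# Two-tailed: alpha is split, so each tail holds alpha / 2.
two_tailed_crit = stats.t.ppf(1 - alpha / 2, dof)

# One-tailed (upper tail): all of alpha sits in the right tail.
one_tailed_crit = stats.t.ppf(1 - alpha, dof)

print(f"two-tailed: reject if |t| > {two_tailed_crit:.3f}")
print(f"one-tailed: reject if t > {one_tailed_crit:.3f}")
```

Because the one-tailed test concentrates the whole significance level in one tail, its threshold is lower than the two-tailed one, which makes it easier to reject in that direction and impossible to reject in the other.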
So for one-tailed one-sample t-tests, we can test the alternative hypothesis that the population mean is greater than a theoretical value, or that it is less than a theoretical value. Let’s take a look at an example using the same data.
Running a one-tailed one-sample t-test
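First, we need to set up our null and alternative hypotheses. To match the code that follows, the null hypothesis is that the mean operating margin is 0.4 or greater, and the alternative hypothesis is that it is less than 0.4.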
Then we can run the corresponding code, which leverages the alternative argument. We set alternative equal to 'less' to represent the alternative hypothesis we’re testing.
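The alternative argument accepts three values:
- alternative = 'two-sided' (the default): we’re testing the alternative hypothesis that the mean of the distribution underlying the sample is different from the theoretical population mean.
- alternative = 'less': we’re testing the alternative hypothesis that the mean of the distribution underlying the sample is less than the theoretical population mean.
- alternative = 'greater': we’re testing the alternative hypothesis that the mean of the distribution underlying the sample is greater than the theoretical population mean.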
stats.ttest_1samp(a = df["Operating Margin"], popmean = 0.40, alternative = "less")
TtestResult(statistic=6.0474151057026955, pvalue=0.9999999991475965, df=2375)
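The test yielded a t-statistic of 6.047 with a p-value of approximately 1. In this case, we fail to reject the null hypothesis: the data provide no evidence that the mean operating margin is less than 0.4. This is expected, since the positive t-statistic indicates a sample mean above 0.4.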
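The three settings of the alternative argument are closely related, and a short sketch can make that concrete. The example below runs all three on the same synthetic sample (the `margins` array is hypothetical, not the Adidas data) and checks how the p-values relate to one another.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the operating-margin column (hypothetical values).
rng = np.random.default_rng(1)
margins = rng.normal(loc=0.42, scale=0.10, size=500)
popmean = 0.40

two_sided = stats.ttest_1samp(margins, popmean, alternative="two-sided")
less = stats.ttest_1samp(margins, popmean, alternative="less")
greater = stats.ttest_1samp(margins, popmean, alternative="greater")

# The t-statistic is the same in all three cases; only the p-value changes.
print(two_sided.statistic, less.statistic, greater.statistic)

# The two one-sided p-values cover complementary tails, so they sum to 1.
print(less.pvalue + greater.pvalue)

# The two-sided p-value is twice the smaller one-sided p-value.
print(two_sided.pvalue, 2 * min(less.pvalue, greater.pvalue))
```

The same relationship holds for the results reported above: half of the two-sided p-value of 1.7e-9 is about 8.5e-10, and subtracting that from 1 gives the p-value of roughly 0.99999999915 returned with alternative = "less".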