White's test for heteroskedasticity in Python

Einblick Content Team - January 17th, 2023

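Testing for heteroskedasticity (spelled with a "k" or a "c") is an essential check when fitting regression models. For example, one of the main assumptions of ordinary least squares (OLS) is that the residuals, or errors, of your linear regression model have constant variance (homoscedasticity). When that assumption fails, the coefficient estimates remain unbiased, but the standard errors and p-values reported for them can no longer be trusted. There are several tests for heteroskedasticity; this post covers White's test.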

Here's an example of how you can use statsmodels to test for heteroskedasticity in a simple linear regression.

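import statsmodels.api as sm

# Test for heteroskedasticity using the White test
# `res_sm` is assumed to be a fitted statsmodels OLS results object
# whose model includes a constant (intercept) column
# NOTE: statsmodels refers to X variables as `exog` for exogenous
lm, lm_pvalue, fvalue, f_pvalue = sm.stats.diagnostic.het_white(res_sm.resid, res_sm.model.exog)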

print("Lagrange multiplier statistic: " + str(lm))
print("Lagrange multiplier p-value:   " + str(lm_pvalue))
print("F-statistic:                   " + str(fvalue))
print("P-value of F-statistic:        " + str(f_pvalue))

# If the p-value is less than the chosen significance level (e.g. 0.05), 
# reject the null hypothesis of homoscedasticity
if lm_pvalue < 0.05:
    print("Rejected the null hypothesis. Heteroskedasticity detected")
else:
    print("Failed to reject the null hypothesis. No heteroskedasticity detected.")

Output:

Lagrange multiplier statistic: 1.8620893087293255
Lagrange multiplier p-value:   0.3941417533297926
F-statistic:                   0.9090361032863525
P-value of F-statistic:        0.40987833577566857
Failed to reject the null hypothesis. No heteroskedasticity detected.

Interpreting test results

As with any statistical test, you need to define your null and alternative hypotheses:

H_0: The residuals are uniformly scattered. (No heteroskedasticity)
H_A: The residuals are not uniformly scattered. (Heteroskedasticity detected)

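From the statsmodels documentation, we can see that het_white returns the p-value of the Lagrange multiplier test statistic as its second value, which our code stores as lm_pvalue. So, if lm_pvalue is less than the chosen significance level (0.05 here), we reject the null hypothesis that the residuals are uniformly scattered, that is, that they have constant variance. If lm_pvalue is larger, we fail to reject the null hypothesis: the test has not detected heteroskedasticity, although this does not prove that the variance is constant.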

What next?

In this case we have not detected heteroskedasticity, which likely means we can move forward with trusting the results of our linear regression model. If you do reject the null hypothesis, meaning that the residuals were not uniformly scattered, then you would need to revisit your model.

Some common approaches to handling heteroskedastic data include:

  • Transforming your y, or dependent, variable, for example by taking its logarithm. If using statsmodels, this variable may also be referred to as the endogenous variable, or endog.
  • Using a different kind of regression that is less sensitive to heteroskedastic data, such as weighted least squares (WLS) regression.
