Testing for heteroskedasticity (with a "k" or "c") is essential when running various regression models. For example, one of the main assumptions of OLS is that there is constant variance (homoscedasticity) among the residuals or errors of your linear regression model. There are several tests for heteroskedasticity.
```python
# Test for heteroskedasticity using the White test
# NOTE: statsmodels refers to X variables as `exog` for exogenous;
# `res_sm` is a fitted OLS results object
import statsmodels.api as sm

lm, lm_pvalue, fvalue, f_pvalue = sm.stats.diagnostic.het_white(res_sm.resid, res_sm.model.exog)

print("Lagrange multiplier statistic: " + str(lm))
print("Lagrange multiplier p-value: " + str(lm_pvalue))
print("F-statistic: " + str(fvalue))
print("P-value of F-statistic: " + str(f_pvalue))

# If the p-value is less than the chosen significance level (e.g. 0.05),
# reject the null hypothesis of homoscedasticity
if lm_pvalue < 0.05:
    print("Rejected the null hypothesis. Heteroskedasticity detected.")
else:
    print("Failed to reject the null hypothesis. No heteroskedasticity detected.")
```
```
Lagrange multiplier statistic: 1.8620893087293255
Lagrange multiplier p-value: 0.3941417533297926
F-statistic: 0.9090361032863525
P-value of F-statistic: 0.40987833577566857
Failed to reject the null hypothesis. No heteroskedasticity detected.
```
Interpreting test results
As with any statistical test, you need to define your null and alternative hypotheses:

- Null hypothesis: the residuals are homoscedastic (constant variance)
- Alternative hypothesis: the residuals are heteroskedastic (non-constant variance)
From the `statsmodels` documentation, we can see that the p-value of our test statistic is stored as `lm_pvalue`. So, if `lm_pvalue` is less than 0.05, we can reject the null hypothesis that the residuals are uniformly scattered. If `lm_pvalue` is large, we fail to reject the null hypothesis, meaning the test has not detected any heteroskedasticity.
In this case we have not detected heteroskedasticity, which likely means we can trust the results of our linear regression model. If you do reject the null hypothesis, meaning the residuals were not uniformly scattered, then you would need to revisit your model.
Some common approaches to handling heteroskedastic data include:
- Transforming your `y` or dependent variable, for example with a log transform. If using `statsmodels`, this may also be referred to as the endogenous variable or `endog`
- Using a different kind of regression that is less sensitive to heteroskedastic data, such as weighted least squares (WLS) regression
Einblick is an agile data science platform that provides data scientists with a collaborative workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick customers include Cisco, DARPA, Fuji, NetApp and USDA. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.