Testing for heteroskedasticity (spelled with a "k" or a "c") is an essential check when fitting regression models. For example, one of the main assumptions of ordinary least squares (OLS) regression is that the residuals, or errors, of your linear regression model have constant variance (homoscedasticity). There are several tests for heteroskedasticity; this post uses White's test.

Here's an example of how you can use `statsmodels` to run White's test for heteroskedasticity on a simple linear regression.

```
# Test for heteroskedasticity using the White test
# Assumes `res_sm` is an already-fitted statsmodels regression result,
# e.g. res_sm = sm.OLS(y, X).fit()
import statsmodels.api as sm

# NOTE: statsmodels refers to X variables as `exog` for exogenous
lm, lm_pvalue, fvalue, f_pvalue = sm.stats.diagnostic.het_white(res_sm.resid, res_sm.model.exog)
print("Lagrange multiplier statistic: " + str(lm))
print("Lagrange multiplier p-value: " + str(lm_pvalue))
print("F-statistic: " + str(fvalue))
print("P-value of F-statistic: " + str(f_pvalue))

# If the p-value is less than the chosen significance level (e.g. 0.05),
# reject the null hypothesis of homoscedasticity
if lm_pvalue < 0.05:
    print("Rejected the null hypothesis. Heteroskedasticity detected")
else:
    print("Failed to reject the null hypothesis. No heteroskedasticity detected.")
```

**Output:**

```
Lagrange multiplier statistic: 1.8620893087293255
Lagrange multiplier p-value: 0.3941417533297926
F-statistic: 0.9090361032863525
P-value of F-statistic: 0.40987833577566857
Failed to reject the null hypothesis. No heteroskedasticity detected.
```

## Interpreting test results

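As with any statistical test, you need to define your null and alternative hypotheses:

- Null hypothesis: the residuals have constant variance (homoscedasticity).
- Alternative hypothesis: the residuals do not have constant variance (heteroskedasticity).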

From the `statsmodels` documentation, we can see that the p-value of the Lagrange multiplier test statistic is returned as `lm_pvalue`. So, if `lm_pvalue` is less than 0.05, we reject the null hypothesis that the residuals have constant variance. If `lm_pvalue` is 0.05 or greater, we fail to reject the null hypothesis, meaning the test has not detected any heteroskedasticity. In the output above, `lm_pvalue` is about 0.394, so we fail to reject.

## What next?

In this case we have not detected heteroskedasticity, which likely means we can move forward with trusting the results of our linear regression model. If you DO reject the null hypothesis, meaning the residuals do not appear to have constant variance, then you would need to revisit your model.

Some common approaches to handling heteroskedastic data include:

- Transforming your `y` or dependent variable. If using `statsmodels`, this may also be referred to as the endogenous variable or `endog`.
- Using a different kind of regression that is less sensitive to heteroskedastic data, such as weighted least squares regression.
