Ordinary least squares (OLS) is one of the classic regression techniques for a reason: its results are highly interpretable compared with more complex and opaque models such as support vector machines or neural nets. But the model must satisfy certain assumptions for its results to be reliable. For example, we assume the variance of the errors is constant across values of the X variables, that is, that no heteroskedasticity is present. This post covers how to run the Breusch-Pagan test for heteroskedasticity using the `statsmodels` package.


## Breusch-Pagan test for heteroskedasticity example

In the example below, we run the test on a simple linear regression model fit to the `iris` dataset. The fitted results are saved as `res_sm`.

```python
import statsmodels.api as sm

# Assumes y (the response) and X_sm (the predictors, including a constant
# column, e.g. from sm.add_constant) were already built from the iris data
# Create the OLS model and fit it
mod_sm = sm.OLS(y, X_sm)
res_sm = mod_sm.fit()

# Test for heteroscedasticity using the Breusch-Pagan test
# NOTE: statsmodels refers to X variables as `exog` for exogenous
bp_lm, bp_lm_pvalue, bp_fvalue, bp_f_pvalue = sm.stats.diagnostic.het_breuschpagan(
    res_sm.resid, res_sm.model.exog
)
print("Lagrange multiplier statistic: " + str(bp_lm))
print("Lagrange multiplier p-value: " + str(bp_lm_pvalue))
print("F-statistic: " + str(bp_fvalue))
print("P-value of F-statistic: " + str(bp_f_pvalue))

# If the p-value is less than the chosen significance level (e.g. 0.05),
# reject the null hypothesis of homoscedasticity
if bp_lm_pvalue < 0.05:
    print("Heteroscedasticity detected")
else:
    print("No heteroscedasticity detected")
```

**Output:**

```
Lagrange multiplier statistic: 1.810901564212386
Lagrange multiplier p-value: 0.17840011661704303
F-statistic: 1.803795420618222
P-value of F-statistic: 0.18557060288418153
No heteroscedasticity detected
```

## Interpreting the results

The p-value of the Lagrange multiplier (LM) test statistic is stored in `bp_lm_pvalue`. If `bp_lm_pvalue` is less than 0.05, we reject the null hypothesis that there is no heteroskedasticity (that is, that the error variance is constant). If `bp_lm_pvalue` is greater than or equal to 0.05, we fail to reject the null hypothesis, so the test has not detected heteroskedasticity. In the output above, `bp_lm_pvalue` is about 0.178, so we fail to reject the null hypothesis.
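As an arithmetic check, the LM p-value can be recovered from the LM statistic alone. Under the null hypothesis, the LM statistic approximately follows a chi-squared distribution whose degrees of freedom equal the number of non-constant columns in the X matrix. Assuming the simple regression here has a single predictor (one degree of freedom), the chi-squared upper-tail probability reduces to `erfc(sqrt(LM / 2))`, which needs only Python's standard library. The helper name `chi2_sf_1df` is hypothetical, not part of statsmodels.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared statistic with 1 degree of freedom.

    For one degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


# LM statistic copied from the output above
bp_lm = 1.810901564212386
bp_lm_pvalue = chi2_sf_1df(bp_lm)

alpha = 0.05  # chosen significance level
reject_null = bp_lm_pvalue < alpha
print(f"LM p-value: {bp_lm_pvalue:.4f}")
print("Reject homoskedasticity" if reject_null else "Fail to reject homoskedasticity")
```

Under the one-predictor assumption, the recomputed value should agree with the `Lagrange multiplier p-value` printed by statsmodels.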

## sm.stats.diagnostic.het_breuschpagan syntax

`sm.stats.diagnostic.het_breuschpagan(res_sm.resid, res_sm.model.exog)`

There are two main arguments that the function takes:

- `resid`: the residuals of the linear regression model. We passed the `resid` attribute of the results object returned by fitting the OLS model (`res_sm.resid`).
- `exog_het`: the X variables in the model. We passed the `exog` attribute of the fitted model (`res_sm.model.exog`). Note that in statsmodels, X variables are referred to as exogenous variables, and the test expects this matrix to include a constant column.
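To make the role of the two arguments concrete, here is a from-scratch sketch of the auxiliary regression behind the default (Koenker) version of the test. It squares `resid`, regresses those squares on the X variable, and returns LM = n·R². To stay within the standard library it is restricted to a single predictor plus a constant; it illustrates the technique and is not the statsmodels source. The function names and toy residuals are hypothetical.

```python
import math


def simple_ols_r2(x, y):
    """R-squared from regressing y on a constant and one predictor x (x must vary)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if syy == 0:
        # y is constant, so there is no variation for x to explain
        return 0.0
    # In simple regression, R^2 equals the squared correlation of x and y
    return sxy ** 2 / (sxx * syy)


def koenker_bp(resid, x):
    """Koenker (studentized) Breusch-Pagan LM statistic and p-value.

    resid: residuals from the fitted regression
    x: the single non-constant column of exog_het
    """
    sq_resid = [e ** 2 for e in resid]            # dependent variable of the auxiliary regression
    lm = len(resid) * simple_ols_r2(x, sq_resid)  # LM = n * R^2
    p_value = math.erfc(math.sqrt(lm / 2))        # chi-squared(1) upper tail
    return lm, p_value


# Hypothetical residuals whose spread grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.1, 0.3, -0.3, 0.5, -0.5, 0.7, -0.7]
lm, p_value = koenker_bp(resid, x)
print(f"LM = {lm:.3f}, p-value = {p_value:.4f}")
```

Because the squared residuals rise steadily with `x`, the auxiliary R² is high and the p-value falls well below 0.05; residuals with constant magnitude would give an LM of 0.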

## Breusch-Pagan test assumptions

The default version of the Breusch-Pagan test implemented by `statsmodels` is the Koenker version, which assumes independent and identically distributed error terms but does not require them to be normally distributed. The original version of the test, published by Breusch and Pagan in 1979, assumes the residuals are normally distributed. You can choose between the two with the third argument of the `het_breuschpagan()` function:

- `robust`: defaults to `True`, which runs the Koenker version of the test. If set to `False`, the original Breusch-Pagan test is run.
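The two versions differ mainly in how the auxiliary regression is scaled. The summary below uses the standard textbook forms in our own notation: $\hat e_i$ are the OLS residuals, $z_i$ is row $i$ of `exog_het` without its constant, $p$ is the number of those non-constant columns, $n$ is the number of observations, $\hat\sigma^2 = \frac{1}{n}\sum_i \hat e_i^2$, and ESS is the explained sum of squares of the auxiliary regression.

$$
\begin{aligned}
\text{Koenker (robust=True):}\quad & \hat e_i^2 = \alpha_0 + z_i^\top \alpha + v_i, && \mathrm{LM} = n R^2 \\
\text{Original (robust=False):}\quad & \hat e_i^2 / \hat\sigma^2 = \gamma_0 + z_i^\top \gamma + u_i, && \mathrm{LM} = \tfrac{1}{2}\,\mathrm{ESS} \\
\text{Under } H_0:\quad & \mathrm{LM} \;\overset{a}{\sim}\; \chi^2_p
\end{aligned}
$$

Dividing by 2 in the original statistic relies on $\operatorname{Var}(\hat e_i^2) \approx 2\sigma^4$, which holds for normal errors; the Koenker form $nR^2$ instead estimates that variance from the squared residuals themselves, which is why it remains valid without normality.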

## What next?

You might also want to look at the White test. Like the Breusch-Pagan test, it tests the null hypothesis that the variance of the residuals is constant: if the p-value is less than your chosen significance level, you reject the null hypothesis and conclude that heteroskedasticity is present.
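The main practical difference lies in the auxiliary regression. The White test regresses the squared residuals on the X variables plus their squares and pairwise cross-products, so it can pick up non-linear patterns in the variance that the Breusch-Pagan test, which is linear in `exog_het`, may miss. The stdlib-only sketch below builds one row of that expanded design from one row of X values that starts with a constant 1; statsmodels' `het_white()` constructs a design of this kind internally. The helper name `white_aux_row` is hypothetical.

```python
from itertools import combinations_with_replacement


def white_aux_row(row):
    """Products row[i] * row[j] for all i <= j of one X row.

    If row starts with a constant 1, the result contains the constant,
    every original regressor, every square, and every cross-product.
    """
    return [row[i] * row[j] for i, j in combinations_with_replacement(range(len(row)), 2)]


# One observation with a constant and two regressors, x1 = 2 and x2 = 3
print(white_aux_row([1, 2, 3]))  # [1, x1, x2, x1^2, x1*x2, x2^2] -> [1, 2, 3, 4, 6, 9]
```

The LM statistic is again $nR^2$ from regressing the squared residuals on these columns, with degrees of freedom equal to the number of non-constant columns (assuming none are duplicates).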

If the assumption of homoscedasticity is violated, you should consider weighted least squares (WLS) regression to account for non-constant variance in the error term.
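WLS down-weights observations with larger error variance, typically using weights proportional to the inverse of each observation's variance. The stdlib-only sketch below fits a single-predictor WLS line by minimizing $\sum_i w_i (y_i - b_0 - b_1 x_i)^2$; in statsmodels the comparable fit is `sm.WLS(y, X, weights=w).fit()`. It assumes the weights are known, whereas in practice they usually have to be estimated; the helper name and data are hypothetical.

```python
def wls_simple(x, y, w):
    """Weighted least squares for y = b0 + b1 * x with weights w.

    Minimizes sum(w_i * (y_i - b0 - b1 * x_i) ** 2); w_i is typically
    proportional to 1 / Var(error_i). x must take at least two distinct values.
    """
    sw = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data whose error spread grows with x
x = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.8, 7.3, 8.6, 11.9, 12.4]
w = [1 / xi ** 2 for xi in x]  # assumes Var(error_i) is proportional to x_i^2
b0, b1 = wls_simple(x, y, w)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

With equal weights this reduces to ordinary least squares, so the weights are the only thing WLS changes.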

### About

Einblick is an agile data science platform that provides data scientists with a collaborative workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick customers include Cisco, DARPA, Fuji, NetApp and USDA. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.