Breusch-Pagan test for heteroskedasticity in Python

Einblick Content Team - February 9th, 2023

Ordinary least squares (OLS) is one of the classic regression techniques for a reason: the results are highly interpretable compared to more complex and opaque models like support vector machines or neural nets. But the data must satisfy certain assumptions for the results to be reliable. For example, OLS assumes that the variance of the errors is constant across values of the X variables; in other words, that no heteroskedasticity is present. This post covers how to run the Breusch-Pagan test for heteroskedasticity using the statsmodels package.

The canvas has the full code, or read on to learn about the Breusch-Pagan test.

Breusch-Pagan test for heteroskedasticity example

In the example below, we run the test on a simple linear regression model fit to the iris dataset. The response y and the design matrix X_sm are assumed to be defined already, and the fitted results are saved as res_sm.

import statsmodels.api as sm

# Create and fit the model (y and X_sm are assumed to be defined already)
mod_sm = sm.OLS(y, X_sm)
res_sm = mod_sm.fit()

# Test for heteroscedasticity using the Breusch-Pagan test
# NOTE: statsmodels refers to X variables as `exog` for exogenous
bp_lm, bp_lm_pvalue, bp_fvalue, bp_f_pvalue = sm.stats.diagnostic.het_breuschpagan(res_sm.resid, res_sm.model.exog)

print("Lagrange multiplier statistic: " + str(bp_lm))
print("Lagrange multiplier p-value:   " + str(bp_lm_pvalue))
print("F-statistic:                   " + str(bp_fvalue))
print("P-value of F-statistic:        " + str(bp_f_pvalue))

# If the p-value is less than the chosen significance level (e.g. 0.05), 
# reject the null hypothesis of homoscedasticity
if bp_lm_pvalue < 0.05:
    print("Heteroscedasticity detected")
else:
    print("No heteroscedasticity detected")

Output:

Lagrange multiplier statistic: 1.810901564212386
Lagrange multiplier p-value:   0.17840011661704303
F-statistic:                   1.803795420618222
P-value of F-statistic:        0.18557060288418153
No heteroscedasticity detected

Interpreting the results

$$
\begin{aligned}
H_0 &: \text{The residuals are uniformly scattered. (No heteroskedasticity)} \\
H_A &: \text{The residuals are not uniformly scattered. (Heteroskedasticity detected)}
\end{aligned}
$$

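The p-value of the Lagrange multiplier statistic is stored as bp_lm_pvalue. If bp_lm_pvalue is less than the chosen significance level (0.05 here), we reject the null hypothesis that there is no heteroskedasticity. If bp_lm_pvalue is greater than or equal to 0.05, we fail to reject the null hypothesis. In our example, bp_lm_pvalue is about 0.178, so the test has not detected any heteroskedasticity. Failing to reject is not proof that the variance is constant; it only means the test found no evidence against it.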

sm.stats.diagnostic.het_breuschpagan syntax

sm.stats.diagnostic.het_breuschpagan(res_sm.resid, res_sm.model.exog)

There are two main arguments that the function takes:

  • resid: the residuals of the linear regression model. We used the resid attribute of the results object returned by fitting the OLS model.
  • exog_het: the X variables suspected of being related to the heteroskedasticity. We used the exog attribute of the underlying model (res_sm.model.exog), which holds the model's full design matrix; it should include a constant column. Note that in statsmodels, X variables are referred to as exogenous variables.

Breusch-Pagan test assumptions

The default version of the Breusch-Pagan test implemented by statsmodels is the Koenker version, which assumes independent and identically distributed error terms. The original version of the test, published by Breusch and Pagan in 1979, instead assumed that the error terms were normally distributed. You can choose between the two using the third argument of the het_breuschpagan() function:

  • robust: defaults to True, which selects the Koenker version of the test. Setting it to False selects the original Breusch-Pagan test.

What next?

You might also want to look at the White test. The Breusch-Pagan test and the White test are two statistical tests that can be used to test for heteroscedasticity in a regression model. Both test the null hypothesis that the variance of the residuals is constant, and if the p-value of the test is less than the chosen significance level, you can reject the null hypothesis and conclude that there is heteroscedasticity.

If the assumption of homoscedasticity is violated, you should consider weighted least squares (WLS) regression to account for non-constant variance in the error term.
