Ordinary least squares (OLS) is one of the classic regression techniques for a reason: the results are highly interpretable compared to more complex and opaque models like support vector machines or neural nets. But the model needs to satisfy certain assumptions for the results to be reliable. For example, we assume the variance of the errors is constant across values of the X variables; that is, we assume no heteroskedasticity is present. This post covers how to run the Breusch-Pagan test for heteroskedasticity using the statsmodels package in Python.
Breusch-Pagan test for heteroskedasticity example
In the example below, we run the test on a simple linear regression model fit to the iris dataset. The fitted results were saved as res_sm.
    import statsmodels.api as sm

    # y and X_sm are assumed to be defined already
    # (X_sm should include a constant column, e.g. via sm.add_constant)

    # Create and fit the model
    mod_sm = sm.OLS(y, X_sm)
    res_sm = mod_sm.fit()

    # Test for heteroscedasticity using the Breusch-Pagan test
    # NOTE: statsmodels refers to X variables as `exog` for exogenous
    bp_lm, bp_lm_pvalue, bp_fvalue, bp_f_pvalue = sm.stats.diagnostic.het_breuschpagan(
        res_sm.resid, res_sm.model.exog
    )

    print("Lagrange multiplier statistic: " + str(bp_lm))
    print("Lagrange multiplier p-value: " + str(bp_lm_pvalue))
    print("F-statistic: " + str(bp_fvalue))
    print("P-value of F-statistic: " + str(bp_f_pvalue))

    # If the p-value is less than the chosen significance level (e.g. 0.05),
    # reject the null hypothesis of homoscedasticity
    if bp_lm_pvalue < 0.05:
        print("Heteroscedasticity detected")
    else:
        print("No heteroscedasticity detected")

Output:

    Lagrange multiplier statistic: 1.810901564212386
    Lagrange multiplier p-value: 0.17840011661704303
    F-statistic: 1.803795420618222
    P-value of F-statistic: 0.18557060288418153
    No heteroscedasticity detected
Interpreting the results
The p-value of our test statistic is stored as bp_lm_pvalue. If bp_lm_pvalue is less than 0.05, then we can reject the null hypothesis that there is no heteroskedasticity. If bp_lm_pvalue is greater than or equal to 0.05, then we fail to reject the null hypothesis. In our example the p-value is about 0.178, so the test has not detected any heteroskedasticity.
There are two main arguments that the function takes:
resid: the residuals of the linear regression model. We passed the resid attribute of the results object that was returned from fitting the OLS model.
exog_het: the X variables in the model. We passed the exog attribute of the underlying model (res_sm.model.exog). Note that in statsmodels, X variables are referred to as exogenous variables.
Breusch-Pagan test assumptions
The default version of the Breusch-Pagan test implemented by statsmodels is the Koenker version, which assumes independent and identically distributed error terms. In the original version of the test, published in 1979, Breusch and Pagan assumed the error terms were normally distributed. You can choose between the two versions using the third argument of the het_breuschpagan function:
robust: default is True, indicating the Koenker version of the test. If set to False, the original Breusch-Pagan test is used.
You might also want to look at the White test. The Breusch-Pagan test and the White test are two statistical tests that can be used to test for heteroscedasticity in a regression model. Both test the null hypothesis that the variance of the residuals is constant, and if the p-value of the test is less than the chosen significance level, you can reject the null hypothesis and conclude that there is heteroscedasticity.
If the assumption of homoscedasticity is violated, you should consider weighted least squares (WLS) regression to account for non-constant variance in the error term.