Ordinary least squares (OLS) is one of the classic regression techniques for a reason: the results are highly interpretable compared to more complex and opaque models like support vector machines or neural nets. But the data need to satisfy certain assumptions for the results to be reliable. For example, we assume the variance of the errors is constant across values of the X variables; that is, we assume no heteroskedasticity is present. This post covers how to run the Breusch-Pagan test for heteroskedasticity using the statsmodels package.
Breusch-Pagan test for heteroskedasticity example
In the example below, we run the test on a simple linear regression model fit to the iris dataset. The fitted results are saved as res_sm.
import statsmodels.api as sm
# Create and fit the model; y (response) and X_sm (predictors) must be defined beforehand
mod_sm = sm.OLS(y,X_sm)
res_sm = mod_sm.fit()
# Test for heteroscedasticity using the Breusch-Pagan test
# NOTE: statsmodels refers to X variables as `exog` for exogenous
bp_lm, bp_lm_pvalue, bp_fvalue, bp_f_pvalue = sm.stats.diagnostic.het_breuschpagan(res_sm.resid, res_sm.model.exog)
print("Lagrange multiplier statistic: " + str(bp_lm))
print("Lagrange multiplier p-value: " + str(bp_lm_pvalue))
print("F-statistic: " + str(bp_fvalue))
print("P-value of F-statistic: " + str(bp_f_pvalue))
# If the p-value is less than the chosen significance level (e.g. 0.05),
# reject the null hypothesis of homoscedasticity
if bp_lm_pvalue < 0.05:
    print("Heteroscedasticity detected")
else:
    print("No heteroscedasticity detected")
Output:
Lagrange multiplier statistic: 1.810901564212386
Lagrange multiplier p-value: 0.17840011661704303
F-statistic: 1.803795420618222
P-value of F-statistic: 0.18557060288418153
No heteroscedasticity detected
Interpreting the results
The p-value of the Lagrange multiplier test statistic is stored as bp_lm_pvalue. If bp_lm_pvalue is less than 0.05, we reject the null hypothesis that there is no heteroskedasticity. If bp_lm_pvalue is greater than or equal to 0.05, we fail to reject the null hypothesis. Here the p-value is about 0.178, so the test has not detected any heteroskedasticity. Failing to reject does not prove the error variance is constant; it only means the test found no evidence against it.
sm.stats.diagnostic.het_breuschpagan syntax
sm.stats.diagnostic.het_breuschpagan(res_sm.resid, res_sm.model.exog)
The function takes two main arguments:
resid: the residuals of the linear regression model. We used the resid attribute of the results object returned by fitting the OLS model.
exog_het: the explanatory variables the error variance might depend on, typically the X variables in the model. We used the exog attribute of the underlying model object, accessed as res_sm.model.exog. Note that in statsmodels, X variables are referred to as exogenous variables.
Breusch-Pagan test assumptions
The default version of the Breusch-Pagan test implemented by statsmodels is the Koenker version, which assumes independent and identically distributed error terms. In the original version of the test, published in 1979, Breusch and Pagan assumed the errors were normally distributed. You can choose between the two using the third argument of the het_breuschpagan() function:
robust: default is True, which runs the Koenker version of the test. If set to False, the function runs the original Breusch-Pagan test.
What next?
You might also want to look at the White test. The Breusch-Pagan test and the White test are two statistical tests that can be used to test for heteroscedasticity in a regression model. Both test the null hypothesis that the variance of the residuals is constant, and if the p-value of the test is less than the chosen significance level, you can reject the null hypothesis and conclude that there is heteroscedasticity.
If the assumption of homoscedasticity is violated, you should consider weighted least squares (WLS) regression to account for non-constant variance in the error term.