# Use residual plots to diagnose heteroskedasticity

Becca Weng - February 10th, 2023

There are several ways to check that the errors in a regression model have constant variance (homoskedasticity) across values of the X variables. If you’re looking for a statistical test, you can use White’s test or the Breusch-Pagan test, both of which are implemented in statsmodels. This post goes over a visual way to check for heteroskedasticity using residual plots after you’ve built your linear regression model.

Read on for an in-depth, line-by-line explanation of the code.

## Set up: fit a linear regression model

import statsmodels.api as sm

# Create X and y dataframes
X = df[["petal_width"]]
y = df[["petal_length"]]

# Add a constant (intercept) column; sm.OLS does not include one by default
X_sm = sm.add_constant(X)

# Create model, fit, and print results
mod_sm = sm.OLS(y, X_sm)
res_sm = mod_sm.fit()
print(res_sm.summary())

Now that the results are saved as res_sm, we can plot the fitted values against the residuals.

## Residual plot: fitted values vs. residuals using matplotlib

import matplotlib.pyplot as plt

# Plot fitted values vs. residuals to test for heteroskedasticity
plt.scatter(res_sm.fittedvalues, res_sm.resid)
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.axhline(y = 0, color = 'r')
plt.show()

Output:

## Identifying heteroskedasticity

From the plot, we can see that the residuals seem evenly distributed for each fitted value. When examining this kind of plot, we’re looking for any distinct, observable patterns among the residuals. For example, if the residuals increase or decrease systematically as the fitted values increase, this may indicate that the model is missing some important linear or nonlinear relationship in the data.

Two patterns are worth recognizing. A cone-like shape, in which the spread of the residuals widens as the X variable increases, indicates non-constant variance, or heteroskedasticity. A random scattering of points indicates that the variance of the residuals is constant across values of the X variable. Because the residual plot we generated earlier resembles the second pattern, a random cloud of data points with no discernible pattern, we can move forward with our regression analysis.

## Alternative plotting functions

If you want to generate a few regression plots, including the one we created manually above, you can use the sm.graphics.plot_regress_exog() function.

import matplotlib.pyplot as plt
fig = plt.figure(figsize = (8,6))

# Create regression plots for specified X variable
sm.graphics.plot_regress_exog(res_sm, 'petal_width', fig = fig)
plt.show()

Output:

The function takes a fitted linear regression results object, the name of an X variable (e.g. 'petal_width'), and a figure object, and produces four plots. From top left, going clockwise:

1. Observed and fitted y values vs. the chosen X variable, with prediction intervals drawn around the fitted values
2. Residuals vs. the chosen X variable, which helps to detect heteroskedasticity
3. Component-Component Plus Residual (CCPR) plot, which shows the relationship between the y variable and the chosen X variable while accounting for the effects of the other X variables in the model
4. Partial regression plot, which shows the relationship between the y variable and the chosen X variable after the effects of all other X variables have been removed

This function is even more useful for multiple linear regression models involving several X variables, in which you want to isolate the effects of one variable at a time.
