Different kinds of visualizations suit different kinds of information. Box plots, also called box-and-whisker plots, are particularly useful for comparing distributions of continuous variables across groups and for identifying outliers. In this post, we'll use seaborn's boxplot() function to create and customize different box plots.
We've used a subset of an Olympics dataset found on Kaggle, and will use it to examine the distribution of heights among athletes participating in different sports.
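Before plotting, it can help to see which numbers a box plot summarizes. The sketch below is a minimal illustration using only Python's standard library. The heights are made-up values, not rows from the Olympics dataset, and the whisker rule shown (1.5 times the interquartile range) is the default that seaborn's boxplot() applies through its whis parameter, as far as we know.

```python
from statistics import quantiles

# Made-up heights in cm, for illustration only (not from the Olympics dataset)
heights = [158, 165, 170, 172, 175, 178, 180, 183, 190, 210]

# Quartiles with linear interpolation between data points
q1, median, q3 = quantiles(heights, n=4, method="inclusive")
iqr = q3 - q1  # the box spans q1 to q3

# Whiskers reach to the farthest data point within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [h for h in heights if lower_fence <= h <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the fences are drawn individually as outliers
outliers = [h for h in heights if h < lower_fence or h > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

With these values, the only height flagged as an outlier is 210, because it sits above the upper fence.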
Import and setup
import seaborn as sns
sns.set_theme()
We've loaded in our dataset as a pandas DataFrame called df.
Basic box plot:
The most basic box plot in seaborn shows the distribution of one continuous variable. You only need one argument: x or y. The argument you choose sets the orientation of the box plot: x draws it horizontally, and y draws it vertically.
NOTE: Alternatively, you can use the data argument to specify where the entire dataset is stored, and then pass the column name as a string as the value of x or y.
sns.boxplot(x = df["Height"])
sns.boxplot(y = df["Height"])
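The NOTE above describes passing the whole dataset through data and naming the column as a string. A minimal sketch of that form follows. The small DataFrame here is a made-up stand-in for the Olympics subset (only the Height column name matches the post), and the variable names ax_h and ax_v are ours, chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme()

# Made-up stand-in for the Olympics subset; only the column name matches the post
df = pd.DataFrame({"Height": [158, 165, 170, 172, 175, 178, 180, 183, 190, 210]})

# Horizontal box plot: the column name is passed as a string to x
ax_h = sns.boxplot(data=df, x="Height")

# Start a new figure so the second plot is not drawn on top of the first
plt.figure()

# Vertical box plot: the same column name is passed to y instead
ax_v = sns.boxplot(data=df, y="Height")
```

Passing the column name as a string also means seaborn labels the axis with that name, which saves a separate labeling step.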
Advanced box plots: comparing groups
Comparing groups: Example 1
If you would like to compare the distribution of a continuous variable across different categories of data, you can use a combination of the x, y, and hue arguments. Setting just x and y, where one is continuous and the other is categorical, is sufficient. If you would like a legend, it is easiest to also set the hue argument equal to the categorical variable.
import matplotlib.pyplot as plt

sns.boxplot(data = df, x = "Height", y = "Sport", hue = "Sport", dodge = False)

# Adjust legend placement
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
If you would like to compare groups within groups, you can set hue to a different categorical variable than the one already assigned to x or y, as below.
sns.boxplot(data = df, x = "Height", y="Sport", hue = "Sex")