Different kinds of visualizations are effective for conveying different kinds of information. Box plots, also called box-and-whisker plots, are particularly useful for comparing distributions of continuous variables across groups and for identifying outliers. In this post, we'll use seaborn's boxplot() function to create and customize different box plots.
Open and fork the canvas below for all example boxplots. We’ve used a subset of an Olympics dataset found on Kaggle, and will use it to examine the distribution of heights among athletes participating in different sports.
Import and setup
import seaborn as sns
sns.set_theme()
We've loaded in our dataset as a DataFrame named df.
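If you don't have the Olympics data at hand, a small stand-in DataFrame is enough to follow along. The sketch below is not the Kaggle dataset: the heights are randomly generated, the per-sport averages are made up, and the column names (Height, Sport, Sex) are simply the ones used in this post's examples.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data, NOT the Kaggle Olympics dataset
rng = np.random.default_rng(seed=0)
sports = ["Gymnastics", "Rowing", "Swimming", "Cycling"]

# Hypothetical average heights in cm, chosen only for illustration
mean_height = {"Gymnastics": 162, "Rowing": 186, "Swimming": 180, "Cycling": 175}

rows = []
for sport in sports:
    for sex in ["F", "M"]:
        # Shift the made-up average down or up by sex
        offset = -6 if sex == "F" else 6
        heights = rng.normal(mean_height[sport] + offset, 6, size=50)
        rows.extend({"Sport": sport, "Sex": sex, "Height": h} for h in heights)

df = pd.DataFrame(rows)
print(df.head())
```

Any DataFrame with one continuous column and one or two categorical columns will work with the examples in this post.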
Basic box plot:
The most basic box plot in seaborn shows the distribution of one continuous variable. You only need one argument: x or y. The argument you choose determines the orientation of the box plot: x draws it horizontally, and y draws it vertically.
NOTE: Alternatively, you can use the data argument to specify where the entire dataset is stored, and then use the column name as a string to specify the value of x or y.
sns.boxplot(x = df["Height"])
sns.boxplot(y = df["Height"])
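The alternative mentioned in the NOTE, passing the whole dataset through the data argument and naming the column as a string, could look like the sketch below. The tiny DataFrame here is made up so the snippet runs on its own; with the post's dataset you would pass df directly.

```python
import pandas as pd
import seaborn as sns

# Made-up heights in cm, standing in for the post's dataset
df = pd.DataFrame({"Height": [158, 165, 170, 172, 175, 178, 180, 183, 190, 210]})

# The column is referenced by name; seaborn looks it up in `data`
ax = sns.boxplot(data=df, x="Height")  # horizontal box plot
```

Swapping x="Height" for y="Height" gives the vertical version. A side benefit of this form is that seaborn labels the axis with the column name.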
Advanced box plots: comparing groups
If you would like to compare the distributions of a continuous variable across different categories of data, you can use a combination of the x, y, and hue arguments. Using just x and y, where one is continuous and the other is categorical, is sufficient; if you would like a legend, it is easiest to also set the hue argument equal to the categorical variable.
import matplotlib.pyplot as plt
sns.boxplot(data = df, x = "Height", y = "Sport", hue = "Sport", dodge = False)
# Adjust legend placement
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
If you would like to compare groups within groups, you can set hue to a different categorical variable than the one already present in the x or y dimension, as below.
sns.boxplot(data = df, x = "Height", y="Sport", hue = "Sex")
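Points drawn beyond the whiskers in plots like these are the outliers. By default, seaborn's boxplot() extends the whiskers to the furthest data points within 1.5 times the interquartile range (IQR) of the box, a multiplier controlled by the whis parameter. As a rough sketch, the same rule can be applied by hand with pandas to list the flagged values; the heights below are made up for illustration.

```python
import pandas as pd

# Made-up heights in cm
heights = pd.Series([158, 165, 170, 172, 175, 178, 180, 183, 190, 210], name="Height")

# Quartiles mark the edges of the box
q1, q3 = heights.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box; values outside them are outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = heights[(heights < lower) | (heights > upper)]

print(outliers.tolist())  # [210]
```

To apply this per group, the same calculation can be run inside a groupby on the categorical column.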
BONUS: seaborn boxplots with Generative AI
If you want to visualize your data faster than ever, check out our AI agent, Einblick Prompt, which can create complex, beautiful charts from as little as one sentence. In the below canvas, we used generative AI to build a comparable set of boxplots. Check out how we did it below:
Using Generative AI in Einblick
- Open and fork the canvas
- Connect to your data
- Right-click anywhere in the canvas > Prompt
- Type in: "Use the seaborn library to plot boxplots, x = Height, y = Sport, only include Gymnastics, Rowing, Swimming, and Cycling."
- Run the code in Einblick's data notebook immediately
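For comparison, a hand-written version of what that prompt asks for might look like the sketch below. This is not the code Einblick Prompt generates; it is one plausible way to do the same filtering and plotting, and the small DataFrame is made up so the snippet runs on its own.

```python
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the Olympics dataset
df = pd.DataFrame({
    "Sport": ["Gymnastics", "Gymnastics", "Rowing", "Rowing", "Swimming",
              "Swimming", "Cycling", "Cycling", "Basketball", "Basketball"],
    "Height": [155, 163, 185, 192, 178, 186, 172, 180, 198, 206],
})

# Keep only the four sports named in the prompt
sports = ["Gymnastics", "Rowing", "Swimming", "Cycling"]
subset = df[df["Sport"].isin(sports)]

# `order` fixes the order in which the sports appear on the y-axis
ax = sns.boxplot(data=subset, x="Height", y="Sport", order=sports)
```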
If you try out Prompt, let us know what you think!