Creating custom graphs with seaborn boxplot()

Einblick Content Team - March 17th, 2023

Different kinds of visualizations suit different kinds of information. Box plots, also called box-and-whisker plots, are particularly useful for comparing the distribution of a continuous variable across groups and for identifying outliers. In this post, we’ll use seaborn’s boxplot() function to create and customize different box plots.

Open and fork the canvas below for all example boxplots. We’ve used a subset of an Olympics dataset found on Kaggle, and will use it to examine the distribution of heights among athletes participating in different sports.

Import and setup

import seaborn as sns
sns.set_theme()

We've loaded in our dataset as a pandas DataFrame called df.

Basic box plot: sns.boxplot(x or y)

The most basic box plot in seaborn shows the distribution of one continuous variable. You only need one argument: x or y. The argument you choose sets the orientation of the box plot: x draws it horizontally, y draws it vertically.

NOTE: Alternatively, you can pass the whole DataFrame through the data argument, and then give x or y the column name as a string.

sns.boxplot(x = df["Height"])

Output:

[Figure: Seaborn boxplot example]

sns.boxplot(y = df["Height"])

Output:

[Figure: Seaborn boxplot vertical example]
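The NOTE above mentions the data argument. A minimal sketch of that form follows; the small DataFrame is made up for illustration and only stands in for the Olympics subset, which is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Made-up heights in cm, standing in for the real dataset
df = pd.DataFrame({"Height": [158, 165, 170, 172, 175, 180, 183, 190, 210]})

# Pass the DataFrame via data and refer to the column by name
ax = sns.boxplot(data=df, x="Height")
```

This should draw the same plot as sns.boxplot(x = df["Height"]), and it keeps later calls shorter once you start adding more columns.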

Advanced box plots: comparing groups

Comparing groups: sns.boxplot(x and y, hue)

To compare the distribution of a continuous variable across categories, combine the x, y, and hue arguments. Setting x and y alone, with one continuous and one categorical, is enough to draw one box per category. If you also want a legend, the easiest route is to set hue to the same categorical variable. In the call below, dodge = False keeps each box centered on its category rather than shifting it to make room for other hue levels.

import matplotlib.pyplot as plt

sns.boxplot(data = df, x = "Height", y = "Sport", hue = "Sport", dodge = False)

# Adjust legend placement
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)

Output:

[Figure: Seaborn boxplot hue variable example]
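If you don't need the legend, x and y on their own are enough, as noted above. Here is a minimal sketch of that simpler call, again using a small made-up DataFrame in place of the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Made-up sample: four athletes per sport, heights in cm
df = pd.DataFrame({
    "Sport": ["Gymnastics"] * 4 + ["Rowing"] * 4 + ["Swimming"] * 4,
    "Height": [150, 155, 160, 162, 180, 185, 188, 192, 175, 180, 184, 190],
})

# One continuous and one categorical variable: one horizontal box per sport
ax = sns.boxplot(data=df, x="Height", y="Sport")
```

Each sport is already labeled on the y-axis, so the legend in the hue version mostly repeats information; whether it is worth the extra space is a matter of taste.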

To compare groups within groups, set hue to a different categorical variable from the one already on the x or y axis, as below.

sns.boxplot(data = df, x = "Height", y="Sport", hue = "Sex")

Output:

[Figure: Seaborn boxplot compare multiple groups example]
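Box plots are also useful for spotting outliers, and it can help to see which values end up beyond the whiskers. By default, seaborn's boxplot() uses whis=1.5, so points more than 1.5 times the interquartile range outside the quartiles are drawn individually. The sketch below reproduces that rule with pandas; the helper name iqr_outliers and the sample heights are made up for illustration.

```python
import pandas as pd

def iqr_outliers(values, whis=1.5):
    """Return the values lying more than whis * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return values[(values < lower) | (values > upper)]

# Made-up heights in cm; 230 is deliberately extreme
heights = pd.Series([158, 165, 170, 172, 175, 180, 183, 190, 230], name="Height")

outliers = iqr_outliers(heights)
print(outliers.tolist())
```

To check each sport separately, one option is to apply the same helper per group, for example with df.groupby("Sport")["Height"].apply(iqr_outliers).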

BONUS: seaborn boxplots with Generative AI

If you want to visualize your data faster, check out our AI agent, Einblick Prompt, which can create complex charts from as little as one sentence. In the canvas below, we used generative AI to build a comparable set of boxplots. Here is how we did it:

Using Generative AI in Einblick

  1. Open and fork the canvas
  2. Connect to your data
  3. Right-click anywhere in the canvas > Prompt
  4. Type in: "Use the seaborn library to plot boxplots, x = Height, y = Sport, only include Gymnastics, Rowing, Swimming, and Cycling."
  5. Run the code in Einblick's data notebook immediately
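For comparison, a hand-written version of what the prompt in step 4 asks for might look like the sketch below. This is not the code Einblick Prompt generates, just one way to filter to the four sports and plot them; the DataFrame is made up, and Judo is included only to show the filter working.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Made-up sample standing in for the Olympics subset
df = pd.DataFrame({
    "Sport": ["Gymnastics", "Rowing", "Swimming", "Cycling",
              "Judo", "Rowing", "Swimming", "Judo"],
    "Height": [155, 188, 184, 178, 172, 192, 180, 169],
})

# Keep only the sports named in the prompt
sports = ["Gymnastics", "Rowing", "Swimming", "Cycling"]
subset = df[df["Sport"].isin(sports)]

ax = sns.boxplot(data=subset, x="Height", y="Sport")
```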

If you try out Prompt, let us know what you think!
