Use seaborn histplot() to create beautiful histograms

Einblick Content Team - March 16th, 2023

Histograms are a key visualization tool for showing the distribution of numerical data. Some are easier to customize than others: you may, for example, have struggled in the past with histograms in matplotlib, the library on which seaborn is built. This post covers several ways to use seaborn’s histplot() function to create finely tuned, attractive histograms.

The data comes from a Kaggle dataset on Olympic athletes from 1896 to 2016. We’ll create histograms visualizing their height distributions.

Import packages and load data

import seaborn as sns
sns.set_theme()

We’ve already loaded our dataset into Einblick using the upload CSV functionality, and the data is saved as a pandas DataFrame called df.
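
If you are following along outside Einblick, you will need to create df yourself. With a local copy of the Kaggle CSV you would load it with pd.read_csv(). The sketch below instead builds a small synthetic stand-in so the later examples have something to run against. The values, the three sports, and the mean heights are all made up for illustration; only the column names (Sport, Height, Weight) follow the ones used in this post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Olympic athletes data (NOT the real Kaggle values)
rng = np.random.default_rng(seed=0)
sports = ["Gymnastics", "Swimming", "Cycling"]
mean_height_cm = {"Gymnastics": 162, "Swimming": 180, "Cycling": 176}  # made-up means

sport = rng.choice(sports, size=600)
# Whole centimeters, like the real Height column
height = np.array([rng.normal(mean_height_cm[s], 7) for s in sport]).round()
# Rough made-up relationship between height and weight
weight = (height - 100 + rng.normal(0, 6, size=600)).round()

df = pd.DataFrame({"Sport": sport, "Height": height, "Weight": weight})
print(df.head())
```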

Basic histogram: sns.histplot(data, x or y, discrete, bins)

Examples 1 and 2: sns.histplot(data, x or y)

Your most basic seaborn histogram relies on two arguments:

  • data: the variable holding your dataset, in this case the DataFrame `df`
  • x or y: the name of the column holding the numerical variable to count. Using x draws vertical bars; using y draws horizontal bars.

# Example 1
sns.histplot(data = df, x = "Height")

Output:

[Figure: Seaborn histplot() example]

# Example 2
sns.histplot(data = df, y = "Height")

Output:

[Figure: Seaborn histplot() horizontal bars example]

Example 3: custom binning

If the default binning is unsatisfactory, as in the examples above, use the bins argument to control how the data is binned. You can pass an integer, n, to specify the number of bins, or a list of cutoff points (bin edges), for example [0, 100, 300, 450, 600]. The cutoff points do not need to be evenly spaced.

# Example 3
sns.histplot(data = df, x = "Height", bins = 10)

Output:

[Figure: Seaborn histplot custom bins example]

Example 4: discrete

In our dataset, Height is recorded in whole centimeters, with no fractional values. That is what produces the gaps visible in examples 1 and 2: the default bin edges do not line up with whole-number values. Setting discrete = True addresses this by making each bin one unit wide and centering each bar on its value, which removes the gaps.

# Example 4
sns.histplot(data = df, x = "Height", discrete = True)

Output:

[Figure: Seaborn histplot() discrete example]

Example 5: kernel density estimate (kde)

If you want to overlay a kernel density estimate, a smoothed estimate of the variable's probability density function computed from the sample, use the kde = True argument.

# Example 5
df["Height"] = df["Height"].astype(float)

sns.histplot(data = df, x = "Height", discrete = True, kde = True)

Output:

[Figure: Seaborn histplot() kernel density estimate (KDE) example]

Advanced plots with sns.histplot(): hue, x AND y, multiple, stat

Example 6: comparing groups with hue

If you want to compare the distribution of a variable across multiple groups, you can use the hue argument to do so. Simply set hue = "Sport", where "Sport" is the column in the dataset, df, containing the group labels.

# Example 6
sns.histplot(data = df, x = "Height", hue = "Sport")

Output:

[Figure: Seaborn histplot with hue as groups example]

BONUS: sns.histplot() with Generative AI

We recently launched an AI agent, Einblick Prompt, that can reason alongside you, the programmer, to build out entire workflows with natural language prompts (pun intended). Reap the benefits of generative AI directly in our AI-native data notebooks. No copy-pasting or context-switching required. Check out how we recreated a similar graph using generative AI:

Using Generative AI in Einblick

  1. Open the canvas
  2. Fork the canvas
  3. Right-click anywhere in the canvas > Prompt
  4. Type in: "Use seaborn to plot a histogram of height, color by sport, only include Gymnastics, Swimming, and Cycling."
  5. Run the code in Einblick's data notebook immediately

Test out different prompts, and see what different charts and graphs you can create instantly.

Example 7: side-by-side bars (multiple = "dodge")

As the example above shows, when plotting multiple groups the bars overlap by default (multiple = "layer"). To draw the bars side by side instead, set multiple = "dodge". The two other options are "stack", which piles each group's bars on top of one another, and "fill", which stacks them and rescales every bin to a total height of 1.

# Example 7
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge")

Output:

[Figure: Seaborn histplot() dodge bars example]

Example 8: different aggregate statistics (stat)

If you want the bar heights to show something other than raw counts, use the stat argument. The options are "count" (the default), "frequency", "probability" (or "proportion"), "percent", and "density".

# Example 8
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge", stat = "probability")

Output:

[Figure: Seaborn histplot, probability example]

BONUS: creating a heatmap using sns.histplot(data, x, y)

Although Python offers dedicated heatmap functions, you can also create one with the seaborn histplot() function by passing both x and y. The result is a bivariate histogram in which color encodes the number of observations in each cell.

# Make sure the columns are of compatible type
df["Weight"] = df["Weight"].astype(float)
df["Height"] = df["Height"].astype(float)

# Example 9
sns.histplot(data = df, x = "Height", y = "Weight")

Output:

[Figure: Seaborn histplot() heatmap example]

BONUS: create multiple color maps

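Lastly, you can compare the distributions of several groups by putting a categorical variable on one axis, which draws one color-mapped strip per group. The effect is similar to side-by-side box plots, so consider it as an alternative the next time you would reach for those.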

df["Height"] = df["Height"].astype(float)

# Example 10
sns.histplot(data = df, x = "Height", y = "Sport", hue = "Sport", legend = False)

Output:

[Figure: Seaborn histplot() multiple color maps example]
