Histograms are a key visualization tool for showing the distribution of numerical data. Some histograms are easier than others to customize. For example, you may have struggled in the past with histograms in matplotlib, the library on which seaborn is built. This post will go over some of the many ways you can use seaborn's histplot() function to create highly tuned and beautiful histograms.
The data comes from a Kaggle dataset on Olympic athletes from 1896 to 2016. We'll be creating histograms visualizing their height distributions.
Import packages and load data
import seaborn as sns
sns.set_theme()
We've already pre-loaded our dataset into Einblick using the upload CSV functionality, and the data is saved as a pandas DataFrame called df.
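If you're working outside Einblick, you can build an equivalent df yourself with pandas. The sketch below is a stand-in under stated assumptions: load_athletes is a hypothetical helper, and the Kaggle file name athlete_events.csv is assumed rather than taken from this post. It coerces Height and Weight to float up front, the same conversion the later examples do with astype(float). A tiny in-memory CSV with made-up rows replaces the real file so the sketch runs on its own.

```python
import io

import pandas as pd


def load_athletes(source) -> pd.DataFrame:
    """Hypothetical helper: read the athlete CSV into a DataFrame.

    `source` may be a file path (assumed here to be "athlete_events.csv")
    or any file-like object that pandas.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Coerce the measurement columns to float; values that can't be parsed
    # (including empty cells) become NaN, which histplot leaves out.
    for col in ["Height", "Weight"]:
        df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)
    return df


# With the real file you might call: df = load_athletes("athlete_events.csv")
# Here a tiny in-memory CSV (made-up rows) stands in for it.
sample_csv = io.StringIO(
    "Name,Sport,Height,Weight\n"
    "A,Swimming,180,75\n"
    "B,Cycling,,68\n"
    "C,Gymnastics,160,55\n"
)
df = load_athletes(sample_csv)
print(df.dtypes)
```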
Basic histogram: sns.histplot(data, x or y, discrete, bins)
Examples 1 and 2: sns.histplot(data, x or y)
Your most basic seaborn histogram relies on two arguments:
- data: the DataFrame where your data is stored, in this case df
- x or y: the name of the column storing the numerical variable we're counting. Whether you use x or y simply determines the orientation of the bars: x gives vertical bars, y gives horizontal bars.
# Example 1
sns.histplot(data = df, x = "Height")

# Example 2
sns.histplot(data = df, y = "Height")

Example 3: custom binning
If, as in the examples above, the default binning is unsatisfactory, you can use the bins argument to control exactly how the data is binned. You can either pass an integer, n, to specify the number of bins, or a list of bin edges (cutoff points), for example [0, 100, 300, 450, 600]. Note that the cutoff points do not need to be evenly spaced.
# Example 3
sns.histplot(data = df, x = "Height", bins = 10)

Example 4: discrete
In our example, the variable Height is measured in centimeters, but no fractional amounts were recorded. Whenever a bin falls between two consecutive whole-centimeter values, it contains no observations, which produces the gaps we can see in Examples 1 and 2. To fix this aesthetically, use the discrete = True argument, which sets the bin width to 1 and centers each bar on a whole-number value, preventing the gaps.
# Example 4
sns.histplot(data = df, x = "Height", discrete = True)

Example 5: kernel density estimate (kde)
If you want to plot a kernel density estimate, which estimates the probability density function of a variable from a finite sample, you can use the kde = True argument.
# Example 5
df["Height"] = df["Height"].astype(float)
sns.histplot(data = df, x = "Height", discrete = True, kde = True)

Advanced plots with sns.histplot(): hue, x AND y, multiple, stat
Example 6: comparing groups with hue
If you want to compare the distribution of a variable across multiple groups, you can use the hue argument to do so. Simply set hue = "Sport", where "Sport" is the column in the dataset, df, containing the group labels.
# Example 6
sns.histplot(data = df, x = "Height", hue = "Sport")

BONUS: sns.histplot() with Generative AI
We recently launched an AI agent, Einblick Prompt, that can reason alongside you, the programmer, to build out entire workflows with natural language prompts (pun intended). Reap the benefits of generative AI directly in our AI-native data notebooks. No copy-pasting or context-switching required. Check out how we recreated a similar graph using generative AI:
Using Generative AI in Einblick
- Open the canvas
- Fork the canvas
- Right-click anywhere in the canvas > Prompt
- Type in: "Use seaborn to plot a histogram of height, color by sport, only include Gymnastics, Swimming, and Cycling."
- Run the code in Einblick's data notebook immediately
Test out different prompts, and see what different charts and graphs you can create instantly.
Example 7: side-by-side bars (multiple = "dodge")
As the example above shows, by default the bars overlap when you plot multiple groups (multiple = "layer"). Sometimes, however, you want the bars side by side. You can do this by setting multiple = "dodge". The two other options are "stack", which stacks each group's bars on top of one another, and "fill", which stacks them and rescales every bin so its bars sum to 1, showing each group's share of that bin.
# Example 7
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge")

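Example 8: different aggregate statistics (stat)
If you want to view the distribution of the variable through a different aggregate statistic, you can do so using the stat argument. The options are:
- count: the number of observations in each bin (the default)
- frequency: the number of observations divided by the bin width
- probability or proportion: normalized so that the bar heights sum to 1
- percent: normalized so that the bar heights sum to 100
- density: normalized so that the total area of the histogram is 1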
# Example 8
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge", stat = "probability")

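BONUS: creating a heat map using sns.histplot(data, x, y)
Although there are other heatmap functions available in Python, you can create one by passing both the x AND y variables to seaborn's histplot() function. The result is a bivariate histogram: the data are binned along both axes, and each cell's color shows how many observations fall into it.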
# Make sure both columns are numeric (float)
df["Weight"] = df["Weight"].astype(float)
df["Height"] = df["Height"].astype(float)
# Example 9
sns.histplot(data = df, x = "Height", y = "Weight")

BONUS: create multiple color maps
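Lastly, you can compare the distributions of several groups by passing the group column as both y and hue, which draws one row of color-mapped bins per group. This has a similar effect to placing box plots side by side; if that comparison interests you, consider box plots next time!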
df["Height"] = df["Height"].astype(float)
# Example 10
sns.histplot(data = df, x = "Height", y = "Sport", hue = "Sport", legend = False)

About
Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.