Histograms are a key visualization tool for showing the distribution of numerical data. Some histograms are easier to customize than others. For example, you may have struggled in the past with histograms in matplotlib, the library on which seaborn is built. This post goes over some of the many ways you can use seaborn's histplot() function to create highly tuned and beautiful histograms.
The data comes from a Kaggle dataset on Olympic athletes from 1896 to 2016. We’ll be creating histograms visualizing their height distributions.
Import packages and load data
import seaborn as sns
sns.set_theme()
We’ve already pre-loaded our dataset into Einblick using the upload CSV functionality, and the data is saved as a pandas DataFrame called df.
Basic plots with sns.histplot(data, x or y, discrete, bins)
Examples 1 and 2: sns.histplot(data, x or y)
Your most basic seaborn histogram relies on two arguments:
data: the variable, in this case `df`, where your data is stored
x or y: the name of the column that's storing the numerical variable we're counting. Whether you use x or y simply determines the orientation of the bars: x gives vertical bars and y gives horizontal bars.
# Example 1
sns.histplot(data = df, x = "Height")

# Example 2
sns.histplot(data = df, y = "Height")
Example 3: custom binning
If, as in the examples above, the default binning is unsatisfactory, you can use the bins argument to determine exactly how you want to bin the data. You can either enter an integer, n, to specify the number of bins you want, or you can enter a list of cutoff points for the bins, for example [0, 100, 300, 450, 600]. Note that the cutoff points do not need to be evenly spaced.
# Example 3
sns.histplot(data = df, x = "Height", bins = 10)
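To make the list form of bins concrete, here is a minimal plain-Python sketch of how a set of unevenly spaced cutoff points partitions values. The heights and edges are made-up sample values, not taken from the Olympic dataset, and count_by_edges is a hypothetical helper written for illustration. It follows the NumPy histogram convention that seaborn relies on, where every bin includes its left edge and only the last bin also includes its right edge.

```python
from bisect import bisect_right

def count_by_edges(values, edges):
    """Count values per bin; bins are [left, right), the last is [left, right]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the outer edges are not counted
        # bisect_right returns the index of the first edge greater than v
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

# Made-up heights in centimeters, for illustration only
heights = [150, 158, 160, 165, 171, 171, 178, 180, 185, 192, 203]
# Unevenly spaced edges: wide bins in the tails, narrow bins in the middle
edges = [140, 160, 170, 180, 190, 210]

counts = count_by_edges(heights, edges)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left}-{right} cm: {n}")
```

The same list could then be passed to seaborn as sns.histplot(data = df, x = "Height", bins = edges).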
Example 4: discrete data (discrete = True)
In our example, our variable, Height, is measured in centimeters and recorded only as whole numbers. With the default bins, some bins contain no whole-number value, which produces the gaps we can see in examples 1 and 2. We can fix this with the discrete = True argument, which makes each bin one unit wide, centers each bar on a whole number, and prevents these gaps.
# Example 4
sns.histplot(data = df, x = "Height", discrete = True)
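A rough sketch of what discrete = True changes, assuming whole-number data: the bins become one unit wide and are centered on the integers, so the edges sit at the half-values. The sample heights are invented, and discrete_edges is a hypothetical helper written for illustration, not part of seaborn.

```python
from collections import Counter

def discrete_edges(values):
    """Edges for width-1 bins centered on each integer from min to max."""
    lo, hi = min(values), max(values)
    return [lo - 0.5 + k for k in range(hi - lo + 2)]

# Invented whole-centimeter heights, for illustration only
heights = [170, 171, 171, 173, 173, 173, 174]

edges = discrete_edges(heights)
counts = Counter(heights)

# One bar per integer value; a value that never occurs gets a zero-height bar
for left, right in zip(edges, edges[1:]):
    center = int(left + 0.5)
    print(f"[{left}, {right}) centered on {center}: {counts.get(center, 0)}")
```

Because every whole number gets exactly one bin, no bar is left empty just because a bin happened to fall between two integers.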
Example 5: kernel density estimate (kde = True)
If you want to overlay a kernel density estimate, a smooth curve that estimates the variable's probability density function from a finite sample, you can use the kde = True argument.
# Example 5
df["Height"] = df["Height"].astype(float)
sns.histplot(data = df, x = "Height", discrete = True, kde = True)
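For intuition about the curve that kde = True overlays, here is a minimal Gaussian kernel density estimate in plain Python. It is a sketch under assumptions rather than seaborn's implementation: the sample heights are invented, gaussian_kde_at is a hypothetical helper, and the bandwidth of 3 cm is picked by hand, whereas seaborn chooses a bandwidth automatically.

```python
import math

def gaussian_kde_at(x, sample, bandwidth):
    """Average of Gaussian bumps (std = bandwidth) centered on each data point."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

# Invented heights in centimeters, for illustration only
heights = [165.0, 170.0, 171.0, 175.0, 176.0, 178.0, 183.0, 190.0]
bandwidth = 3.0  # assumption: hand-picked smoothing width in cm

# Evaluate the estimate on a grid wide enough to cover the tails
step = 0.5
grid = [140 + step * k for k in range(int((215 - 140) / step) + 1)]
density = [gaussian_kde_at(x, heights, bandwidth) for x in grid]

# A density should integrate to about 1 (simple Riemann sum)
area = sum(density) * step
print(f"area under the curve: {area:.3f}")
```

A larger bandwidth gives a smoother, flatter curve, and a smaller one follows the individual data points more closely.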
Advanced plots with sns.histplot(): hue, x AND y, multiple, stat
Example 6: comparing groups with hue
If you want to compare the distribution of a variable across multiple groups, you can use the hue argument to do so. Simply set hue = "Sport", where "Sport" is the column in the dataset, df, containing the group labels.
# Example 6
sns.histplot(data = df, x = "Height", hue = "Sport")
Example 7: side-by-side bars (multiple = "dodge")
As you can see in the example above, when plotting multiple groups the bars overlap by default (multiple = "layer"). Sometimes, however, you want to plot the bars side-by-side. You can do this by setting the argument multiple = "dodge". The two other options available are "stack" and "fill".
# Example 7
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge")
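The difference between the "stack" and "fill" settings of multiple is easiest to see on per-bin counts. The sketch below uses invented counts for two sports in three height bins, and the variable names are illustrative only. Stacking keeps the raw counts and piles the groups on top of each other, while filling rescales every bin so that the groups sum to 1.

```python
# Invented counts per height bin for two groups, for illustration only
bins = ["160-170", "170-180", "180-190"]
counts = {
    "Gymnastics": [30, 15, 5],
    "Basketball": [10, 15, 45],
}

# "stack": total bar height in each bin is the sum of the group counts
stack_totals = [sum(group[i] for group in counts.values()) for i in range(len(bins))]

# "fill": each group's share of its bin, so every bin adds up to 1
fill_shares = {
    sport: [n / total for n, total in zip(group, stack_totals)]
    for sport, group in counts.items()
}

for i, label in enumerate(bins):
    shares = ", ".join(f"{sport}: {fill_shares[sport][i]:.2f}" for sport in counts)
    print(f"{label} cm -> stacked height {stack_totals[i]}, filled shares {shares}")
```

"fill" is useful when the groups differ a lot in size and you care about the mix within each bin rather than the absolute counts.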
Example 8: different aggregate statistics (stat)
If you want each bar to show a statistic other than the raw count, you can do so using the stat argument. The options are "count" (the default), "frequency", "density", "probability" (or "proportion"), and "percent".
# Example 8
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge", stat = "probability")
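How the stat options relate to the raw counts can be worked through by hand. This is a small arithmetic sketch with invented counts and an assumed 10 cm bin width, following the definitions in the seaborn documentation; it is not seaborn's own code.

```python
# Invented counts for four 10 cm wide height bins, for illustration only
counts = [5, 20, 15, 10]
bin_width = 10.0
total = sum(counts)

stats = {
    "count": counts,
    # observations per unit of the x axis (here, per centimeter)
    "frequency": [n / bin_width for n in counts],
    # bar heights sum to 1
    "probability": [n / total for n in counts],
    # bar heights sum to 100
    "percent": [100 * n / total for n in counts],
    # bar areas (height * width) sum to 1
    "density": [n / (total * bin_width) for n in counts],
}

for name, bar_heights in stats.items():
    print(f"{name:>11}: {bar_heights}")
```

When hue is also set, seaborn normalizes across all groups together by default (common_norm = True), so the bars of all groups combined sum to 1 for stat = "probability". Passing common_norm = False normalizes each group separately instead.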
BONUS: creating a heat map using sns.histplot(data, x, y)
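Although there are other heatmap functions available in Python, you can actually create one by using the x and y variables together in the sns.histplot() function. Each cell is a two-dimensional bin, and its color encodes the number of observations that fall in it.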
# Make sure the columns are of compatible type
df["Weight"] = df["Weight"].astype(float)
df["Height"] = df["Height"].astype(float)

# Example 9
sns.histplot(data = df, x = "Height", y = "Weight")
BONUS: create multiple color maps
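Lastly, you can compare the distributions of several groups by creating one color map per group: put the numerical variable on x and the grouping column on both y and hue. The result reads much like side-by-side box plots, so consider it the next time you would reach for those.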
df["Height"] = df["Height"].astype(float)

# Example 10
sns.histplot(data = df, x = "Height", y = "Sport", hue = "Sport", legend = False)