How to create a histogram using matplotlib

Einblick Content Team - March 15th, 2023

Learn how to create a histogram in matplotlib, a powerful data visualization library in Python that backs other libraries, such as seaborn. Check out the following 8 examples of matplotlib histograms, with different levels of customization.

The data we've used for this example includes the heights and weights of Summer and Winter Olympic athletes from 1896 to 2016.

Basic histogram: plt.hist(data, x, bins)

The following two examples create the same basic histogram. plt.hist() accepts the values to plot directly as x (for example, an array or a pandas Series). Alternatively, you can pass a DataFrame, or another object whose columns can be looked up by name, as data, and then give x the name of a column as a string.

Example 1: plt.hist(x)

import matplotlib.pyplot as plt

# Basic histogram 1
# df is assumed to be a pandas DataFrame of the athlete data with a "Height" column
plt.hist(x = df["Height"])
plt.show()

Example 2: plt.hist(data, x)

# Basic histogram 2
plt.hist(data = df, x = "Height")
plt.show()

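To check that the two call styles are equivalent without the Olympic dataset, here is a minimal sketch on a small made-up DataFrame (the heights are invented, not the athlete data). plt.hist() returns the bin counts, the bin edges, and the drawn patches, so the results of both calls can be compared directly. The Agg backend line is only there so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up heights standing in for the Olympic data
df = pd.DataFrame({"Height": [160, 165, 170, 172, 175, 178, 180, 185, 190, 201]})

# Style 1: pass the values directly as x
counts_1, edges_1, _ = plt.hist(x=df["Height"])
plt.close()

# Style 2: pass the DataFrame as data and the column name as x
counts_2, edges_2, _ = plt.hist(data=df, x="Height")
plt.close()

# Same values and same default binning, so the results match
print(np.array_equal(counts_1, counts_2), np.array_equal(edges_1, edges_2))  # True True
```

With no bins argument, both calls fall back to the default of 10 bins, which is why the edges line up exactly.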
Example 3: plt.hist(bins = n)

To adjust the binning of a histogram, set the argument bins to the number of bins you want. matplotlib then splits the range of the data into that many equal-width bins (the default is 10).

# Histogram 3, bins
plt.hist(x = df["Height"], bins = 50)
plt.show()

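A minimal sketch of what bins = n does, assuming a synthetic normally distributed sample in place of the athlete heights: the returned edges run from the smallest to the largest value, and n bins need n + 1 edges.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 500 normally distributed "heights" (not the Olympic data)
rng = np.random.default_rng(0)
heights = rng.normal(loc=175, scale=10, size=500)

counts, edges, _ = plt.hist(x=heights, bins=50)
plt.close()

print(len(counts), len(edges))  # 50 bins need 51 edges
# The outer edges match the data's minimum and maximum
print(np.isclose(edges[0], heights.min()), np.isclose(edges[-1], heights.max()))
# Every bin has the same width: (max - min) / 50
print(np.allclose(np.diff(edges), (heights.max() - heights.min()) / 50))
```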
Example 4: plt.hist(bins = lst)

Alternatively, you can provide a list of values, which will determine the cutoff points (edges) for each bin. Each bin includes its left edge but not its right edge, except the last bin, which is inclusive on both ends. In the example below, based on the list bins = [150, 200, 250, 300], the bins are as follows: [150, 200), [200, 250), and [250, 300], where the last bin includes both 250 and 300. Values below 150 or above 300 are not counted.

bins = [150, 200, 250, 300]

# Histogram 4, bins
plt.hist(x = df["Height"], bins = bins)
plt.show()

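The edge rules are easiest to see with a handful of values placed exactly on the bin edges. This sketch uses hypothetical values rather than the athlete data and reads back the counts that plt.hist() returns.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical values chosen to land on or near the bin edges
values = [150, 199.9, 200, 250, 300, 310]
bins = [150, 200, 250, 300]

counts, edges, _ = plt.hist(x=values, bins=bins)
plt.close()

# [150, 200) -> 150, 199.9
# [200, 250) -> 200
# [250, 300] -> 250, 300 (300 is included because the last bin is closed)
# 310 lies outside every bin and is not counted
print(counts)  # [2. 1. 2.]
```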
Advanced histograms: multiple datasets, color, labels, orientation, histtype

Example 5: histogram with multiple datasets (plt.hist(x = [var1, var2]))

If you have two variables that you would like to plot on the same histogram, you can pass a list of them to the argument x, as in the example below. In that case, you may also want to color each dataset differently and add a legend, using the following:

• color: takes in a list or array of colors
• label: takes in a list or array of labels
• plt.legend(): adds in a legend for the plot
# Histogram with 2 datasets
plt.hist(x = [df["Height"], df["Weight"]], bins = 25, color = ["lightskyblue", "lightgreen"], label = ["Height", "Weight"])
plt.legend(loc = "upper left")
plt.show()

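When x is a list, plt.hist() computes one set of bin edges over all the datasets combined and returns one array of counts per dataset. A minimal sketch, assuming two synthetic samples named height and weight in place of the real columns:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical samples standing in for the Height and Weight columns
rng = np.random.default_rng(1)
height = rng.normal(175, 10, size=300)
weight = rng.normal(70, 12, size=300)

counts, edges, _ = plt.hist(
    x=[height, weight],
    bins=25,
    color=["lightskyblue", "lightgreen"],
    label=["Height", "Weight"],
)
plt.legend(loc="upper left")
plt.close()

# One count array per dataset, all measured against the same 26 edges
print(len(counts), len(edges))
# The shared edges span the combined range, so every value is counted once
print(counts[0].sum(), counts[1].sum())
```

Because the edges are shared, bins are wide enough to cover both variables. When the variables sit on very different scales, as heights and weights do, each one ends up occupying only part of the x-axis.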
Example 6: histogram, horizontal orientation

In addition to the arguments used in the previous example, this one sets orientation = "horizontal", which flips the layout of the histogram: the bins run along the y-axis and the counts along the x-axis.

# Horizontal histogram with 2 datasets
plt.hist(x = [df["Height"], df["Weight"]], bins = 25, color = ["lightskyblue", "lightgreen"], label = ["Height", "Weight"], orientation = "horizontal")
plt.legend(loc = "upper right")
plt.show()

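Orientation only changes how the bars are drawn, not how the data is binned. This sketch, assuming a synthetic sample and the object-oriented Axes.hist() form of the same function, draws both orientations side by side and compares them.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
height = rng.normal(175, 10, size=200)  # hypothetical sample

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))
counts_v, edges_v, bars_v = ax_v.hist(height, bins=25)
counts_h, edges_h, bars_h = ax_h.hist(height, bins=25, orientation="horizontal")

# Same binning and counts; only the drawing direction changes
print(np.array_equal(counts_v, counts_h))
# Vertical bars encode counts as heights, horizontal bars as widths
print(np.allclose([b.get_height() for b in bars_v], counts_v))
print(np.allclose([b.get_width() for b in bars_h], counts_h))
plt.close(fig)
```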
Example 7: histogram colored by group

In the following example, we've created a grouped histogram by splitting the data into two Series: the heights of Summer athletes and the heights of Winter athletes. We then passed both to plt.hist() as a list, using the same color and label arguments as in the previous examples.

Note: with the default histtype ("bar"), Summer and Winter bars that fall in the same bin are drawn side by side rather than on top of each other.

# Histogram grouped by category
summer = df[df["Season"] == "Summer"]["Height"]
winter = df[df["Season"] == "Winter"]["Height"]

plt.hist(x = [summer, winter], bins = 25, color = ["gold", "lightskyblue"], label = ["Summer Olympics", "Winter Olympics"])
plt.legend(loc = "upper left")
plt.show()

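With more than two categories, writing one filter per category gets repetitive. A sketch, assuming pandas and a small made-up DataFrame with the same Season and Height columns, that builds the list of Series and the matching labels with groupby instead:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the athlete data
df = pd.DataFrame({
    "Season": ["Summer", "Winter", "Summer", "Winter", "Summer", "Winter"],
    "Height": [170, 180, 165, 185, 175, 178],
})

# One Series of heights per season; groupby sorts the group keys by default
groups = df.groupby("Season")["Height"]
labels = [name for name, _ in groups]
data = [heights for _, heights in groups]

counts, edges, _ = plt.hist(x=data, bins=5, label=labels)
plt.legend(loc="upper left")
plt.close()

print(labels)  # ['Summer', 'Winter']
```

The labels and data lists come from the same groupby iteration, so each label stays paired with its own Series no matter how many categories there are.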
Example 8: different histogram types

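Finally, if you would rather not use filled bars, plt.hist() offers several built-in histogram types through the histtype argument:

• histtype: type of histogram; the options are 'bar' (the default; multiple datasets side by side), 'barstacked' (multiple datasets stacked on top of each other), 'step' (an unfilled outline), and 'stepfilled' (a filled outline)
# Step histogram grouped by category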
summer = df[df["Season"] == "Summer"]["Height"]
winter = df[df["Season"] == "Winter"]["Height"]

plt.hist(x = [summer, winter], bins = 25, color = ["gold", "lightskyblue"], label = ["Summer Olympics", "Winter Olympics"], histtype = "step")
plt.legend(loc = "upper left")
plt.show()

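To compare all four histogram types at once, the same call can be repeated on a 2 × 2 grid of subplots. This is a sketch with synthetic Summer and Winter samples rather than the real heights; the variable names are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
summer = rng.normal(176, 10, size=300)  # hypothetical samples
winter = rng.normal(174, 9, size=300)

histtypes = ["bar", "barstacked", "step", "stepfilled"]
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True)

# Draw the same two datasets once per histogram type
for ax, histtype in zip(axes.flat, histtypes):
    ax.hist(
        x=[summer, winter],
        bins=25,
        color=["gold", "lightskyblue"],
        label=["Summer Olympics", "Winter Olympics"],
        histtype=histtype,
    )
    ax.set_title(f'histtype="{histtype}"')

axes[0, 0].legend(loc="upper left")
fig.tight_layout()
# In an interactive session, call plt.show() here to display the grid
```

Sharing the x-axis keeps the bins visually aligned across panels, which makes the differences between the four styles easier to see.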