How to create a histogram using matplotlib

Einblick Content Team - March 15th, 2023

Learn how to create a histogram with matplotlib, the Python data visualization library that other plotting libraries, such as seaborn, are built on. The eight examples below show matplotlib histograms with increasing levels of customization.

The data used in these examples contains the heights and weights of Summer and Winter Olympic athletes from 1896 to 2016. The code assumes this data has already been loaded into a pandas DataFrame named df with Height, Weight, and Season columns.

Basic histogram: plt.hist(data, x, bins)

The following two examples create the same basic histogram. plt.hist() accepts the values to plot directly as x (for example, an array or a pandas Series). Alternatively, you can pass a DataFrame, or any other object indexable by column name, as data and give the column name to x as a string.

Example 1: plt.hist(x)

import matplotlib.pyplot as plt

# Basic histogram 1
plt.hist(x = df["Height"])
plt.show()

Example 2: plt.hist(data, x)

# Basic histogram 2
plt.hist(data = df, x = "Height")
plt.show()

Output:

Matplotlib basic histogram example
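To check that the two calling styles are interchangeable, the sketch below compares what plt.hist() returns for each: the bin counts, the bin edges, and the drawn patches. It assumes synthetic, randomly generated heights in place of the Olympic data, and uses a plain dict (a hypothetical stand-in for df) as data, since matplotlib only needs an object that can be indexed by the column name.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the "Height" column (assumption: not the real Olympic data)
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=175, scale=10, size=1_000)

# Style 1: pass the values directly as x
counts_1, edges_1, _ = plt.hist(x=heights)
plt.close()

# Style 2: pass an indexable container as data and the column name as x
table = {"Height": heights}
counts_2, edges_2, _ = plt.hist(data=table, x="Height")
plt.close()

# Both styles produce the same default 10 bins and the same counts
print(np.array_equal(counts_1, counts_2))  # True
print(len(counts_1), len(edges_1))         # 10 11
```

Because the counts and edges come back from the call, you can inspect a histogram's numbers without reading them off the chart.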

Example 3: plt.hist(bins = n)

If you want to adjust the binning of a histogram, set the argument bins to the number of bins you would like in your chart. By default, matplotlib uses 10 equal-width bins.

# Histogram 3, bins
plt.hist(x = df["Height"], bins = 50)
plt.show()

Output:

Matplotlib histogram n bins example
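When bins is an integer, the edges are spaced evenly from the smallest to the largest value in the data, so every bin has the same width. The sketch below checks this with the bin edges that plt.hist() returns, using synthetic heights as an assumed stand-in for df["Height"].

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights in cm (assumption: random stand-in for df["Height"])
rng = np.random.default_rng(seed=1)
heights = rng.normal(loc=175, scale=10, size=1_000)

counts, edges, _ = plt.hist(x=heights, bins=50)
plt.close()

widths = np.diff(edges)                  # width of each bin
print(len(counts), len(edges))           # 50 51
print(np.allclose(widths, widths[0]))    # True: all bins share one width
print(np.isclose(edges[0], heights.min()), np.isclose(edges[-1], heights.max()))  # True True
print(counts.sum() == len(heights))      # True: every value lands in some bin
```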

Example 4: plt.hist(bins = lst)

Alternatively, you can provide a list of values, which plt.hist() uses as the bin edges. Every bin is half-open except the last, which includes its right edge. With bins = [150, 200, 250, 300], the bins are [150, 200), [200, 250), and [250, 300], so the last bin includes both 250 and 300. Values below 150 or above 300 are not counted.

bins = [150, 200, 250, 300]

# Histogram 4, bins
plt.hist(x = df["Height"], bins = bins)
plt.show()

Output:

Matplotlib histogram with list of bins example
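The edge rules are easiest to see with a handful of values placed exactly on, between, and outside the edges. The values in this sketch are hypothetical, chosen only to exercise each boundary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt

# Hypothetical values chosen to sit on, between, and outside the bin edges
values = [140, 150, 199.9, 200, 249, 250, 300, 310]
bins = [150, 200, 250, 300]

counts, edges, _ = plt.hist(x=values, bins=bins)
plt.close()

# [150, 200) gets 150 and 199.9; [200, 250) gets 200 and 249;
# [250, 300] gets 250 and 300; 140 and 310 are outside the edges and dropped
print(counts.tolist())  # [2.0, 2.0, 2.0]
```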

Advanced histograms: multiple datasets, color, labels, orientation, histtype

Example 5: histogram with multiple datasets (plt.hist(x = [var1, var2]))

If you have two variables that you would like to plot on the same histogram, you can pass a list of them to the argument x, as in the example below. If you do so, you may also want to color each dataset differently and add a legend, using the following arguments and function:

  • color: a list or array of colors, one per dataset
  • label: a list or array of labels, one per dataset
  • plt.legend(): adds a legend built from those labels

# Histogram with 2 datasets
plt.hist(x = [df["Height"], df["Weight"]], bins = 25, color = ["lightskyblue", "lightgreen"], label = ["Height", "Weight"])
plt.legend(loc = "upper left")
plt.show()

Output:

Matplotlib histogram multiple datasets example
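With a list passed to x, plt.hist() returns one row of counts per dataset, and all datasets share a single set of bin edges computed over their combined range. The sketch below shows this with synthetic heights and weights, which are assumptions standing in for the real columns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights (cm) and weights (kg); assumption: stand-ins for the real columns
rng = np.random.default_rng(seed=2)
heights = rng.normal(loc=175, scale=10, size=800)
weights = rng.normal(loc=70, scale=12, size=800)

counts, edges, _ = plt.hist(
    x=[heights, weights],
    bins=25,
    color=["lightskyblue", "lightgreen"],
    label=["Height", "Weight"],
)
legend = plt.legend(loc="upper left")
legend_labels = [text.get_text() for text in legend.get_texts()]
plt.close()

# One row of counts per dataset, all sharing one set of 26 edges
print(np.shape(counts), len(edges))  # (2, 25) 26
print(legend_labels)                 # ['Height', 'Weight']
```

Because the edges span both datasets, heights in cm and weights in kg end up on one axis here; that works for a quick comparison, but the bins are only meaningful when the variables share a unit.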

Example 6: histogram, horizontal orientation

In addition to the arguments used in the previous example, this one sets orientation = "horizontal", which turns the histogram on its side: the values run along the y-axis and the counts along the x-axis.

# Horizontal histogram with 2 datasets
plt.hist(x = [df["Height"], df["Weight"]], bins = 25, color = ["lightskyblue", "lightgreen"], label = ["Height", "Weight"], orientation = "horizontal")
plt.legend(loc = "upper right")
plt.show()

Output:

Matplotlib histogram horizontal orientation
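One way to confirm what orientation = "horizontal" changes is to look at the bars themselves: each bar's width along the x-axis equals its bin count. This sketch assumes synthetic heights and a single dataset to keep the check simple.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights (assumption: random stand-in for df["Height"])
rng = np.random.default_rng(seed=3)
heights = rng.normal(loc=175, scale=10, size=500)

counts, edges, bars = plt.hist(x=heights, bins=25, orientation="horizontal")
plt.close()

# Horizontal bars: length along x is the count, extent along y is the bin
bar_lengths = [bar.get_width() for bar in bars]
print(np.allclose(bar_lengths, counts))  # True
```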

Example 7: histogram colored by group

In the following example, we've created a grouped histogram by splitting the data into two sets: the heights of summer athletes and the heights of winter athletes. We then passed both sets to x as a list, along with the color and label arguments from the previous examples.

Note: with the default histtype = "bar", the summer and winter bars for each bin are drawn side by side rather than stacked or overlapping.

# Histogram grouped by category
summer = df[df["Season"] == "Summer"]["Height"]
winter = df[df["Season"] == "Winter"]["Height"]

plt.hist(x = [summer, winter], bins = 25, color = ["gold", "lightskyblue"], label = ["Summer Olympics", "Winter Olympics"])
plt.legend(loc = "upper left")
plt.show()

Output:

Matplotlib histogram with groups, colors, example
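Filtering each season by hand works for two groups. If a column had more categories, one alternative (a swapped-in technique, not the approach used above) is pandas groupby, which yields one Series per category. The DataFrame below is hypothetical and filled with synthetic values shaped like the article's data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical DataFrame with synthetic values shaped like the article's data
rng = np.random.default_rng(seed=4)
df = pd.DataFrame({
    "Season": rng.choice(["Summer", "Winter"], size=600),
    "Height": rng.normal(loc=175, scale=10, size=600),
})

# Collect one Series of heights per season; groupby sorts the keys alphabetically
seasons = []
groups = []
for season, group in df.groupby("Season"):
    seasons.append(season)
    groups.append(group["Height"])

# Default color cycle, so the number of categories doesn't need a matching color list
counts, edges, _ = plt.hist(x=groups, bins=25, label=[f"{s} Olympics" for s in seasons])
plt.legend(loc="upper left")
plt.close()

print(seasons)           # ['Summer', 'Winter']
print(np.shape(counts))  # (2, 25)
```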

Example 8: different histogram types

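Lastly, if you want something other than side-by-side bars, plt.hist() offers a few built-in histogram types through one argument:

  • histtype: the type of histogram to draw. The options are 'bar' (the default; multiple datasets side by side), 'barstacked' (multiple datasets stacked on top of each other), 'step' (an unfilled outline), and 'stepfilled' (a filled outline).

# Step histogram grouped by category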
summer = df[df["Season"] == "Summer"]["Height"]
winter = df[df["Season"] == "Winter"]["Height"]

plt.hist(x = [summer, winter], bins = 25, color = ["gold", "lightskyblue"], label = ["Summer Olympics", "Winter Olympics"], histtype = "step")
plt.legend(loc = "upper left")
plt.show()

Output:

Matplotlib histogram step style example
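To compare all four histogram types at once, the sketch below draws them on a 2 x 2 grid with Axes.hist(), the method behind plt.hist(). It also shows that 'barstacked' returns cumulative counts: its second row is the per-bin total of both groups. The summer and winter heights are synthetic assumptions standing in for the real groups.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic summer and winter heights (assumption: stand-ins for the real groups)
rng = np.random.default_rng(seed=5)
summer = rng.normal(loc=176, scale=10, size=700)
winter = rng.normal(loc=173, scale=9, size=300)

histtypes = ["bar", "barstacked", "step", "stepfilled"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)
results = {}
for ax, histtype in zip(axes.flat, histtypes):
    counts, edges, _ = ax.hist(
        x=[summer, winter],
        bins=25,
        color=["gold", "lightskyblue"],
        label=["Summer Olympics", "Winter Olympics"],
        histtype=histtype,
    )
    ax.set_title(f'histtype = "{histtype}"')
    ax.legend(loc="upper left")
    results[histtype] = np.asarray(counts)
plt.close(fig)

# "barstacked" returns cumulative counts: row 1 is summer + winter for each bin
print(np.allclose(results["barstacked"][1], results["bar"].sum(axis=0)))  # True
```

Only 'barstacked' changes the returned numbers; the other three types draw the same per-dataset counts in different styles.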
