As a data scientist, you deal with a lot of categorical data. From product types to subscription types, there are many instances when you need an intuitive plot to compare different groups. In this post, we'll review the basics of seaborn's countplot() function. We'll cover a few key arguments, including data, x, y, order, and hue, as well as a few plot examples.
We've used a subset of a books dataset found on Kaggle.
Import and setup
import seaborn as sns
sns.set_theme()
# df is assumed to be a pandas DataFrame already loaded from the books dataset
# Create publication_year column from the last four characters of each date string
df["publication_year"] = [int(date[-4:]) for date in df["publication_date"]]
# Subset the data to three authors and publication years after 1998 and before 2004 (1999 through 2003)
df = df[df["authors"].isin(["Stephen King", "Orson Scott Card", "James Patterson"]) & (df["publication_year"] > 1998) & (df["publication_year"] < 2004)]
df.head()
Output:

From the results of df.head(), we can see there are 6 columns. We'll be focusing on authors and publication_year.
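If you don't have the dataset at hand, the setup step can be tried on a few made-up rows. The sketch below is only an illustration: the titles and dates are invented, and it assumes, as the slicing in the code above implies, that each publication_date string ends with a four-digit year.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the Kaggle dataset
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D", "Book E"],
    "authors": ["Stephen King", "Stephen King", "Orson Scott Card",
                "James Patterson", "J.K. Rowling"],
    "publication_date": ["3/1/2001", "9/15/2002", "7/4/1999",
                         "11/20/2005", "6/21/2003"],
})

# Take the last four characters of each date string as the year
df["publication_year"] = df["publication_date"].str[-4:].astype(int)

# Keep three authors and years strictly between 1998 and 2004
keep_authors = ["Stephen King", "Orson Scott Card", "James Patterson"]
mask = (
    df["authors"].isin(keep_authors)
    & (df["publication_year"] > 1998)
    & (df["publication_year"] < 2004)
)
subset = df[mask]
print(subset[["authors", "publication_year"]])
```

The 2005 row and the row by an author outside the list are both dropped, which leaves three rows.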
Basic count plot: sns.countplot(data, x or y)
sns.countplot() Example 1: x
# Example 1
sns.countplot(data = df, x = "authors")
Output:

The most basic sns.countplot() example uses just two arguments:
data: dataset, such as a DataFrame (here, df)
x: name of the variable to be plotted on the x-axis (here, authors), which results in a count plot with vertical bars
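To see what the bars represent, it can help to compare them with a plain value_counts() call. The following is a minimal sketch on made-up data, not the books dataset. It assumes a non-interactive matplotlib backend so that it runs without a display, and it reads the bar heights back from the Axes object that sns.countplot() returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Made-up data: three, two, and one book per author
df = pd.DataFrame({
    "authors": ["Stephen King"] * 3 + ["Orson Scott Card"] * 2 + ["James Patterson"],
})

# Number of rows per author
counts = df["authors"].value_counts()
print(counts)

# countplot draws one bar per category; each bar's height is that category's row count
ax = sns.countplot(data=df, x="authors")
bar_heights = sorted(int(p.get_height()) for p in ax.patches)
print(bar_heights)
```

In other words, a count plot is a bar chart of the per-category row counts, so you never need to aggregate the data yourself before plotting.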
sns.countplot() Example 2: y
If you want a count plot with horizontal bars, you can simply use the y argument, rather than x.
# Example 2
sns.countplot(data = df, y = "authors")
Output:

Advanced count plot: order and hue
For more advanced plots, you can specify the order and color of your bars based on the categories being plotted.
sns.countplot() Example 3: order
# Specify order of bars
year_order = df['publication_year'].value_counts().index
print(year_order)
# Example 3
sns.countplot(data = df, x = "publication_year", order = year_order)
Output:
Int64Index([2002, 2001, 2000, 2003, 1999], dtype='int64')

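In this case, we ordered the bars based on how many books were published in each year. Because value_counts() sorts its result from the highest count to the lowest, the bars appear in descending order of count.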
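The order argument accepts any list of category values, so other orderings are built the same way. Here is a small sketch with made-up years showing three options: most frequent first, least frequent first, and chronological. The variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Made-up data: 2002 appears three times, 1999 twice, 2001 once
df = pd.DataFrame({"publication_year": [2002, 2002, 2002, 1999, 1999, 2001]})

# Most frequent year first (value_counts sorts by count, descending)
by_count = df["publication_year"].value_counts().index.tolist()

# Least frequent year first
by_count_ascending = df["publication_year"].value_counts(ascending=True).index.tolist()

# Chronological order, independent of the counts
chronological = sorted(df["publication_year"].unique().tolist())

print(by_count, by_count_ascending, chronological)

# Any of the three lists can be passed as order
ax = sns.countplot(data=df, x="publication_year", order=chronological)
```

For years, chronological order is often easier to read, while ordering by count makes the largest group stand out.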
sns.countplot() Example 4: hue
Lastly, we'll use the hue argument to split each author's bar by publication year, so we can compare the number of books published by author and by year.
import matplotlib.pyplot as plt
# Example 4
sns.countplot(data = df, y = "authors", hue = "publication_year")
# Adjust legend placement: anchor the legend's upper-left corner just outside the right edge of the plot
plt.legend(bbox_to_anchor = (1.02, 1), loc = 'upper left', borderaxespad = 0)
Output:

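If you want to check the numbers behind a grouped count plot, one option is a cross-tabulation of the same two columns. This is a minimal sketch on made-up rows, not the books dataset. pd.crosstab() is just one convenient way to get the counts and is not something sns.countplot() requires.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "authors": ["Stephen King", "Stephen King", "Stephen King",
                "James Patterson", "James Patterson"],
    "publication_year": [2001, 2001, 2002, 2001, 2002],
})

# Rows are authors, columns are years, and each cell is a row count.
# Each cell corresponds to one bar in a count plot with y="authors" and hue="publication_year".
table = pd.crosstab(df["authors"], df["publication_year"])
print(table)
```

Each row of the table matches one group of bars in the plot, and each column matches one hue color.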