As a data scientist, you deal with a lot of categorical data, from product types to subscription types, and you often need an intuitive plot to compare different groups. In this post, we'll review the basics of seaborn's countplot() function. We'll cover a few key arguments, including hue, and walk through a few plot examples.
We've used a subset of a books dataset found on Kaggle.
Import and setup
import seaborn as sns

sns.set_theme()

# df is assumed to be a pandas DataFrame already loaded from the books dataset

# Create publication_year column from the last four characters of publication_date
df["publication_year"] = [int(date[-4:]) for date in df["publication_date"]]

# Subset the data to include only three authors and publication years after 1998 and before 2004
df = df[
    df["authors"].isin(["Stephen King", "Orson Scott Card", "James Patterson"])
    & (df["publication_year"] > 1998)
    & (df["publication_year"] < 2004)
]
df.head()
From the results of df.head(), we can see there are 6 columns. The examples in this post use the authors and publication_year columns.
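To make the setup step easier to follow without the Kaggle file, here is a minimal sketch on a few made-up rows. The toy DataFrame, its titles, and the M/D/YYYY date format are assumptions for illustration only; the one thing taken from the setup code is that the year is the last four characters of publication_date.

```python
import pandas as pd

# Hypothetical rows standing in for the books data (not the real dataset)
toy = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "authors": ["Stephen King", "Orson Scott Card", "Jane Doe", "James Patterson"],
    "publication_date": ["9/1/1999", "11/15/2001", "3/3/2002", "6/30/2005"],
})

# The last four characters of each date string are treated as the year
toy["publication_year"] = toy["publication_date"].str[-4:].astype(int)

# Keep three authors and years 1999 through 2003 (between() is inclusive)
keep_authors = ["Stephen King", "Orson Scott Card", "James Patterson"]
mask = toy["authors"].isin(keep_authors) & toy["publication_year"].between(1999, 2003)
subset = toy[mask]

print(subset[["authors", "publication_year"]])
```

The vectorized .str[-4:] is equivalent to the list comprehension in the setup code when every date is a string; between(1999, 2003) selects the same integer years as > 1998 and < 2004.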
Basic count plot: sns.countplot(data, x or y)
sns.countplot() Example 1: x
# Example 1
sns.countplot(data=df, x="authors")
The most basic sns.countplot() example uses just two arguments:
data: the dataset, such as a DataFrame (here, df)
x: the name of the variable to plot on the x-axis (here, authors), which produces a count plot with vertical bars
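As a rough check of what the two arguments produce, the sketch below plots a small made-up DataFrame and reads the bar heights back from the axes. The toy data and the explicit fig, ax pair are assumptions added for illustration; the idea shown is only that each bar's height is the number of rows in that category.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, not the Kaggle dataset
toy = pd.DataFrame({
    "authors": ["Stephen King", "Stephen King", "James Patterson",
                "Orson Scott Card", "Stephen King"],
})

# Draw on a fresh axes so bars from earlier plots are not mixed in
fig, ax = plt.subplots()
sns.countplot(data=toy, x="authors", ax=ax)

# Each bar's height is the number of rows in that category
bar_heights = [int(patch.get_height()) for patch in ax.patches]
print(bar_heights)

# The same counts, computed directly with pandas
print(toy["authors"].value_counts())
```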
sns.countplot() Example 2: y
If you want a count plot with horizontal bars, you can simply use the y argument rather than x.
# Example 2
sns.countplot(data=df, y="authors")
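One practical consequence of switching to y is that the counts move to the horizontal axis. The short sketch below, on made-up data, assumes you want to read the counts back from the plot: with horizontal bars they are stored as bar widths rather than heights.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, not the Kaggle dataset
toy = pd.DataFrame({
    "authors": ["Stephen King", "James Patterson",
                "Stephen King", "Orson Scott Card"],
})

fig, ax = plt.subplots()
sns.countplot(data=toy, y="authors", ax=ax)

# With y, categories sit on the y-axis, so each count is a bar's width
bar_widths = [int(patch.get_width()) for patch in ax.patches]
print(bar_widths)
```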
Advanced count plot: order and hue
For more advanced plots, you can specify the order and color of your bars based on the categories being plotted.
sns.countplot() Example 3: order
# Specify order of bars
year_order = df["publication_year"].value_counts().index
print(year_order)

# Example 3
sns.countplot(data=df, x="publication_year", order=year_order)
Int64Index([2002, 2001, 2000, 2003, 1999], dtype='int64')
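In this case, we ordered the bars by how many books were published in each year. Because value_counts() sorts in descending order by default, the bars run from the year with the most books to the year with the fewest.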
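The order argument accepts any list of category values, so ordering by frequency is only one option. The sketch below, on made-up years, contrasts the frequency ordering from Example 3 with a chronological one; the by_count and by_year names are hypothetical and chosen only for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, not the Kaggle dataset
toy = pd.DataFrame({"publication_year": [2001, 1999, 2002, 2002, 2001, 2002]})

# Most to least frequent, as in Example 3
by_count = toy["publication_year"].value_counts().index.tolist()

# Chronological alternative: sort the unique years
by_year = sorted(toy["publication_year"].unique())

print(by_count)
print(by_year)

# Plot the bars in chronological order instead of by frequency
fig, ax = plt.subplots()
sns.countplot(data=toy, x="publication_year", order=by_year, ax=ax)
bar_heights = [int(patch.get_height()) for patch in ax.patches]
```

Chronological order usually reads more naturally when the category is a year, while frequency order makes the largest group easy to spot.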
sns.countplot() Example 4: hue
Lastly, we'll use the hue argument to compare the number of books each author published in each year.
import matplotlib.pyplot as plt

# Example 4
sns.countplot(data=df, y="authors", hue="publication_year")

# Adjust legend placement
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left", borderaxespad=0)
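If you want to see the numbers behind a hue plot, a cross-tabulation gives the same counts as a table. This is a minimal sketch on made-up rows, not part of the original walkthrough: pd.crosstab() counts rows for each author and year pair, and each nonzero cell corresponds to one bar in a plot like Example 4.

```python
import pandas as pd

# Hypothetical data, not the Kaggle dataset
toy = pd.DataFrame({
    "authors": ["Stephen King", "Stephen King", "James Patterson",
                "James Patterson", "Stephen King"],
    "publication_year": [1999, 2001, 1999, 1999, 2001],
})

# Rows are authors, columns are years, cells are row counts
counts = pd.crosstab(toy["authors"], toy["publication_year"])
print(counts)
```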