How to plot a heatmap in seaborn: formatting data, annotations, and more

Einblick Content Team - May 2nd, 2023

Heatmaps are a useful visualization for comparing variables or exploring the relationships between them. They can help with correlation analysis, feature selection, model tuning, and identifying outliers across multiple dimensions. In this post, we use seaborn's heatmap() function and walk through a few key arguments that help your heatmap convey information effectively.

In the examples below, we use a subset of data about Adidas sales in the US to create several heatmaps of operating margin by geographic region and retailer.

Basic Syntax: sns.heatmap(data)

The only required argument is data. However, the data must be in a particular format: a rectangular, 2D dataset that can be cast into a NumPy array. If you pass a pandas DataFrame, seaborn uses its index and column labels to label the rows and columns of the heatmap.
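As a minimal sketch of what counts as valid input, assuming only NumPy, pandas, and seaborn are installed: a bare 2D NumPy array plots with numeric tick labels, while a DataFrame holding the same values picks up its index and column names as labels. The values and the labels A, B, x, y, z are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical 2 x 3 grid of values
values = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])

# A bare 2D array is valid input; rows and columns are labeled 0, 1, 2, ...
ax_array = sns.heatmap(data=values)
array_x_labels = [t.get_text() for t in ax_array.get_xticklabels()]
plt.close(ax_array.figure)

# The same values as a DataFrame: the index labels the rows, the columns label the columns
frame = pd.DataFrame(values, index=["A", "B"], columns=["x", "y", "z"])
ax_frame = sns.heatmap(data=frame)
x_labels = [t.get_text() for t in ax_frame.get_xticklabels()]
y_labels = [t.get_text() for t in ax_frame.get_yticklabels()]
plt.close(ax_frame.figure)
```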

Format data with df.pivot() or df.pivot_table()

To get a typical long-format DataFrame into this shape, you can use df.pivot() or df.pivot_table(). The main difference is that df.pivot() only reshapes: it raises an error if any index/column pair appears more than once. df.pivot_table(), by contrast, aggregates duplicate pairs (using the mean by default).
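The difference is easy to see on a tiny, hypothetical dataset in which one retailer-region pair appears twice. This sketch also shows that pivot_table() with aggfunc='mean' produces the same table as grouping by both keys, averaging, and unstacking the region level into columns; the groupby route is one equivalent alternative, not something the original post uses.

```python
import pandas as pd

# Hypothetical toy data: Amazon has two Midwest locations
sales = pd.DataFrame({
    "Retailer": ["Amazon", "Amazon", "Walmart", "Walmart"],
    "Region": ["Midwest", "Midwest", "South", "Midwest"],
    "Operating Margin": [0.40, 0.50, 0.45, 0.30],
})

# df.pivot() only reshapes, so a duplicated (Retailer, Region) pair is an error
try:
    sales.pivot(index="Retailer", columns="Region", values="Operating Margin")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# df.pivot_table() averages the duplicated pair; Amazon/South has no rows, so it is NaN
table = sales.pivot_table(index="Retailer", columns="Region",
                          values="Operating Margin", aggfunc="mean")

# Equivalent: average per pair, then move Region from the row index into columns
via_groupby = (sales.groupby(["Retailer", "Region"])["Operating Margin"]
               .mean()
               .unstack("Region"))
```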

In our shoe sales example, each retailer has multiple locations in each region, so we need to calculate the mean operating margin for each retailer-region pair. This means we should use df.pivot_table().

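import pandas as pd
import seaborn as sns

# df is assumed to already hold the Adidas sales data, with
# 'Retailer', 'Region', and 'Operating Margin' columns

# Handle missing values however makes sense; here, drop incomplete rows
df.dropna(inplace = True)

# Pivot data, save as df_heatmap. aggfunc = 'mean' averages duplicate
# retailer-region pairs; fill_value = 0 writes 0 into pairs with no rows
df_heatmap = df.pivot_table(index = 'Retailer', columns = 'Region', values = 'Operating Margin', aggfunc = 'mean', fill_value = 0)

df_heatmap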

Output:

Region           Midwest  Northeast      South  Southeast       West
Retailer
Amazon          0.458382   0.414288   0.403333   0.450522   0.356190
Foot Locker     0.429892   0.415246   0.440903   0.421848   0.385303
Kohl's          0.448770   0.427294   0.428182   0.000000   0.402673
Sports Direct   0.434510   0.378983   0.500956   0.430445   0.433280
Walmart         0.000000   0.386515   0.442568   0.354533   0.314091
West Gear       0.430191   0.434205   0.444926   0.414176   0.401152
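The 0.000000 entries in this table (Kohl's in the Southeast, Walmart in the Midwest) most likely come from fill_value = 0 rather than from a true 0% margin, and the heatmap will color them as if they were real values. If you would rather show such pairs as blank, one option is to omit fill_value so that missing pairs stay NaN, because seaborn automatically masks cells with missing values. The sketch below assumes those zeros were fill values and rebuilds two of the affected cells by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Two retailers and two regions from the output table above. The two 0.000000
# cells are entered as NaN, on the assumption that they were fill values.
df_nan = pd.DataFrame(
    {"Midwest": [0.448770, np.nan], "Southeast": [np.nan, 0.354533]},
    index=["Kohl's", "Walmart"],
)

ax = sns.heatmap(data=df_nan, annot=True)

# seaborn masks the NaN cells, so they get no color and render blank
mesh = ax.collections[0]
n_masked = int(np.ma.getmaskarray(mesh.get_array()).sum())
plt.close(ax.figure)
```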

Plot heatmap

# Basic heatmap
sns.heatmap(data = df_heatmap)
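In a notebook the heatmap appears as soon as the cell runs; in a script you typically need to show or save the figure yourself. A minimal sketch, assuming you want to control the figure size and write a PNG: create the Axes with matplotlib and pass it to sns.heatmap() through its ax argument. The file name heatmap.png and the three rows of data (copied from the output table above) are illustrative only.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files only; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Three rows copied from the output table above
df_heatmap = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892, 0.448770],
     "Northeast": [0.414288, 0.415246, 0.427294]},
    index=["Amazon", "Foot Locker", "Kohl's"],
)

# Create the figure explicitly so its size is under your control,
# then tell seaborn which Axes to draw on
fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(data=df_heatmap, ax=ax)
ax.set_title("Mean operating margin by retailer and region")
fig.tight_layout()

out_path = Path("heatmap.png")  # hypothetical output path
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

In an interactive session, plt.show() displays the figure instead; the Agg line is only there so the sketch runs without a display. The figure ends up with two Axes: the heatmap and its colorbar.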

Output:

[Image: seaborn heatmap Example 1]

With annotations: sns.heatmap(data, annot)

In some cases, it helps to print each cell's value on the heatmap, especially when the color gradient is shallow or when small differences matter. To do so, set the annot argument to True.

# Heatmap with annotations
sns.heatmap(data = df_heatmap, annot = True)
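By default, annot = True formats each value with fmt='.2g', so 0.458382 is printed as 0.46. Because operating margin is a proportion, a percentage format may read more naturally. A sketch, assuming two rows copied from the output table above: fmt accepts any Python format specification, and annot_kws passes text properties such as fontsize on to matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Two rows copied from the output table above
df_heatmap = pd.DataFrame(
    {"Midwest": [0.458382, 0.000000], "South": [0.403333, 0.442568]},
    index=["Amazon", "Walmart"],
)

# ".1%" multiplies each value by 100 and appends a percent sign
ax = sns.heatmap(data=df_heatmap, annot=True, fmt=".1%", annot_kws={"fontsize": 12})

# seaborn adds one text object per annotated cell, row by row
labels = [t.get_text() for t in ax.texts]
print(labels[0])  # 45.8%
plt.close(ax.figure)
```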

Output:

[Image: seaborn heatmap example with annotations]

With custom colormap: sns.heatmap(data, vmin, vmax, cmap)

To change the range of the color scale, set vmin and vmax to the values mapped to the lowest and highest colors, respectively; by default, seaborn uses the minimum and maximum of the data. You can also pass the name of any matplotlib colormap to the cmap argument to change the heatmap's palette.

# Heatmap with colormap
sns.heatmap(data = df_heatmap, vmin = 0, vmax = 1, cmap = "Purples", annot = True)
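Fixing vmin = 0 and vmax = 1 keeps the color scale comparable across charts, but the non-zero margins in this dataset fall between roughly 0.31 and 0.50, so they occupy only a narrow band of the colormap. Without vmin and vmax, seaborn stretches the colormap from the data's minimum to its maximum. The sketch below, using two rows copied from the output table above, reads back the limits in each case from the matplotlib mesh that seaborn draws; the helper color_limits is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Two rows copied from the output table above
df_heatmap = pd.DataFrame(
    {"Midwest": [0.458382, 0.434510],
     "Northeast": [0.414288, 0.378983],
     "South": [0.403333, 0.500956]},
    index=["Amazon", "Sports Direct"],
)


def color_limits(ax):
    """Hypothetical helper: return the (vmin, vmax) a seaborn heatmap maps to colors."""
    norm = ax.collections[0].norm
    return float(norm.vmin), float(norm.vmax)


# Default: the colormap spans the data's minimum and maximum
ax_default = sns.heatmap(data=df_heatmap, cmap="Purples")
default_limits = color_limits(ax_default)
plt.close(ax_default.figure)

# Fixed: the colormap spans 0 to 1, as in the example above
ax_fixed = sns.heatmap(data=df_heatmap, vmin=0, vmax=1, cmap="Purples")
fixed_limits = color_limits(ax_fixed)
plt.close(ax_fixed.figure)
```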

Output:

[Image: seaborn heatmap cmap example]

With custom gridlines: sns.heatmap(data, linewidths, linecolor)

Lastly, to draw a clear boundary between the cells of the heatmap, set the width and color of the dividing lines with linewidths and linecolor.

# Heatmap with custom line formatting
sns.heatmap(data = df_heatmap, cmap = "Purples", linewidths = 2, linecolor = "black")
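By default, linewidths is 0 (no dividing lines) and linecolor is "white". A small sketch, using two rows copied from the output table above, that confirms the two arguments end up as the edge width and edge color of the cells, which seaborn draws as a single matplotlib QuadMesh; reading them back through ax.collections[0] is just one way to inspect the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Two rows copied from the output table above
df_heatmap = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "Northeast": [0.414288, 0.415246]},
    index=["Amazon", "Foot Locker"],
)

ax = sns.heatmap(data=df_heatmap, cmap="Purples", linewidths=2, linecolor="black")

# The line settings become the mesh's edge width and edge color (as RGBA)
mesh = ax.collections[0]
edge_width = float(mesh.get_linewidth()[0])
edge_rgba = tuple(float(c) for c in mesh.get_edgecolor()[0])
plt.close(ax.figure)
```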

Output:

[Image: seaborn heatmap custom gridlines example]
