Heatmaps are a useful visualization for comparing variables or exploring the relationships between them, which helps with correlation analysis, feature selection, model tuning, and identifying outliers across multiple dimensions. In this post, we use seaborn's heatmap() function and walk through a few key arguments that help your heatmap convey information effectively.
In the examples below, we use a subset of data about Adidas sales in the US. We create several heatmaps of operating margin, broken down by geographic region and retailer.
Basic Syntax: sns.heatmap(data)
The only argument you need to supply is data. But the data needs to be in a particular format, specifically a 2D format that can be cast into a NumPy array.
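As a minimal sketch of what "a 2D format" means here, the snippet below passes a small NumPy array straight to sns.heatmap(). The array values are made up for illustration and are not part of the Adidas data, and the Agg backend is selected only so the snippet can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import seaborn as sns

# Hypothetical 2 x 3 grid of values (not from the Adidas dataset)
grid = np.array([[0.1, 0.5, 0.9],
                 [0.3, 0.7, 0.2]])

# Rows of the array become rows of the heatmap, columns become columns
ax = sns.heatmap(grid)

# One colored cell is drawn per array element
n_cells = ax.collections[0].get_array().size
```

A pandas DataFrame works the same way, with the added benefit that its index and column labels are used as tick labels.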
Format data with df.pivot() or df.pivot_table()
To get a typical DataFrame into the right format, you can use df.pivot() or df.pivot_table(). The main difference is that df.pivot() cannot handle duplicate index/column pairs or aggregation, whereas df.pivot_table() can.
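The difference can be seen in a short sketch. The toy DataFrame below is hypothetical (three made-up rows, not the real sales data) and deliberately repeats one retailer-region pair.

```python
import pandas as pd

# Hypothetical toy data: Amazon appears twice in the West region
toy = pd.DataFrame({
    "Retailer": ["Amazon", "Amazon", "Walmart"],
    "Region": ["West", "West", "South"],
    "Operating Margin": [0.30, 0.40, 0.50],
})

# df.pivot() refuses duplicate index/column pairs and raises a ValueError
try:
    toy.pivot(index="Retailer", columns="Region", values="Operating Margin")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# df.pivot_table() aggregates the duplicates instead (here with the mean)
toy_table = toy.pivot_table(index="Retailer", columns="Region",
                            values="Operating Margin", aggfunc="mean")
```

Here the two Amazon/West rows collapse into a single cell holding their mean, and pairs with no rows (such as Amazon/South) are left as NaN unless fill_value is set.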
In our example of shoe sales, there are multiple locations of each retailer in each region, so we'll need to calculate the mean operating margin for each retailer-region pair. This means we should use df.pivot_table().
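import seaborn as sns

# df is assumed to be a pandas DataFrame already holding the Adidas sales data
# Handle missing values however makes sense
df.dropna(inplace = True)
# Pivot data, save as df_heatmap
# Note: fill_value = 0 puts 0 in any retailer-region pair that has no rows
df_heatmap = df.pivot_table(index = 'Retailer', columns = 'Region', values = 'Operating Margin', aggfunc = 'mean', fill_value = 0)
df_heatmap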
Output:
Region           Midwest  Northeast      South  Southeast       West
Retailer
Amazon          0.458382   0.414288   0.403333   0.450522   0.356190
Foot Locker     0.429892   0.415246   0.440903   0.421848   0.385303
Kohl's          0.448770   0.427294   0.428182   0.000000   0.402673
Sports Direct   0.434510   0.378983   0.500956   0.430445   0.433280
Walmart         0.000000   0.386515   0.442568   0.354533   0.314091
West Gear       0.430191   0.434205   0.444926   0.414176   0.401152
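One thing to watch in the table above: fill_value = 0 writes 0 into any retailer-region pair that has no rows, so a 0.000000 cell may mean "no data" rather than a zero margin. A possible alternative, sketched below on made-up data, is to omit fill_value so that empty pairs stay NaN; seaborn's heatmap() automatically masks cells with missing values, so they are left blank instead of being colored as zeros. The toy DataFrame is hypothetical and not part of the Adidas data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy data: Walmart has no rows for the West region
toy = pd.DataFrame({
    "Retailer": ["Amazon", "Amazon", "Walmart"],
    "Region": ["West", "South", "South"],
    "Operating Margin": [0.36, 0.40, 0.44],
})

# With fill_value=0, the missing Walmart/West pair is shown as 0
filled = toy.pivot_table(index="Retailer", columns="Region",
                         values="Operating Margin", aggfunc="mean",
                         fill_value=0)

# Without fill_value, the missing pair stays NaN
sparse = toy.pivot_table(index="Retailer", columns="Region",
                         values="Operating Margin", aggfunc="mean")

# NaN cells are masked automatically and drawn blank
ax = sns.heatmap(sparse)
```

Which option fits depends on whether "no stores" should read as zero or as missing in your analysis.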
Plot heatmap
# Basic heatmap
sns.heatmap(data = df_heatmap)
Output:

With annotations: sns.heatmap(data, annot)
In some cases, it can be helpful to include the values on the heatmap, especially when the gradient is shallow or when small differences are particularly important. You can do so using the annot argument.
# Heatmap with annotations
sns.heatmap(data = df_heatmap, annot = True)
Output:

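Annotations can also be formatted with the fmt argument, which takes a Python format string. The sketch below is one possible variation rather than part of the original walkthrough: it copies four values from the pivot table above into a small DataFrame and displays them as percentages.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Four values copied from the pivot table output above
small = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "Northeast": [0.414288, 0.415246]},
    index=["Amazon", "Foot Locker"],
)

# fmt=".1%" renders each proportion as a percentage with one decimal place
ax = sns.heatmap(small, annot=True, fmt=".1%")

# The annotation strings that were drawn on the cells
labels = [t.get_text() for t in ax.texts]
```

Since operating margin is a proportion, a percentage format tends to be easier to read than the raw six-decimal values.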
With custom colormap: sns.heatmap(data, vmin, vmax, cmap)
If you want to change the scale of the color gradient, you can use the vmin and vmax arguments to set the minimum and maximum values, respectively. Additionally, you can use any colormap from matplotlib to change the aesthetic of the heatmap via the cmap argument.
# Heatmap with colormap
sns.heatmap(data = df_heatmap, vmin = 0, vmax = 1, cmap = "Purples", annot = True)
Output:

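The choice of vmin and vmax controls how much of the colormap the data actually uses. Apart from the zero-filled cells, the margins in the pivot table above fall roughly between 0.3 and 0.5, so a tighter range is one option worth trying. The sketch below uses four values copied from that table and reads the color limits back from the plot to confirm they were applied; the specific limits are an illustrative choice, not a recommendation from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Four values copied from the pivot table output above
small = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "Northeast": [0.414288, 0.415246]},
    index=["Amazon", "Foot Locker"],
)

# Tighter color limits spread the colormap over the range the data occupies
ax = sns.heatmap(small, vmin=0.3, vmax=0.5, cmap="Purples")

# The heatmap is a single mesh; its color limits reflect vmin and vmax
mesh = ax.collections[0]
color_limits = mesh.get_clim()
cmap_name = mesh.get_cmap().name
```

Values outside the chosen limits are drawn with the colors at the ends of the colormap, so a range that is too tight can hide differences among extreme cells.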
With custom gridlines: sns.heatmap(data, linewidths, linecolor)
Lastly, if you want to ensure a clear separation between the squares in the heatmap, you can adjust the appearance of the dividing lines via linewidths and linecolor.
# Heatmap with custom line formatting
sns.heatmap(data = df_heatmap, cmap = "Purples", linewidths = 2, linecolor = "black")
Output:

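As a quick check that the gridline settings take effect, the sketch below draws a heatmap from four values copied from the pivot table above and reads the line width back from the plotted mesh. Inspecting ax.collections this way is just one way to verify the styling and is not something the original post does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Four values copied from the pivot table output above
small = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "Northeast": [0.414288, 0.415246]},
    index=["Amazon", "Foot Locker"],
)

# linewidths sets the thickness of the cell borders, linecolor their color
ax = sns.heatmap(small, cmap="Purples", linewidths=2, linecolor="black")

# Read the border width back from the mesh that seaborn drew
mesh = ax.collections[0]
line_width = float(np.atleast_1d(mesh.get_linewidth())[0])
```

Thicker lines take space away from the colored cells, so small grids can afford heavier borders than large ones.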