Heatmaps are a useful visualization for comparing variables or exploring the relationships between them, which makes them handy for correlation analysis, feature selection, model tuning, and identifying outliers across multiple dimensions. In this post, we use seaborn's heatmap() function and walk through examples of a few key arguments that help your heatmap convey information effectively.
In the examples below, we use a subset of data about Adidas sales in the US. We create several heatmaps of operating margin by geographic region and retailer.
The only argument you need to provide is data. It must be in a particular format: a 2D dataset that can be cast into a NumPy array.
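As a rough illustration of what "2D and castable to a NumPy array" means, here is a minimal sketch. The helper is_heatmap_ready() is hypothetical (it is not part of seaborn), the numeric-dtype check is our own assumption about what makes a sensible heatmap input, and the small DataFrames hold illustrative values only.

```python
import numpy as np
import pandas as pd


def is_heatmap_ready(obj):
    """Hypothetical helper: True if obj converts to a 2D numeric array."""
    arr = np.asarray(obj)
    return arr.ndim == 2 and np.issubdtype(arr.dtype, np.number)


# Long format: one row per observation, with text columns mixed in
long_df = pd.DataFrame({
    "Retailer": ["Amazon", "Walmart"],
    "Region": ["West", "South"],
    "Operating Margin": [0.36, 0.44],
})

# Wide format: retailers as rows, regions as columns, numbers only
wide_df = pd.DataFrame(
    {"South": [0.40, 0.44], "West": [0.36, 0.31]},
    index=["Amazon", "Walmart"],
)

print(is_heatmap_ready(long_df))           # False: text columns give an object array
print(is_heatmap_ready(wide_df))           # True
print(is_heatmap_ready([[1, 2], [3, 4]]))  # True: nested lists also convert to 2D
```

The long-format table fails the check because its text columns cannot be treated as numbers, which is why the data has to be reshaped before plotting.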
Format data with df.pivot_table()
To get your typical DataFrame into the right format, you can use df.pivot() or df.pivot_table(). The main difference is that df.pivot() cannot handle duplicate index/column pairs or aggregation, whereas df.pivot_table() can.
In our example of shoe sales, there are multiple locations of each retailer in each region, so we need to calculate the mean operating margin for each retailer-region pair. This means we should use df.pivot_table().
    # Handle missing values however makes sense
    df.dropna(inplace=True)

    # Pivot data, save as df_heatmap
    df_heatmap = df.pivot_table(
        index='Retailer',
        columns='Region',
        values='Operating Margin',
        aggfunc='mean',
        fill_value=0,
    )
    df_heatmap
    Region          Midwest  Northeast     South  Southeast      West
    Retailer
    Amazon         0.458382   0.414288  0.403333   0.450522  0.356190
    Foot Locker    0.429892   0.415246  0.440903   0.421848  0.385303
    Kohl's         0.448770   0.427294  0.428182   0.000000  0.402673
    Sports Direct  0.434510   0.378983  0.500956   0.430445  0.433280
    Walmart        0.000000   0.386515  0.442568   0.354533  0.314091
    West Gear      0.430191   0.434205  0.444926   0.414176  0.401152
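To make the difference between df.pivot() and df.pivot_table() concrete, here is a small sketch on a made-up DataFrame (the rows are invented for illustration and are not from the Adidas dataset). It assumes two Amazon locations in the West, so the (Amazon, West) pair appears twice.

```python
import pandas as pd

# Made-up rows: the (Amazon, West) pair is duplicated on purpose
sales = pd.DataFrame({
    "Retailer": ["Amazon", "Amazon", "Walmart"],
    "Region": ["West", "West", "South"],
    "Operating Margin": [0.30, 0.40, 0.45],
})

# pivot() cannot reshape when an index/column pair repeats
try:
    sales.pivot(index="Retailer", columns="Region", values="Operating Margin")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead, here with the mean
wide = sales.pivot_table(
    index="Retailer",
    columns="Region",
    values="Operating Margin",
    aggfunc="mean",
)

print(pivot_failed)  # True
print(wide)
```

In this sketch the two Amazon rows collapse into a single mean, and retailer-region pairs with no rows come back as NaN. Passing fill_value=0, as in the post's code, replaces those NaN cells with 0, which is why a few cells in the table above read 0.000000.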
    # Basic heatmap
    import seaborn as sns

    sns.heatmap(data=df_heatmap)
In some cases, it can be helpful to include the values on the heatmap, especially when the gradient is shallow or when small differences are particularly important. You can do so using the annot argument.
    # Heatmap with annotations
    sns.heatmap(data=df_heatmap, annot=True)
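If the default annotations show more or fewer digits than you want, seaborn's fmt argument controls the number format. The sketch below is one way to use it, assuming a small stand-in DataFrame (demo, built from four values in the table above) and the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Stand-in for df_heatmap: four values copied from the pivoted table
demo = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "West": [0.356190, 0.385303]},
    index=["Amazon", "Foot Locker"],
)

# annot=True writes each cell's value; fmt=".2f" rounds it to two decimals
ax = sns.heatmap(data=demo, annot=True, fmt=".2f")

# Collect the annotation strings that were drawn on the heatmap
labels = sorted(t.get_text() for t in ax.texts)
print(labels)  # ['0.36', '0.39', '0.43', '0.46']
```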
With custom colormap:
sns.heatmap(data, vmin, vmax, cmap)
If you want to change the scale of the color gradient, you can use the vmin and vmax arguments to set the minimum and maximum values, respectively. Additionally, you can use any matplotlib colormap to change the look of the heatmap via the cmap argument.
    # Heatmap with colormap
    sns.heatmap(data=df_heatmap, vmin=0, vmax=1, cmap="Purples", annot=True)
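To see what vmin and vmax actually change, the sketch below draws the same data twice and reads back the color limits of each plot. It assumes a small stand-in DataFrame (demo) and the Agg backend. Without vmin and vmax, seaborn's default is to stretch the color scale between the smallest and largest values in the data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for df_heatmap: four values copied from the pivoted table
demo = pd.DataFrame(
    {"Midwest": [0.458382, 0.429892], "West": [0.356190, 0.385303]},
    index=["Amazon", "Foot Locker"],
)

fig, (ax_auto, ax_fixed) = plt.subplots(1, 2, figsize=(10, 4))

# Left: default scale, taken from the data's own min and max
sns.heatmap(data=demo, cmap="Purples", ax=ax_auto)

# Right: fixed scale from 0 to 1
sns.heatmap(data=demo, vmin=0, vmax=1, cmap="Purples", ax=ax_fixed)

# The first collection on each Axes is the colored mesh
auto_limits = ax_auto.collections[0].get_clim()
fixed_limits = ax_fixed.collections[0].get_clim()
print(auto_limits)
print(fixed_limits)
```

A fixed scale makes heatmaps comparable across plots, at the cost of less visible contrast when the values occupy a narrow band, as operating margins do here.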
With custom gridlines:
sns.heatmap(data, linewidths, linecolor)
Lastly, if you want a clear separation between the squares in the heatmap, you can adjust the appearance of the gridlines via the linewidths and linecolor arguments.
    # Heatmap with custom gridlines
    sns.heatmap(data=df_heatmap, cmap="Purples", linewidths=2, linecolor="black")