Histograms are a key visualization tool for showing the distribution of numerical data, but some histograms are easier to customize than others. For example, you may have struggled in the past with histograms in `matplotlib`, the library on which `seaborn` is built. This post goes over some of the many ways you can use `seaborn`’s `histplot()` function to create highly tuned and beautiful histograms.

The data comes from a Kaggle dataset on Olympic athletes from 1896 to 2016. We’ll be creating histograms visualizing their height distributions.

## Import packages and load data

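```
import pandas as pd
import seaborn as sns
sns.set_theme()  # apply seaborn's default plot styling
# Outside Einblick: df = pd.read_csv("path/to/athletes.csv")  (hypothetical path)
```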

We’ve already pre-loaded our dataset into Einblick using the upload CSV functionality, and the data is saved as a `pandas` `DataFrame` called `df`.
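If you are following along outside Einblick, you can try the calls below on a tiny stand-in that uses the same column names as this post (`Sport`, `Height`, `Weight`). The values here are made up and do not come from the Kaggle data. Real data of this kind often has missing heights or weights. `histplot()` drops missing values before it bins the data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in with the same column names used in this post (not the Kaggle data)
df = pd.DataFrame({
    "Sport": ["Basketball", "Basketball", "Gymnastics", "Gymnastics", "Swimming", "Swimming"],
    "Height": [201, np.nan, 155, 160, 188, 183],
    "Weight": [95, 102, 48, np.nan, 80, 76],
})

# Quick sanity checks before plotting: column dtypes and missing values per column
print(df.dtypes)
print(df.isna().sum())
```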

## Basic histograms: `sns.histplot(data, x or y, discrete, bins)`

### Examples 1 and 2: basic histogram with `sns.histplot(data, x or y)`

Your most basic `seaborn` histogram relies on two arguments:

- `data`: the variable where your data is stored, in this case `df`
- `x` or `y`: the name of the column storing the numerical variable we’re counting. Whether you use `x` or `y` simply determines the orientation of the bars: `x` gives vertical bars and `y` gives horizontal ones.

```
# Example 1
sns.histplot(data = df, x = "Height")
```

**Output:**

```
# Example 2
sns.histplot(data = df, y = "Height")
```

**Output:**

### Example 3: custom binning

If, as in the example above, the default binning is unsatisfactory, you can use the `bins` argument to determine exactly how you want to bin the data. You can either enter an integer, `n`, to specify the number of bins you want, or you can enter a list of cutoff points for the bins, for example `[0, 100, 300, 450, 600]`. Note that the cutoff points do not need to be evenly spaced.

```
# Example 3
sns.histplot(data = df, x = "Height", bins = 10)
```

**Output:**

### Example 4: `discrete`

In our dataset, `Height` is recorded in whole centimeters, with no fractional values. Under the default binning, some bins contain no whole-number values at all, which produces the gaps we can see in Examples 1 and 2. To fix this, use the `discrete = True` argument. It gives each integer value its own bin of width 1, centered on that value, so the bars line up with the data and the gaps disappear.

```
# Example 4
sns.histplot(data = df, x = "Height", discrete = True)
```

**Output:**

### Example 5: kernel density estimate (`kde`)

If you want to overlay a kernel density estimate (KDE), which estimates the probability density function of a variable from a finite sample, you can use the `kde = True` argument.

```
# Example 5
df["Height"] = df["Height"].astype(float)
sns.histplot(data = df, x = "Height", discrete = True, kde = True)
```

**Output:**
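The idea behind `kde = True` can be shown with a minimal NumPy sketch of a one-dimensional Gaussian KDE, run on made-up heights. A Gaussian bump is centered on every observation, and the bumps are averaged. This is not `seaborn`’s own code, and the helper name `gaussian_kde_1d` is hypothetical. The sketch assumes a Gaussian kernel with Scott’s rule for the bandwidth, which are the defaults `seaborn` uses.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a Gaussian KDE of `samples` at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth in one dimension
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    u = (grid[:, None] - samples[None, :]) / h
    # Standard normal kernel evaluated at each distance
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale by h so the result is a density
    return kernels.sum(axis=1) / (n * h)

heights = np.array([152, 160, 168, 170, 172, 178, 180, 190, 200], dtype=float)
grid = np.linspace(100, 250, 3001)
density = gaussian_kde_1d(heights, grid)

# A density should integrate to (approximately) 1 over a wide enough range
dx = grid[1] - grid[0]
print(density.sum() * dx)
```

The bandwidth `h` controls smoothness. A larger `h` gives a smoother, flatter curve.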

## Advanced plots with `sns.histplot()`: `hue`, `x` and `y`, `multiple`, `stat`

### Example 6: comparing groups with `hue`

If you want to compare the distribution of a variable across multiple groups, you can use the `hue` argument. Simply set `hue = "Sport"`, where `"Sport"` is the column in `df` containing the group labels.

```
# Example 6
sns.histplot(data = df, x = "Height", hue = "Sport")
```

**Output:**

### Example 7: side-by-side bars (`multiple = "dodge"`)

As you can see in the example above, the bars overlap by default when you plot multiple groups (`multiple = "layer"`). Sometimes, however, you want to plot the bars side by side. You can do this by setting `multiple = "dodge"`. There are two other options:

- `"stack"` piles each group’s bars on top of one another.
- `"fill"` also stacks the bars but normalizes every bin to a total height of 1, so you can compare the groups’ proportions within each bin.

```
# Example 7
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge")
```

**Output:**

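### Example 8: different aggregate statistics (`stat`)

If you want to show the distribution using a different aggregate statistic, you can use the `stat` argument. The options are:

- `"count"` (the default): the number of observations in each bin
- `"frequency"`: the count divided by the bin width
- `"probability"` or `"proportion"`: counts normalized so the bar heights sum to 1
- `"percent"`: counts normalized so the bar heights sum to 100
- `"density"`: counts normalized so the total area of the histogram is 1

When `hue` is set, as below, the normalization is computed over all groups together by default. Pass `common_norm = False` to normalize each group separately.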

```
# Example 8
sns.histplot(data = df, x = "Height", bins = 10, hue = "Sport", multiple = "dodge", stat = "probability")
```

**Output:**

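### BONUS: creating a heat map using `sns.histplot(data, x, y)`

Although there are other heatmap functions available in Python, you can also create one by passing both `x` and `y` to `seaborn`’s `histplot()` function. The result is a bivariate histogram: the plane is divided into rectangular bins, and each cell’s color shows how many observations fall inside it.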

```
# Make sure the columns are of compatible type
df["Weight"] = df["Weight"].astype(float)
df["Height"] = df["Height"].astype(float)
# Example 9
sns.histplot(data = df, x = "Height", y = "Weight")
```

**Output:**

### BONUS: create multiple color maps

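Lastly, you can compare the distributions of several groups by passing a categorical column, such as `"Sport"`, as `y` alongside a numerical `x`. Each row of the result is a color-mapped histogram of one group, an effect similar to placing box plots side by side. Setting `hue` to the same column gives each group its own color, and `legend = False` hides the now-redundant legend. If box plots interest you, consider them next time!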

```
df["Height"] = df["Height"].astype(float)
# Example 10
sns.histplot(data = df, x = "Height", y = "Sport", hue = "Sport", legend = False)
```

**Output:**
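Each row of Example 10 is, in effect, a one-dimensional histogram of one sport’s heights. The sketch below computes the counts behind such rows in pandas, using made-up data and hypothetical bins. `seaborn` picks its own bins automatically.

```python
import pandas as pd

# Made-up heights for two sports, for illustration only
athletes = pd.DataFrame({
    "Sport": ["Basketball"] * 4 + ["Gymnastics"] * 4,
    "Height": [193, 198, 201, 210, 148, 155, 158, 165],
})

# Hypothetical bin edges; right=False makes each interval half-open, [a, b)
bins = [140, 160, 180, 200, 220]
athletes["Bin"] = pd.cut(athletes["Height"], bins=bins, right=False)

# One row per sport, one column per bin: the counts each heat map row encodes
table = athletes.groupby(["Sport", "Bin"], observed=False).size().unstack()
print(table)
```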

### About

Einblick is an agile data science platform that provides data scientists with a collaborative workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick customers include Cisco, DARPA, Fuji, NetApp and USDA. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.