Box plots with Prompt AI

Becca Weng - September 19th, 2023

Box plots are a simple way to visualize the distribution of a variable and spot its outliers. They become much more informative when you compare several box plots side by side, because you can see how a variable's distribution shifts across groups. In this post, we'll walk through a few example prompts that produce box plots from the Olympics dataset on Kaggle. We filtered the data to Olympic Games held in 2000 or later, a few selected sports, and only athletes who won medals.
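The filtering step could look roughly like the sketch below. It assumes the Kaggle file uses the columns `Year`, `Sport`, `Medal`, `Height` and `Sex` (as in the "120 years of Olympic history" athlete events file), with `Medal` empty for non-medalists. The list of sports is a hypothetical placeholder, since we don't name the exact sports here, and a tiny inline DataFrame stands in for the CSV so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("athlete_events.csv"); column names are an assumption
olympics_df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Sex": ["M", "F", "M", "F", "M", "F"],
    "Height": [190.0, 172.0, 168.0, 181.0, 201.0, 165.0],
    "Year": [1996, 2000, 2004, 2008, 2012, 2016],
    "Sport": ["Swimming", "Swimming", "Gymnastics", "Rowing", "Basketball", "Gymnastics"],
    "Medal": ["Gold", "Silver", np.nan, "Bronze", "Gold", "Gold"],
})

# Hypothetical selection of sports
selected_sports = ["Swimming", "Gymnastics", "Rowing", "Basketball"]

filtered_df = olympics_df[
    (olympics_df["Year"] >= 2000)                    # Games held in 2000 or later
    & (olympics_df["Sport"].isin(selected_sports))  # only the chosen sports
    & (olympics_df["Medal"].notna())                # only medal-winning athletes
].reset_index(drop=True)

print(filtered_df)
```

Combining the three conditions with `&` keeps the filter in one readable expression; `reset_index(drop=True)` just gives the subset a clean 0..n-1 index.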

filtered_df.head()

Code Output:

All the examples are available in a shared canvas, which also contains example prompts for other data visualizations and use cases. You can open and fork the canvas to try them yourself.

Prompt 1: Basic box plot

Create a box plot of height
# PROMPT: Create a box plot of height

import seaborn as sns
import matplotlib.pyplot as plt

# Create the box plot using seaborn
sns.boxplot(data=filtered_df, x='Height')

# Add labels and title
plt.xlabel('Height')
plt.title('Box Plot of Height')

# Display the plot
plt.show()

Code Output:
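A box plot compresses a column into a handful of numbers. As a rough sketch of what seaborn draws with its default whisker setting (`whis=1.5`), the helper below computes the quartiles, the whisker ends (the most extreme points within 1.5 times the IQR of the box), and the points plotted individually as outliers. The `box_plot_stats` function and the sample heights are made up for illustration, and quartiles use pandas' default linear interpolation.

```python
import pandas as pd


def box_plot_stats(values: pd.Series, whis: float = 1.5) -> dict:
    """Return the quartiles, whisker ends and outliers a standard box plot draws."""
    values = values.dropna()
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points outside these fences are drawn as individual outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),   # farthest point still inside the lower fence
        "whisker_high": inside.max(),  # farthest point still inside the upper fence
        "outliers": sorted(values[(values < lower_fence) | (values > upper_fence)]),
    }


# Illustrative heights in cm, not real Olympic data
heights = pd.Series([160, 165, 170, 172, 175, 178, 180, 210], dtype=float)
print(box_plot_stats(heights))
```

Here the box spans 168.75 to 178.5 cm, the whiskers stop at 160 and 180 cm, and 210 cm lands beyond the upper fence, so it would appear as a separate dot.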

Prompt 2: Change orientation

To change the orientation of the box plot, select Prompt > Change this cell and try the query below.

Make the box plot vertical
# CHANGE: Make the box plot vertical

import seaborn as sns
import matplotlib.pyplot as plt

# Create the box plot using seaborn
sns.boxplot(data=filtered_df, y='Height')

# Add labels and title
plt.ylabel('Height')
plt.title('Box Plot of Height')

# Display the plot
plt.show()

Code Output:
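In seaborn, the orientation of a single box follows from which axis receives the numeric column, which is why the change above only swaps `x=` for `y=`. The sketch below draws both versions side by side on a small made-up sample (illustrative values only) and uses the non-interactive Agg backend so it also runs outside a notebook. When the inference is ambiguous, `sns.boxplot` also accepts an explicit `orient` argument.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative sample, not real Olympic data
sample_df = pd.DataFrame({"Height": [160, 165, 170, 172, 175, 178, 180, 210]})

fig, (ax_h, ax_v) = plt.subplots(1, 2, figsize=(10, 4))

# Numeric column on x -> horizontal box
sns.boxplot(data=sample_df, x="Height", ax=ax_h)
ax_h.set_title("Horizontal")

# Numeric column on y -> vertical box
sns.boxplot(data=sample_df, y="Height", ax=ax_v)
ax_v.set_title("Vertical")

plt.tight_layout()
plt.show()
```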

Prompt 3: Change color or add axis

To change the color, or to use seaborn's other axis for a second variable, you can simply extend the original prompt.

Create box plots of height, color by sport
# PROMPT: Create box plots of height, color by sport

import seaborn as sns
import matplotlib.pyplot as plt

# Draw one horizontal box plot of 'Height' per sport from 'filtered_df'
plt.figure(figsize=(10, 6))
sns.boxplot(x=filtered_df['Height'], y=filtered_df['Sport'], palette='Set3')

# Set the x-axis label and title
plt.xlabel('Height')
plt.title('Distribution of Heights across Different Sports')

# Show the plot
plt.show()

Code Output:
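Each box in the per-sport plot is built from that sport's own quartiles, so a pandas groupby can reproduce the numbers behind the picture. The sketch below uses a small made-up `sample_df` in place of `filtered_df`; with the real data you would group `filtered_df` the same way.

```python
import pandas as pd

# Hypothetical sample standing in for filtered_df
sample_df = pd.DataFrame({
    "Sport": ["Gymnastics"] * 4 + ["Basketball"] * 4,
    "Height": [150, 155, 160, 165, 190, 195, 200, 205],
})

# One row per sport: the quartiles each box is drawn from
summary = (
    sample_df.groupby("Sport")["Height"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .rename(columns={0.25: "q1", 0.5: "median", 0.75: "q3"})
)
summary["iqr"] = summary["q3"] - summary["q1"]  # height of each box
print(summary.sort_values("median"))
```

Sorting by median lists the sports in the same order you would read them from shortest to tallest boxes.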

Prompt 4: Add axis AND change color

If you want to color the boxes by one variable and also compare the distributions across another, specify both in your prompt, as below:

Create box plots of height, compare sport on the y-axis, and color by gender

TIP: Specify the visualization library you want to use if you have a strong preference. This helps the AI agent pick the functions and arguments you expect.

# PROMPT: Create box plots of height, compare sport on the y-axis, and color by gender

import seaborn as sns
import matplotlib.pyplot as plt

# Box plots of 'Height' for each sport (y-axis), split and colored by gender ('Sex')
plt.figure(figsize=(10, 6))
sns.boxplot(y='Sport', x='Height', hue='Sex', data=filtered_df)
plt.xlabel('Height')
plt.ylabel('Sport')
plt.title('Distribution of Heights across Different Sports')
plt.show()

Code Output:
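With `hue='Sex'`, every sport gets one box per gender, and the line inside each box marks that group's median. If you want those medians as a table, `pivot_table` lays them out as sport by gender. The sample data and the `M - F` difference column are illustrative additions, not part of the original example.

```python
import pandas as pd

# Hypothetical sample standing in for filtered_df
sample_df = pd.DataFrame({
    "Sport": ["Rowing"] * 4 + ["Swimming"] * 4,
    "Sex": ["F", "F", "M", "M"] * 2,
    "Height": [178, 182, 192, 196, 170, 174, 186, 190],
})

# Median height for every Sport x Sex pair, i.e. the median line of each colored box
medians = sample_df.pivot_table(index="Sport", columns="Sex", values="Height", aggfunc="median")
medians["M - F"] = medians["M"] - medians["F"]  # gap between the two boxes per sport
print(medians)
```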

About

Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.
