Scatter plots with Prompt AI

Becca Weng - September 1st, 2023

Scatter plots are key for exploring relationships between variables, and there are a number of aesthetic settings you can adjust to make a plot more effective. With Prompt, Einblick's AI agent, customizing a plot takes little effort: simply describe the plot you want, and Prompt will generate and run the Python code required. In this post, we'll provide a few example scatter plots using the Olympics dataset from Kaggle. We subset the data to Olympics held in 2000 or later, a few select sports, and only athletes who earned medals.
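As a rough sketch of that subsetting step (not the exact code behind this post), the filter could look like the following in pandas. The column names Year, Sport, and Medal are assumed to match the Kaggle file, the list of sports is hypothetical, and the rows are a made-up stand-in so the snippet runs on its own.

```python
import pandas as pd

# Tiny stand-in for the Kaggle Olympics data; these rows are invented for illustration.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Year": [1996, 2000, 2008, 2012, 2016],
    "Sport": ["Swimming", "Swimming", "Gymnastics", "Fencing", "Athletics"],
    "Height": [180.0, 175.0, 160.0, 185.0, 170.0],
    "Weight": [75.0, 70.0, 50.0, 80.0, 60.0],
    "Medal": ["Gold", None, "Silver", "Bronze", "Gold"],
})

# Hypothetical selection; the post does not say which sports were kept.
selected_sports = ["Swimming", "Gymnastics", "Athletics"]

# Keep Games from 2000 onward, the selected sports, and medal winners only.
filtered_df = df[
    (df["Year"] >= 2000)
    & df["Sport"].isin(selected_sports)
    & df["Medal"].notna()
]
print(filtered_df)
```

With the real dataset, you would load the CSV into df instead of building it by hand, and the rest of the filter would stay the same as long as the column names match.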

filtered_df.head()

Code Output:

The full examples can be found in the embedded canvas below. Simply open and fork the below canvas. All of the natural language prompts used are provided in an accompanying Markdown cell to the left of the graph, along with any modifications made.

Prompt 1: Basic scatter plot

Since the most basic scatter plots are used for visualizing the relationship between two numerical variables, you can simply ask Prompt the following:

Plot height vs weight

NOTE: Replace height and weight with whichever numerical variables are in your particular dataset. Prompt will infer from the context that you want a scatter plot.

# PROMPT: Plot height vs weight

import matplotlib.pyplot as plt

# Plotting height against weight using scatter plot
plt.scatter(filtered_df['Height'], filtered_df['Weight'])
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height vs Weight')
plt.show()

Code Output:

Basic scatter plot example (Height vs. Weight)

This is a great first plot, but there is a lot more information that a simple scatter plot can show, and a lot more ways you may want to customize your plot. For example, you may want to:

  • Use a different visualization library
  • Color parts of the graph differently based on a different variable
  • Change the size, shape, or color of the data points

Einblick Prompt has you covered, with no need to fuss with finicky syntax!
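For reference, here is a minimal sketch of what changing the size, shape, and color of the data points looks like when written by hand in plain matplotlib. The values are made up and stand in for the Height and Weight columns; the specific size, marker, and color are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for filtered_df['Height'] and filtered_df['Weight'].
heights = [160, 170, 175, 180, 190]
weights = [52, 65, 70, 78, 90]

fig, ax = plt.subplots()
points = ax.scatter(
    heights,
    weights,
    s=80,               # marker area, in points squared
    marker="^",         # triangles instead of the default circles
    color="tab:green",  # one color for every point
)
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
ax.set_title("Height vs Weight")
plt.show()
```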

Prompt 2: Color-coded scatter plot

Create a scatterplot of height and weight, using seaborn, color by medal

As you can see, you can add extra requirements directly in the prompt. In this case, we asked Prompt to use a specific library and to color-code the plot.

TIP: Don't worry about installation. In Einblick, Prompt will automatically pip install and import any missing libraries.

# PROMPT: Create a scatterplot of height and weight, using seaborn, color by medal

# Import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatterplot using seaborn's scatterplot function
sns.scatterplot(data=filtered_df, x='Height', y='Weight', hue='Medal')

# Add labels to the x-axis, y-axis, and title of the plot
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Scatterplot of Height and Weight')

# Display the plot
plt.show()

Code Output:

This plot now allows us to compare different groups, and the medals appear to be spread fairly evenly across the athletes' heights and weights. But the data points are highly clustered, particularly in the middle, so let's see if adjusting the transparency gives a clearer picture of the data.
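A visual impression like this can be cross-checked with a quick numerical summary per medal. The sketch below uses a handful of made-up rows in place of filtered_df, so the numbers it prints say nothing about the real Olympics data.

```python
import pandas as pd

# Made-up rows standing in for filtered_df.
filtered_df = pd.DataFrame({
    "Height": [170, 180, 175, 165, 185, 178],
    "Weight": [60, 80, 72, 58, 85, 74],
    "Medal": ["Gold", "Gold", "Silver", "Silver", "Bronze", "Bronze"],
})

# Mean and standard deviation of height and weight for each medal group.
summary = filtered_df.groupby("Medal")[["Height", "Weight"]].agg(["mean", "std"])
print(summary.round(1))
```

If the group means and spreads come out close to one another on the real data, that supports the reading that medals are spread evenly across body size.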

Prompt 3: Adjust transparency

Since Prompt is able to take in natural language queries, we can also ask Prompt to duplicate any plot, and then adjust it as we like. This helps in side-by-side comparisons, if you're determining which is the best version to present to stakeholders, for example.

Start by selecting Prompt > Add new cell below

Duplicate the above plot

Then with the newly duplicated plot, select Prompt > Change this cell

Make the data points transparent

# PROMPT: Duplicate the above plot
# CHANGE: Make the data points transparent

# Import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatterplot using seaborn's scatterplot function
sns.scatterplot(data=filtered_df, x='Height', y='Weight', hue='Medal', alpha=0.5)  # Set alpha value to make data points transparent

# Add labels to the x-axis, y-axis, and title of the plot
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Scatterplot of Height and Weight')

# Display the plot
plt.show()

Code Output:
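If you would rather compare the opaque and transparent versions in a single figure, one option is to draw both on shared axes. This is a sketch written by hand rather than Prompt output, and it uses a few made-up rows in place of filtered_df.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for filtered_df.
filtered_df = pd.DataFrame({
    "Height": [170, 180, 175, 165, 185, 178],
    "Weight": [60, 80, 72, 58, 85, 74],
    "Medal": ["Gold", "Gold", "Silver", "Silver", "Bronze", "Bronze"],
})

# One row, two panels that share both axes so the point clouds line up.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, alpha in zip(axes, [1.0, 0.5]):
    sns.scatterplot(data=filtered_df, x="Height", y="Weight",
                    hue="Medal", alpha=alpha, ax=ax)
    ax.set_title(f"alpha = {alpha}")

fig.tight_layout()
plt.show()
```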

Prompt 4: Customize color and size

If you want more distinction between the groups, you can double down and vary both the color and the size of the data points.

Plot height and weight using seaborn, color and size by medal

# PROMPT: Plot height and weight using seaborn, color and size by medal

# Import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Create a scatter plot using seaborn's scatterplot function
sns.scatterplot(data=filtered_df, x='Height', y='Weight', hue='Medal', size='Medal')

# Add labels and title to the plot
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Distribution of Height and Weight among Medal Winners')

# Show the plot
plt.show()

Code Output:

If the default color palette or sizing doesn't make sense for your use case (such as this one, where the group names are themselves colors and have a natural order that is not alphabetical), you can ask Prompt to use a custom color palette and sizing.

Plot height and weight using seaborn, color and size by medal. Use custom color palette: Gold = gold, Silver = silver, Bronze = bronze.

Here, we added an additional prompt via "Change this cell" to ensure that the sizing of the data points made sense for the use-case.

Gold is the biggest marker, bronze is the smallest

# PROMPT: Plot height and weight using seaborn, color and size by medal. Use custom color palette: Gold = gold, Silver = silver, Bronze = bronze.
# CHANGE: Gold is the biggest marker, bronze is the smallest

# Import the necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Create the plot using seaborn scatterplot
sns.scatterplot(x=filtered_df['Height'], y=filtered_df['Weight'], hue=filtered_df['Medal'], palette={'Gold': '#FFD700', 'Silver': '#C0C0C0', 'Bronze': '#CD7F32'}, size=filtered_df['Medal'], sizes=(200, 50))

plt.xlabel('Height')  # Set the x-axis label
plt.ylabel('Weight')  # Set the y-axis label
plt.title('Height vs Weight')  # Set the title of the plot
plt.show()  # Display the plot

Code Output:

Scatter plot example (Height vs. Weight), with hue and size, with Prompt
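When sizes is given as a (min, max) tuple, seaborn decides which medal gets which size. If you want that mapping spelled out, a hand-written variant can pass a dictionary to sizes and fix the legend order with hue_order and size_order. The specific sizes (200, 125, 50) are illustrative choices, and the rows below are made up in place of filtered_df.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for filtered_df.
filtered_df = pd.DataFrame({
    "Height": [170, 180, 175, 165, 185, 178],
    "Weight": [60, 80, 72, 58, 85, 74],
    "Medal": ["Bronze", "Gold", "Silver", "Silver", "Gold", "Bronze"],
})

medal_order = ["Gold", "Silver", "Bronze"]
palette = {"Gold": "#FFD700", "Silver": "#C0C0C0", "Bronze": "#CD7F32"}
marker_sizes = {"Gold": 200, "Silver": 125, "Bronze": 50}  # illustrative sizes

ax = sns.scatterplot(
    data=filtered_df, x="Height", y="Weight",
    hue="Medal", hue_order=medal_order, palette=palette,
    size="Medal", size_order=medal_order, sizes=marker_sizes,
)
ax.set_title("Height vs Weight")
plt.show()
```

Because every medal has an explicit entry, the result no longer depends on the order in which medals appear in the data.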

Prompt 5: Change marker shape

Similar to the above prompts, if you want to double down on differentiators, you can also ask Prompt to change the shape of the markers based on a column.

Plot height and weight using seaborn, color and shape by medal.

# PROMPT: Plot height and weight using seaborn, color and shape by medal.

import seaborn as sns
import matplotlib.pyplot as plt

# Plot a scatterplot of Height vs Weight, with different colors and markers for each Medal type
sns.scatterplot(data=filtered_df, x='Height', y='Weight', hue='Medal', style='Medal', palette={'Gold': '#FFD700', 'Silver': '#C0C0C0', 'Bronze': '#CD7F32'})

plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height vs Weight')
plt.show()

Code Output:

Scatter plot example (Height vs. Weight), with hue and shape (Medal), with Prompt
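By default, seaborn picks the marker shape for each group. If you want to choose the shapes yourself, a hand-written variant can pass a dictionary to the markers parameter. The shapes below are arbitrary picks for illustration, and the rows are made up in place of filtered_df.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for filtered_df.
filtered_df = pd.DataFrame({
    "Height": [170, 180, 175, 165, 185, 178],
    "Weight": [60, 80, 72, 58, 85, 74],
    "Medal": ["Gold", "Gold", "Silver", "Silver", "Bronze", "Bronze"],
})

palette = {"Gold": "#FFD700", "Silver": "#C0C0C0", "Bronze": "#CD7F32"}
# Illustrative shapes: circle, square, triangle (all filled markers).
markers = {"Gold": "o", "Silver": "s", "Bronze": "^"}

ax = sns.scatterplot(
    data=filtered_df, x="Height", y="Weight",
    hue="Medal", style="Medal",
    palette=palette, markers=markers,
)
ax.set_title("Height vs Weight")
plt.show()
```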

About

Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.