Einblick Prompt vs. Jupyter AI

Becca Weng - August 9th, 2023
[Image: Einblick Prompt and Jupyter AI logos]

Recent advances in generative AI have produced a wave of applications built around the technology. But generative AI can be unpredictable, as the varied results people shared online from ChatGPT conversations showed. To help demystify genAI for our users, and to make it easier to understand what generative AI can do for you in the data domain, we’re starting a series covering the leading apps in the space. In this article, we focus on Jupyter AI, the LLM-powered extension available wherever the IPython kernel runs.


What is Jupyter AI

[Image: Screenshot of Jupyternaut and Jupyter AI in JupyterLab]

Jupyter AI’s makers have billed it as a “reproducible generative AI playground,” where users invoke the %%ai magic to access various large language models directly in their notebooks. Models available include Anthropic’s Claude, Cohere, and of course OpenAI’s GPT models. Users pick a specific model by naming it after the magic command, for example %%ai chatgpt. Everything that follows in the code cell is passed to the model, and the model’s response is printed below, like an output cell in a Jupyter Notebook. While Jupyter AI is available on any platform that uses the IPython kernel, the chatbot, fondly referred to as Jupyternaut, is only available in JupyterLab.

What is Einblick Prompt

Einblick Prompt is a context-aware AI agent, built off of OpenAI and LangChain. Available in any of Einblick’s AI-native data notebooks, Prompt can build and run entire data workflows in as little as one sentence. Einblick’s data notebooks have a unique 2-D canvas layout that allows users to easily compare code and charts side-by-side, as well as iterate quickly on different workflows. Users can access Prompt by clicking anywhere on the canvas and selecting Prompt, or by selecting Prompt off of an existing Python cell or Table cell.

While running, Prompt will type out the steps it will take before the code populates and auto-runs. Since it is context-aware, Prompt will intuit column names, data types, and other metadata available. If unsatisfied with the output, you can ask Prompt to change the existing cell, or to fix any errors. Then you can continue building manually or use Prompt to add on to existing work.

Prompt vs. Jupyter AI Feature Comparison

Feature | Einblick Prompt | Jupyter AI
Context-aware
Directly fixes and debugs code
Can auto-run code
Built for data tasks
Generates executable code in code cells
Generates commented code
Interface with model in a data notebook
Installation required | N/A | Required
Prompt persistence | History of last 5 prompts run in notebook | Until kernel is shut down (i.e. file is closed)
Multiple large language models available
General-purpose AI assistant

Installation and getting started

Using Prompt in Einblick

Prompt is available to all Einblick users in all Einblick data notebooks. You don’t need to supply your own OpenAI API key, and since Einblick is web-based, there is no installation required. All you need to do is make an Einblick account and verify your email. The free tier offers 5 Einblick canvases, and access to Prompt.

Using Jupyter AI in JupyterLab

# Installs %%ai magic and JupyterLab
pip install jupyter_ai

You’ll also need an API key for each model provider you use. To use the conversational assistant, configure the key directly in JupyterLab’s Jupyternaut settings. To query a model from the notebook interface, you’ll also need to save the API key as an environment variable, for example %env OPENAI_API_KEY=token. Make sure to follow the syntax exactly for your API key of choice.
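The %env line above sets a variable in the kernel’s process environment. As a rough sketch of the same idea in plain Python, the snippet below uses a hypothetical helper, ensure_api_key (not part of Jupyter AI), that only sets the variable when it is not already defined, so a key configured in your shell is left alone. The value "token" is a placeholder, as in the %env example.

```python
import os


def ensure_api_key(name: str, value: str) -> str:
    """Set environment variable `name` to `value` unless it is already set.

    Hypothetical helper for illustration; not part of Jupyter AI.
    Returns the value that ends up in the environment.
    """
    return os.environ.setdefault(name, value)


# "token" is a placeholder; substitute your real key.
ensure_api_key("OPENAI_API_KEY", "token")
print("OPENAI_API_KEY" in os.environ)
```

Because the variable lives in the kernel process, it has to be set again after a kernel restart unless it is defined outside the notebook.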

Once in JupyterLab, you’ll have to load the extension in each new kernel session before you can use the %%ai magic (%load_ext jupyter_ai also works):

%load_ext jupyter_ai_magics

To interface with a particular model, use the double percent sign, and name the model. Everything else in that cell will be sent to the model. If you want a full list of models, try %ai list.

%%ai chatgpt
What is Python

Using Jupyter AI in Jupyter Notebook (or other IPython kernel)

# Installs %%ai magic
pip install jupyter_ai_magics

Once installed, the same rules apply regarding the environment variables (i.e. OPENAI_API_KEY), loading Jupyter AI, and calling models.

Using Einblick Prompt vs. Jupyter AI

To test the two products, we built a notebook analyzing the Titanic dataset. We used the same dataset and the same natural language prompts to compare the user experience on equal footing. Since Einblick is a web-based application, we simply uploaded the CSV file to the app. For Jupyter AI, we saved the CSV file in a data folder that the notebook had access to.

NOTE: We mainly used %%ai chatgpt

[Image: Screenshot of JupyterLab, highlighting UI for Jupyternaut and Jupyter AI]

At the start, I tried using Jupyternaut, but it seemed that the chat assistant could not directly generate code in the notebook, so to simplify things, all of the analysis reviewed below was done directly in the JupyterLab notebook interface using the %%ai chatgpt magic. The %%ai chatgpt magic does not generate code directly by default in the notebook either, but it felt easier than switching between the chat and notebook interface.

--format code option

There is an option to automatically format results as code: %%ai chatgpt --format code, which I also tried out, with mixed results. The advantage of this option is that the code is generated directly in a Python cell. But the generated code required more tailoring in terms of column names and dataframe names. Additionally, for the more complex requests (see Use-case: Building ML models), the generated code gave no indication that some steps were ambiguous (like preprocessing data). To give the best evaluation, the results shown below do not use the --format code option.

Results in Einblick

[Image: Screenshot of Einblick Prompt showing list of steps]

In Einblick, Prompt gives the user constant feedback, such as “Working on it…”, typically followed by a restatement of the user’s request and then a numbered list of the steps Prompt will take. Prompt then generates the code in a Python cell. If you have the auto-run cells option toggled on, the code runs immediately. If not, you can run it manually with Shift + Enter.

Since Einblick is cloud-based and managed, you can see the entire analysis in the embedded canvas below. Just Open > Fork. Opening an account is free.

Results in Jupyter AI

[Image: Screenshot of Jupyter AI %%ai magic example]

In Jupyter AI, it was less clear whether the AI was running, particularly in Jupyternaut. In the notebook interface, a cell waiting on the AI looks like any other running Python cell, with the * symbol to the left of the cell. The final output always comes as IPython Markdown below the Python input cell.

Additionally, because any code comes out as part of Markdown, you always have to copy-paste the code into another Python cell to try the code out. Frequently Jupyter AI would also repeat lines of code from prior requests. It’s unclear at the time of writing if that’s intentional or if the AI is getting a bit muddled with multiple back-to-back requests.

Use-case: Data cleaning

Prompt 1: Dropping and creating columns

Drop ticket, fare, and cabin columns. Create a Title column from the name column.

Einblick Prompt Output

import pandas as pd

# Drop the 'Ticket', 'Fare', and 'Cabin' columns from the 'titanic' dataframe
titanic = titanic.drop(['Ticket', 'Fare', 'Cabin'], axis=1)

# Create a new 'Title' column extracted from the 'Name' column in the 'titanic' dataframe
titanic['Title'] = titanic['Name'].str.extract(' ([A-Za-z]+)\.', expand=False)

print(titanic.head(3))

Output:

PassengerId  Survived  Pclass  ...  Title
1            0.0       3       ...  Mr
2            1.0       1       ...  Mrs
3            1.0       3       ...  Miss

The code runs as expected, and I can add on new prompts directly to this cell, or I can change it if there’s anything I want to update.

Jupyter AI Output

NOTE: I had to run the prompt twice to get code, but after the AI calibrated to the fact I wanted Python code in the output, the format generally followed this structure.

Apologies for the confusion. Here's an updated code snippet that drops the 'Ticket', 'Fare', and 'Cabin' columns, and creates a 'Title' column from the 'Name' column:

import pandas as pd

# Assume you have a DataFrame called "df" with the columns: Name, Ticket, Fare, Cabin

# Drop the Ticket, Fare, and Cabin columns
df_dropped = df.drop(['Ticket', 'Fare', 'Cabin'], axis=1)

# Create the Title column from the Name column
df_dropped['Title'] = df_dropped['Name'].apply(lambda x: x.split(',')[1].split('.')[0].strip())

# Print the updated DataFrame in markdown format
print(df_dropped[['Name', 'Title']].to_markdown(index=False))

This code will drop the specified columns ('Ticket', 'Fare', 'Cabin') and create a new 'Title' column based on extracting the title from the 'Name' column. The final line of code will print the 'Name' and 'Title' columns of the updated DataFrame in Markdown format.

I copied the relevant code and pasted it into another Python cell. It ran fine, but I had to keep in mind that the result now lives in a new dataframe, df_dropped.
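The two outputs extract the title in different ways: a regular expression in the Einblick output and chained split() calls in the Jupyter AI output. As a small sketch of how they can diverge, the snippet below runs both on three names written in the dataset’s "Surname, Title. Given names" style. The sample names are chosen for illustration, and the regex is written as a raw string here to avoid escape-sequence warnings.

```python
import pandas as pd

# Illustrative names in the "Surname, Title. Given names" style of the Titanic data.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
])

# Regex approach: the run of letters that sits directly before the first period.
regex_titles = names.str.extract(r" ([A-Za-z]+)\.", expand=False)

# Split approach: everything between the first comma and the first period.
split_titles = names.apply(lambda x: x.split(",")[1].split(".")[0].strip())

print(regex_titles.tolist())  # ['Mr', 'Miss', 'Countess']
print(split_titles.tolist())  # ['Mr', 'Miss', 'the Countess']
```

The difference matters for later cleaning steps: a replacement list written for one approach ('Countess') will not match the values produced by the other ('the Countess').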

Prompt 2: Explore missing data

Drop the Name column. Get statistics on missing data.

Einblick Prompt Output

# Drop the 'Name' column from the 'titanic' dataframe
titanic = titanic.drop('Name', axis=1)

# Get statistics on the missing data in the 'titanic' dataframe
missing_data_stats = titanic.isnull().sum()

print(missing_data_stats)

Output:

PassengerId      0
Survived       418
Pclass           0
Embarked         2
Sex              0
Age            263
SibSp            0
Parch            0
Title            0
dtype: int64

I liked that the output dropped the column on the original dataframe. I can ask Prompt not to do this though, if I want to. Additionally, I appreciate that the statistics are in a separate table.

Jupyter AI Output

To drop the 'Name' column and obtain statistics on missing data, you can use the following code:

import pandas as pd

# Assume you have a DataFrame called "df" with Name column

# Drop the Name column
df_dropped = df.drop(['Name'], axis=1)

# Get statistics on missing data
missing_data_stats = df_dropped.isnull().sum().reset_index()
missing_data_stats.columns = ['Column', 'Missing Values']

# Print the missing data statistics in markdown format
print(missing_data_stats.to_markdown(index=False))

This code will drop the 'Name' column from the DataFrame using the drop() function. Then, it calculates the number of missing values in each column using isnull().sum(). The result is stored in missing_data_stats DataFrame. Lastly, the code prints the missing data statistics in markdown format using the to_markdown() function.

I’m not quite sure why Jupyter AI is forcing the to_markdown formatting, but this is consistent in all the other prompts I ran as well, so it seems to be a default. I chose to exclude those print statements when I copied over the code to test it out.
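For reference, here is a minimal sketch of the two shapes the missing-data statistics took above: a Series of counts (as in the Einblick output) and a two-column table (as in the Jupyter AI output). The toy dataframe and the extra "Share Missing" column are assumptions added for illustration. Note that to_markdown() relies on the optional tabulate package, so the sketch uses plain print() instead.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cleaned data; the values are made up for illustration.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Pclass": [3, 1, 3, 1],
})

# Series form: column name -> number of missing values.
missing_counts = df.isnull().sum()

# Table form: the same numbers as a two-column DataFrame, plus the share missing.
missing_table = missing_counts.reset_index()
missing_table.columns = ["Column", "Missing Values"]
missing_table["Share Missing"] = df.isnull().mean().values

print(missing_counts)
print(missing_table)
```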

Prompt 3: Cleaning text data

I continued cleaning the data via a few more prompts, but wanted to call out the prompt where I asked the models to replace certain strings.

Replace Mlle, Countess, Lady, Mme with Mrs. Replace Ms with Miss. Replace Master, Rev, Col, Major, Capt, Don, Jonkheer, Sir with Mr.

Einblick Prompt Output

# Replace values in the 'Title' column of the 'titanic' dataframe

# Replace 'Mlle', 'Countess', 'Lady', 'Mme' with 'Mrs'
titanic['Title'] = titanic['Title'].replace(['Mlle', 'Countess', 'Lady', 'Mme'], 'Mrs')

# Replace 'Ms' with 'Miss'
titanic['Title'] = titanic['Title'].replace('Ms', 'Miss')

# Replace 'Master', 'Rev', 'Col', 'Major', 'Capt', 'Don', 'Jonkheer', 'Sir' with 'Mr'
titanic['Title'] = titanic['Title'].replace(['Master', 'Rev', 'Col', 'Major', 'Capt', 'Don', 'Jonkheer', 'Sir'], 'Mr')

print(titanic.head())

Output:

PassengerId  Survived  Pclass   Embarked Sex     Age  ... Title
1            0.0       3        S        male    22.0 ... Mr
2            1.0       1        C        female  38.0 ... Mrs
3            1.0       3        S        female  26.0 ... Miss
4            1.0       1        S        female  35.0 ... Mrs
5            0.0       3        S        male    35.0 ... Mr

Jupyter AI Output

Apologies for the confusion, here's an updated code snippet that replaces the titles accordingly:

import pandas as pd

# Assume you have a DataFrame called "df" with the Title column

# Replace titles
df['Title'] = df['Title'].replace(['Mlle', 'the Countess', 'Lady', 'Mme'], 'Mrs')
df['Title'] = df['Title'].replace(['Ms'], 'Miss')
df['Title'] = df['Title'].replace(['Master', 'Rev', 'Col', 'Major', 'Capt', 'Don', 'Jonkheer', 'Sir'], 'Mr')

# Print the updated DataFrame in markdown format
print(df.to_markdown(index=False))

This code replaces the specified titles using the `

The output seems to have been cut off, but otherwise, the code works well–again omitting the last line of code.
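Both outputs call replace() once per group of titles. As an alternative sketch (not what either tool generated), the same cleanup can be expressed as a single mapping passed to Series.replace. The name TITLE_MAP and the sample titles are assumptions for illustration; the groups come from the prompt text.

```python
import pandas as pd

# Mapping from raw title to consolidated title, taken from the prompt text.
TITLE_MAP = {
    **dict.fromkeys(["Mlle", "Countess", "Lady", "Mme"], "Mrs"),
    "Ms": "Miss",
    **dict.fromkeys(
        ["Master", "Rev", "Col", "Major", "Capt", "Don", "Jonkheer", "Sir"], "Mr"
    ),
}

# Made-up sample of titles; "Dr" is not in the mapping and is left unchanged.
titles = pd.Series(["Mr", "Mlle", "Ms", "Master", "Dr", "Countess"])
cleaned = titles.replace(TITLE_MAP)

print(cleaned.tolist())  # ['Mr', 'Mrs', 'Miss', 'Mr', 'Dr', 'Mrs']
```

Keeping the mapping in one place makes it easier to spot a value that does not match the data, such as 'Countess' versus 'the Countess'.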

Use-case: EDA (exploratory data analysis)

Prompt 1: Plot var1 by var2 and var3

Plot survival by Pclass and gender.

Einblick Prompt Output

import matplotlib.pyplot as plt

# Group the 'Survived' column by the 'Pclass' and 'Sex' columns
grouped_data = titanic.groupby(['Pclass', 'Sex'])['Survived']

# Calculate the mean of the grouped data
mean_survival = grouped_data.mean()

# Reset the index of the resulting dataframe
mean_survival = mean_survival.reset_index()

# Create a pivot table using the 'Pclass' and 'Sex' columns as index, and the 'Survived' column as values
pivot_table = mean_survival.pivot(index='Pclass', columns='Sex', values='Survived')

# Plot the pivot table as a bar plot
pivot_table.plot(kind='bar')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
plt.title('Survival Rates by Passenger Class and Gender')
plt.legend(title='Sex')
plt.show()

[Image: Plot of survival rate by passenger class and gender]

In the case of creating charts, I particularly appreciated the auto-run feature in Einblick, as I got the results of my data visualizations immediately.

Jupyter AI Output

To plot the survival rate by Pclass and gender, you can use the following code:

import pandas as pd
import matplotlib.pyplot as plt

# Assume you have a DataFrame called "df" with the columns: Pclass, Sex, Survived

# Group the DataFrame by Pclass and Sex, and calculate the survival rate
survival_rate = df.groupby(['Pclass', 'Sex']).mean()['Survived']

# Plot the survival rate
fig, ax = plt.subplots()
survival_rate.unstack().plot(kind='bar', ax=ax)
ax.set_xlabel('Pclass')
ax.set_ylabel('Survival Rate')
ax.set_title('Survival Rate by Pclass and Gender')

# Show the plot
plt.show()

This code groups the DataFrame df by 'Pclass' and 'Sex' using the groupby() function, and calculates the mean survival rate using the mean() function. Then, it plots the survival rate using a bar plot. The x-axis represents 'Pclass', and the y-axis represents the survival rate. The title and axis labels are also set accordingly. The final line of code shows the plot using plt.show()

At this point, I had cleaned the data a decent amount, so I had to be careful that I was referencing the correct dataframe since Jupyter AI is not context-aware, and could not infer a lot of information about my dataset.
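The numbers behind both charts are a grouped mean of Survived. The sketch below computes that table on a made-up six-row dataframe, selecting the Survived column before calling mean(). That ordering is a deliberate choice here: in recent pandas versions, calling mean() on the whole grouped dataframe can raise an error when text columns such as Title are still present.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the Titanic data.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 3, 3, 3],
    "Sex":      ["female", "female", "male", "female", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0],
    "Title":    ["Mrs", "Miss", "Mr", "Miss", "Mr", "Mr"],
})

# Select the numeric target first, then average within each (Pclass, Sex) group.
survival_rate = df.groupby(["Pclass", "Sex"])["Survived"].mean()

# One row per class, one column per sex: the shape a grouped bar chart expects.
rate_table = survival_rate.unstack()
print(rate_table)
```

Calling rate_table.plot(kind="bar") on the result would draw the grouped bar chart, assuming matplotlib is installed.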

Prompt 2: Plot distributions of several variables

Create separate plots showing the distributions of Age, Pclass, and Embarked

Einblick Prompt Output

# Required Libraries
import seaborn as sns
import matplotlib.pyplot as plt

# Create a histogram for 'Age' column
sns.histplot(data=titanic, x='Age')
plt.show()

# Create a bar plot for 'Pclass' column
sns.countplot(data=titanic, x='Pclass')
plt.show()

# Create a bar plot for 'Embarked' column
sns.countplot(data=titanic, x='Embarked')
plt.show()

Output:

[Image: Three charts generated by Einblick Prompt]

Jupyter AI Output

To create separate plots showing the distributions of Age, Pclass, and Embarked, you can use the following code:

import pandas as pd
import matplotlib.pyplot as plt

# Assume you have a DataFrame called "df" with the columns: Age, Pclass, and Embarked

# Create subplots for each distribution
fig, axes = plt.subplots(3, 1, figsize=(8, 12))

# Plot the distribution of Age
axes[0].hist(df['Age'].dropna(), bins=20, edgecolor='black')
axes[0].set_xlabel('Age')
axes[0].set_ylabel('Count')
axes[0].set_title('Distribution of Age')

# Plot the distribution of Pclass
axes[1].hist(df['Pclass'], bins=3, edgecolor='black', align='left', rwidth=0.8)
axes[1].set_xlabel('Pclass')
axes[1].set_ylabel('Count')
axes[1].set_title('Distribution of Pclass')

# Plot the distribution of Embarked
axes[2].bar(df['Embarked'].value_counts().index, df['Embarked'].value_counts().values, edgecolor='black')
axes[2].set_xlabel('Embarked')
axes[2].set_ylabel('Count')
axes[2].set_title('Distribution of Embarked')

# Adjust subplots spacing
plt.subplots_adjust(hspace=0.5)

# Show the plots
plt.show()

This code creates separate subplots for each distribution using plt.subplots(). Then, for each subplot, it plots the distribution using appropriate plotting functions, such as hist() for Age and Pclass, and bar() for Embarked. It sets the axis labels and titles accordingly. The subplots are adjusted to have proper spacing using plt.subplots_adjust(). Finally, the plots are shown using plt.show().

Use-case: Building ML models

Use the titanic dataset to predict survival

This was the use-case where auto-running code and context-awareness were the most useful in Einblick Prompt. Prompt added preprocessing steps for the data on its own, which was really great to see! There are certainly other ways of handling missing data, but it was satisfying to get a result back on the first model immediately.

While Jupyter AI was able to identify the different steps necessary, it did not preprocess the data automatically, and because of how its response is formatted, I had to copy-paste several different blocks of code, which were interspersed with Markdown-style text. This was pretty time-consuming. The alternative was using the --format code option, which took out the comments but skipped the preprocessing step entirely, resulting in code that errored out.

Einblick Prompt Output

# Split the 'titanic' dataframe into features (X) and target variable (y)
X = titanic.drop('Survived', axis=1)
y = titanic['Survived']

# Drop rows from X and y where there are missing values
X = X.dropna()
y = y[X.index]

# Encode categorical variables in the features dataframe (X)
X_encoded = pd.get_dummies(X)

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Instantiate a machine learning model (e.g., Logistic Regression, Random Forest, etc.)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Evaluate the model on the testing data
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)

Output:

Accuracy: 0.7692307692307693

Jupyter AI Output

To predict survival using the df_dropped dataset, you will need to use a machine learning model. Here is a general outline of the steps you can follow:
1. Split the dataset into features (X) and target variable (y). Assuming the target variable is the 'Survived' column, you can use the following code:

X = df_dropped.drop('Survived', axis=1)
y = df_dropped['Survived']

2. Preprocess the data. This step includes handling missing values, converting categorical variables into numerical representations, and scaling the features if necessary. You can use techniques such as one-hot encoding, label encoding, and imputation.
3. Choose a suitable machine learning algorithm for classification, such as Logistic Regression, Random Forest, or Support Vector Machine. Instantiate the model:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the Logistic Regression model
model = LogisticRegression()

4. Train the model using the training data:

model.fit(X_train, y_train)

5. Predict the target variable for the test set:

y_pred = model.predict(X_test)

6. Evaluate the performance of the model using appropriate evaluation metrics, such as accuracy, precision, recall, or the confusion matrix.

from sklearn.metrics import accuracy_score, confusion_matrix

accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)

7. It is crucial to note that the above code is a general framework, and the choice of preprocessing techniques and the machine learning model will depend on the specific problem and dataset.
8. Print the evaluation results in markdown format:

print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix: {confusion_mat}")

This code will print the accuracy and confusion matrix of the model's performance in markdown format.
Remember to import the necessary libraries for the specific preprocessing techniques and machine learning algorithms you choose to use.
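Step 2 of the outline above describes preprocessing but gives no code for it. As one possible sketch of that step, using pandas only, the snippet below imputes missing values and one-hot encodes text columns. The function name preprocess, the toy data, and the choice of median and most-frequent imputation are all assumptions for illustration. It is a different choice from the Einblick output, which drops rows with missing values instead.

```python
import numpy as np
import pandas as pd

# Made-up feature rows for illustration; column names follow the Titanic data.
X = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
})


def preprocess(features: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, then one-hot encode text columns (one option of many)."""
    out = features.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    other_cols = out.columns.difference(numeric_cols)
    # Numeric gaps: fill with the column median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # Categorical gaps: fill with the most frequent value in the column.
    for col in other_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Turn each text column into 0/1 indicator columns.
    return pd.get_dummies(out)


X_ready = preprocess(X)
print(X_ready.isnull().sum().sum())  # 0
```

In a real workflow the imputation values would usually be computed on the training split only and then applied to the test split, to avoid leaking test information into the model.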

Fixing Errors

Einblick Prompt

[Image: Screenshot highlighting fix with prompt button in Einblick]

In Prompt, whenever users hit a Python error, they can just click the "Fix with prompt" button to fix the error. Prompt will automatically update the code cell in question, and will give the user feedback on how it changed the cell.

Jupyter AI

There is no way to fix code using Jupyter AI, but you can have the AI explain the error to you. According to the official blog, you use the following syntax, where the number in the [] should correspond with the cell number where the error occurred. (The %%ai line has to be the first line of the cell.)

%%ai chatgpt
Explain the following Python error to me
  {Err[25]}

I tried it on the following error, which occurred in cell 25, but to no avail. I actually had to change the number to 26 to get the code explanation.

[Image: Example of error in Jupyter AI]
[Image: Example of Jupyter AI error explanation]

While the error explanation is good, it's unclear why index 25 did not work in the first place.

Conclusion

Jupyter AI offers direct access to a ChatGPT-like interface within Jupyter notebooks. However, the AI is limited and a bit clunky in terms of speeding up the creation of data workflows. Although you are no longer tab-switching, you still need to copy-paste code generated and fiddle with the code to match your particular dataset. There is no direct way to fix code, but the code explanations are helpful, if a bit buggy. Additionally, you have no guarantees that Jupyter AI is taking into account the context of your data, such as properly formatted column names or dataframe names.

Einblick Prompt, on the other hand, leverages the immense power of OpenAI's technology specifically for the data science and data analytics domain. Prompt achieves this by analyzing the user's actions on a canvas. Utilizing this context, Einblick Prompt provides highly accurate and tailored answers that align with the user's specific needs and requirements. Prompt is able to reason which dataframe to use out of the many that may be in a notebook, and is able to auto-run code, and fix any issues with one click of a button.

About

Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.
