Gigasheet + Einblick: Bridging the gap between business users and data scientist

Paul Yang - April 6th, 2023

As the amount of data generated by businesses continues to increase exponentially, the ability to effectively analyze and make sense of this data has become a key factor in driving success. However, there is often a gap between business users, who often have firsthand working knowledge of the data, and data scientists, who are responsible for developing the models and algorithms used to extract insights from the data.

This is where the combination of Gigasheet's big data spreadsheet and data preparation tools and Einblick's multiplayer Python notebooks can prove to be a game changer. By integrating these two tools, businesses can bridge the gap between their business analysts and data scientists, allowing them to work together more quickly and effectively.

Gigasheet's intuitive spreadsheet-like interface allows business analysts to easily manipulate, clean, and prepare huge datasets without the need for complex programming skills. Einblick's innovative Python notebooks, on the other hand, provide data scientists with a collaborative, cloud-based platform for developing and sharing code and models. When the two come together, the real magic happens.

Best of all, both Einblick and Gigasheet offer free tiers.

Business users can leverage Gigasheet to prepare and clean their data on their own, before sharing it with data scientists who can then use Einblick's deep data science capabilities to extract insights from the data. This seamless collaboration cuts out the back-and-forth between the two teams yielding rapid, accurate insights, as well as increased efficiency and productivity for the business.

Jump to the end or click here to watch a walkthrough, or read on for the step-by-step.

Inventory + Forecasting Let’s Do It

In this example, the inventory team wants to work with the data science team to develop a forecast for our best-selling product. At the same time, we also have the ability to leverage the cleansed data for other applications including sharing with marketing and anybody else as well.

Data Prep In Gigasheet

We’ll start with 2 files from separate different online order management systems that span several years and millions of transactions. In this case we’ve recently migrated between systems and have some overlap between our data sets. Further complicating matters, the data is exported in different formats. It was originally in XLS, but more recently we’ve moved to an order management system which exports the data as JSON. In total all of these files amount to just over 1GB of data, and more than 4.8 million rows – far beyond the row limits of Excel or Google Sheets.

You can explore the data files we’re using here: https://app.gigasheet.com/datasets?folder=ef661987_54bf_4959_b7f3_b4de481c7655.

Under typical circumstances a marketing analyst would need help from a data team to prepare this data for analysis, but with Gigasheet this is as easy as using a spreadsheet.

First we can see the files contain sales transaction data. The legacy online sales system previously used exported data in Excel, and this data totals to about 440k rows. The current online sales system exports transaction data in JSON. It’s also worth noting that there is some overlap between the two systems, as they were run in parallel during the cutover process. We want to make sure that we aren’t counting duplicate transactions in our data.

Next we see we want to combine these files, but to do that the columns must match. After upload the JSON we see Gigasheet automatically flattens the JSON to a tabular CSV. We can easily compare the columns between the Excel file and the JSON, and remove the columns that don’t match from the JSON.

Once the column order and data types match, we can use the Combine Files function in the Library to create a unified data set. Now we can review this consolidated data and use the Remove Duplicates data cleanup tool to remove any duplicated rows of transaction data. And finally we’ll filter to just orders in the relevant time period for this project.

You can see the prep’d data here: https://app.gigasheet.com/spreadsheet/Merged-Data/1487a8ad_7fa3_4527_acdd_6ebeb1001dc3

Now the data is ready for the data science team to help us develop a forecast of our upcoming sales.

Data Science In Einblick

Now that the data is ready for analysis by the data science team. Einblick’s multiplayer Python notebooks make it easy for the data science team to:

  • Install the Python package from the Gigasheet Github repository
  • Call the Gigasheet API to generate a downloadable link
  • Read data from a zipped export file into a Pandas dataframe
  • Make a few charts and a table
  • Build a forecast of upcoming sales volumes
  • Use SQL and Python to transform the data, before writing a new file to Gigasheet

Check out the interactive Einblick Canvas below to see how it works:

Coding Blocks

Installing Packages and Getting Started

## Download and install the Gigasheet API Python wrapper
!pip install -r https://raw.githubusercontent.com/gigasheetco/gigasheet-python/main/requirements.txt
!pip install git+https://github.com/gigasheetco/gigasheet-python.git

## Import packages needed
from gigasheet import gigasheet
import urllib
import io

## Start using the API!

giga = gigasheet.Gigasheet(GIGASHEET_API_KEY) ## Replace this with your API key / in the Einblick secrets tab

Downloading Data

sheet_handle = GIGASHEET_HANDLE ## Set your handle here -- you can find the handle in the URL of the sheet

# Create an export using the giga API
export_handle = giga.create_export_current_state(sheet_handle)
giga.wait_for_file_to_finish(export_handle)

# Get export URL
url = giga.download_export(export_handle)
print('Presigned URL:')
print(url)


## Now bring that file into a pandas dataframe
import zipfile
response = urllib.request.urlopen(url)
zipfile_bytes = io.BytesIO(response.read())
zipfile_bytes.seek(0)
with zipfile.ZipFile(zipfile_bytes) as zf:
csv_data = zf.read(zf.namelist()[0]).decode('utf-8')


# Convert CSV data to DataFrame
df = pd.read_csv(io.StringIO(csv_data))

Writing a File to Gigasheet

# Generate a flat file from a data frame named `df_out` in the working directory
csv_path = forecast.csv'
df_out.to_csv(csv_path, index=False)

# Upload the CSV file to Gigasheet
handle = giga.upload_file(csv_path, csv_path)
print(f'Uploaded file with handle: {handle}')

Wrapping Up

The integration of Gigasheet's spreadsheet-like data prep tools with Einblick's Python notebooks can greatly improve the efficiency and effectiveness of business and data science teams. By bridging the gap between business analysts and data scientists, this integration enables a more collaborative and streamlined approach to data analysis, resulting in faster and more accurate insights. With these tools at their disposal, businesses can make better-informed decisions and gain a competitive advantage in today's data-driven world. The best part is you can get started right away - both capabilities are free!

Sign up for Einblick today to get started: https://app.einblick.ai/?loginMethod=signup

Learn more about Gigasheet on their website, and get 3GB free: https://app.gigasheet.com/signup