Tools and Widgets

Three multicollinearity tests in Python

Einblick Content Team - January 20th, 2023

Avoid unstable and unreliable model coefficients with this comprehensive guide to checking for multicollinearity in Python using seaborn and statsmodels. Learn about multicollinearity and how to use the variance inflation factor (VIF) and correlation coefficients.
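
As a rough illustration of the VIF idea, here is a minimal sketch that uses NumPy alone rather than the statsmodels helper the post covers. It relies on the fact that the VIFs of a set of predictors equal the diagonal of the inverse of their correlation matrix. The function name `variance_inflation_factors` and the synthetic columns `x1`, `x2`, `x3` are assumptions made for this example.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    # Correlation matrix of the predictors; columns are variables.
    corr = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix holds the VIFs.
    return np.diag(np.linalg.inv(corr))

# Synthetic data (hypothetical): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
print(vifs)  # large values for x1 and x2, a value near 1 for x3
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, though the right threshold depends on the application.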

White's test for heteroskedasticity in Python

Einblick Content Team - January 17th, 2023

Testing for heteroskedasticity (with a "k" or "c") is essential when running various regression models. For example, one of the main assumptions of OLS is that there is constant variance (homoscedasticity) among the residuals or errors of your linear regression model. Learn how to run and interpret White's test for heteroskedasticity using statsmodels.

Categorical plots with seaborn catplot()

Einblick Content Team - January 11th, 2023

In this post, we’ll review seaborn’s catplot() function, which is helpful for creating different kinds of plots to help you analyze and understand the relationships between continuous and categorical variables. We’ll go over how to use catplot() and some tips for customizing the appearance and layout of your plots.

BeautifulSoup: Python web scraping library

Paul Yang - January 6th, 2023

BeautifulSoup is a Python package designed for parsing HTML and turning the markup code into something navigable and searchable. Easy scraping can improve your life tremendously: here, I was using it to assemble a list of on-sale wines at my local wine store. We also use the Requests package to grab the URL (taking bets on when Requests is going to be baked in).

Ordinary Least Squares (OLS) in statsmodels

Einblick Content Team - January 5th, 2023

In this post, we’ll go over two ways to perform linear regression with ordinary least squares (OLS) estimation in the statsmodels library. Get a detailed summary of your model fit and access useful summary statistics with these simple functions.

Python in parallel: ThreadPoolExecutor and ProcessPoolExecutor

Einblick Content Team - December 22nd, 2022

This code demonstrates how to use the ProcessPoolExecutor and ThreadPoolExecutor classes from the concurrent.futures module to run multiple threads and processes concurrently or in parallel to save you time.
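
A minimal sketch of the pattern, assuming a toy `square` function and a helper named `run_in_threads` (both hypothetical, not taken from the post). It uses ThreadPoolExecutor; ProcessPoolExecutor exposes the same `map` interface, but it should be run under an `if __name__ == "__main__":` guard and needs picklable functions.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """Toy task standing in for real I/O-bound work."""
    return n * n

def run_in_threads(values, max_workers=4):
    # The context manager waits for all tasks and shuts the pool down.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(square, values))

results = run_in_threads(range(5))
print(results)  # [0, 1, 4, 9, 16]
```

Threads tend to help with I/O-bound work such as network requests, while processes are usually the better fit for CPU-bound work.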

np.arange() in Python: More Efficient than Lists

Einblick Content Team - December 16th, 2022

NumPy arrays are stored in contiguous blocks of memory, which allows NumPy to take advantage of vectorization and other optimization techniques. Python lists hold references to individually allocated objects, which makes them slower and more memory-hungry than NumPy arrays for numerical data.
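
A small sketch of the difference, assuming NumPy is installed. The variable names are illustrative only. `np.arange(start, stop, step)` builds the array directly, and arithmetic on it applies to every element without an explicit Python loop.

```python
import numpy as np

# Evenly spaced integers from 0 up to (but not including) 10, step 2.
values = np.arange(0, 10, 2)

# Vectorized arithmetic: one expression operates on the whole array.
doubled = values * 2

# The list-based equivalent needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in range(0, 10, 2)]

print(doubled)       # [ 0  4  8 12 16]
print(doubled_list)  # [0, 4, 8, 12, 16]
```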

Python Generator Expression: An Introduction

Einblick Content Team - December 15th, 2022

One useful but not well-understood Python tip for data science is the use of generator expressions. Generator expressions are similar to list comprehensions, but they are more memory efficient because they produce items one at a time instead of building the whole list in memory.
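
To make the memory point concrete, here is a minimal sketch comparing a list comprehension with the equivalent generator expression. The variable names are illustrative, and exact byte counts vary by Python version, so none are shown.

```python
import sys

# List comprehension: all 100,000 results are built and held in memory.
squares_list = [n * n for n in range(100_000)]

# Generator expression: same syntax with parentheses, evaluated lazily.
squares_gen = (n * n for n in range(100_000))

# Measure the container sizes before consuming the generator.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# Both can feed sum(); the generator is exhausted after one pass.
list_total = sum(squares_list)
gen_total = sum(squares_gen)

print(gen_size < list_size)     # True
print(list_total == gen_total)  # True
```

Note that a generator can only be iterated once, so a list is still the right choice when the values are needed repeatedly.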

Caching with lru_cache() in Python

Einblick Content Team - December 14th, 2022

Caching is a technique for storing the results of expensive computations so that they can be quickly retrieved later. In Python, you can use functools.lru_cache(), which stands for least recently used (LRU) cache, to easily add caching to a function.
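
A minimal sketch of the decorator in use, with a recursive Fibonacci function chosen purely as an illustration (it is not necessarily the example from the post). `maxsize=None` makes the cache unbounded, and `cache_info()` reports hits and misses.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci; the cache removes the repeated work."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

value = fib(30)
info = fib.cache_info()

print(value)        # 832040
print(info.misses)  # 31, one per distinct argument from 0 to 30
```

Without the decorator, the same call would recompute the smaller Fibonacci numbers over and over. Cached functions need hashable arguments, and `fib.cache_clear()` resets the cache.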

Python OpenAI Text Generator Example

Paul Yang - December 2nd, 2022

OpenAI has released a powerful API to use with their pre-trained models. This includes generative AI solutions like text completion and natural language, without the need to train models locally or work with heavyweight machines. This canvas example is designed to show you how to get started in Python.

Faster Data Manipulation: Pandas Concat

Benedetto Buratti - November 18th, 2022

Einblick Tools that make data manipulation faster. This first Tool series explores a sequence of Concat, Sort, and Join operations to manipulate and enrich customer data.

Use Python to Grab Twitter Data: Easiest Guide to Tweepy API

Paul Yang - November 10th, 2022

Getting Twitter data into your Python analysis is easy with the Tweepy API. In this Tools post, we cover a crash course on how to find tweets related to a given hashtag and pull them in (and how to do a quick sentiment analysis).

Python sqlite3: How to use sqlite with dataframes

Paul Yang - November 9th, 2022

Here's a quick guide to how you can use sqlite in Python and how to load Pandas dataframes to SQL, manipulate that data, and read it back.
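
The round trip can be sketched with the standard-library sqlite3 module alone. The table name `wines` and its rows are made up for this example, and an in-memory database is assumed so nothing is written to disk. With pandas, `DataFrame.to_sql` and `pandas.read_sql` play the roles of the insert and select steps here.

```python
import sqlite3

# Hypothetical sample rows standing in for a dataframe's contents.
rows = [("merlot", 12.5), ("riesling", 9.0), ("malbec", 15.0)]

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wines (name TEXT, price REAL)")

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany("INSERT INTO wines VALUES (?, ?)", rows)
conn.commit()

# Query the data back, filtering and sorting in SQL.
cheap = conn.execute(
    "SELECT name FROM wines WHERE price < ? ORDER BY name", (13,)
).fetchall()
conn.close()

print(cheap)  # [('merlot',), ('riesling',)]
```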

Using the Reddit Python API to Generate Datasets

Paul Yang - October 14th, 2022

Use the Pushshift API and Reddit API in order to create novel datasets pulling Reddit data into Python data frames. Easily transition to NLP and ML analysis of the Reddit data sets as well.

Upload a CSV to Snowflake (or any Pandas dataframe)

Paul Yang - September 25th, 2022

Let's face it: the Snowflake web uploader is painful to use. Here's my script to take a CSV or the results of a Python notebook, and write it to your Snowflake database.
