Box plots, or box-and-whisker plots, are particularly useful for comparing distributions of continuous variables across groups and for identifying outliers. In this post, we’ll use seaborn’s boxplot() function to create and customize different box plots.
Histograms are a key visualization tool that helps show the distribution of numerical data. Some histograms are easier than others to customize. This post will go over some of the many ways you can use seaborn’s histplot() function to create highly tuned and beautiful histograms.
Learn how to create a histogram in matplotlib, a powerful data visualization library in Python that backs other libraries, such as seaborn. Check out our step-by-step tutorial to get started.
In this post, we'll use matplotlib and seaborn together to create customized, beautiful axis labels, axis tick labels, and titles for your plots so that your data can speak for itself.
In this post, we’ll provide a comprehensive guide on seaborn’s scatterplot() function. We’ll cover a few key arguments, including hue, style, palette, and size that will help you create more compelling graphs.
In this post, we review the basics of seaborn’s countplot() function. We’ll cover a few key arguments, including data, x, y, order, and hue, as well as a few plot examples.
This post will go over how to effectively visualize data using seaborn’s built-in lineplot() function. There are many parameters you can use to craft a more comprehensive data story, such as hue, style, markers, errorbar, err_style, and legend.
In data analysis, finding the global minimum of a function is a common task. However, it can be challenging to find the optimal solution due to the presence of multiple local minima. In this tutorial, we provide an example of using the scipy.optimize.basinhopping() function to find the global minimum of a one-dimensional multimodal function.
Learn how to perform constrained optimization using the scipy.optimize.minimize function. Get the best solution to your optimization problem while taking into consideration specific constraints on the solution.
In this tutorial, we'll explore how to minimize a function using the scipy.optimize.minimize function. By using this function, you can find the minimum value of a function, which is useful for optimization problems. We'll guide you through the steps of defining an objective function and key function arguments.
If your simple linear regression model exhibits heteroscedasticity, you can adjust the model to account for it in several ways. One way is to use weighted least squares (WLS) regression, which allows you to specify a weight for each data point. Check out this example using randomly generated data and the statsmodels library.
Learn how to create a scatter plot in Matplotlib, a powerful data visualization library in Python. Get step-by-step instructions on how to visualize your data.
Decorators are a powerful and flexible feature of Python that allow you to modify the behavior of a function or method without modifying the base function’s underlying code or repeating the same code over and over again. In this post, we’ll go over basic syntax and an example that evaluates code performance.
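As a rough illustration of the idea, here is a minimal sketch of a timing decorator. The names timer and sum_of_squares are hypothetical and chosen for this example; they are not taken from the post itself.

```python
import functools
import time


def timer(func):
    """Report how long each call to the wrapped function takes."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result

    return wrapper


@timer  # equivalent to: sum_of_squares = timer(sum_of_squares)
def sum_of_squares(n):
    # Hypothetical workload used only to have something to time.
    return sum(i * i for i in range(n))


total = sum_of_squares(1000)
```

The decorated function is called exactly as before; the timing logic lives entirely in the wrapper, so the base function's code stays unchanged.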
There are many ways to check that there is constant variance of errors across values of the X variables in a regression model. This post will go over a visual way to check for homoscedasticity or to diagnose heteroskedasticity, using residual plots after you’ve built your linear regression model.
Ordinary least squares (OLS) is one of the classic regression techniques for a reason: the results are highly interpretable. However, we have to ensure key model assumptions are met. This post will cover how to run the Breusch-Pagan test for heteroskedasticity using the statsmodels package.
Avoid unstable and unreliable model coefficients with this comprehensive guide to checking for multicollinearity in Python using seaborn and statsmodels. Learn about multicollinearity and how to use the variance inflation factor (VIF) and correlation coefficients.
Testing for heteroskedasticity (with a "k" or "c") is essential when running various regression models. For example, one of the main assumptions of OLS is that there is constant variance (homoscedasticity) among the residuals or errors of your linear regression model. Learn how to run and interpret White's test for heteroskedasticity using statsmodels.
In this post, we’ll review seaborn’s catplot() function, which is helpful for creating different kinds of plots to help you analyze and understand the relationships between continuous and categorical variables. We’ll go over how to use catplot() and some tips for customizing the appearance and layout of your plots.
In this post, we’ll go over two ways to perform linear regression with ordinary least squares (OLS) estimation using the statsmodels library. Get a detailed summary of your model fit and access useful summary statistics with these simple functions.
In the following example, we create a day_of_week() function to demonstrate the use of the match and case statements, Python's equivalent to the switch statement.
This code demonstrates how to use the ProcessPoolExecutor and ThreadPoolExecutor classes from the concurrent.futures module to run multiple threads and processes concurrently or in parallel to save you time.
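The sketch below shows the general pattern under some assumptions: slow_square and run_all are hypothetical helpers written for this example, and only the thread-based executor is actually run here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Hypothetical task: the sleep stands in for I/O-bound work.
    time.sleep(0.1)
    return n * n


def run_all(executor_cls, func, items, max_workers=4):
    """Run func over items with the given executor class, preserving order."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


# Threads suit I/O-bound work; the four sleeps overlap instead of running serially.
thread_results = run_all(ThreadPoolExecutor, slow_square, [1, 2, 3, 4])
```

Because both executor classes share the same interface, passing ProcessPoolExecutor to run_all should work the same way for CPU-bound tasks. In a script, that call is best placed under an `if __name__ == "__main__":` guard so worker processes can import the module safely.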
In this article, we will look at an example of how to use vectorized operations instead of for loops in Python to save time.
NumPy arrays are stored in contiguous blocks of memory, which allows NumPy to take advantage of vectorization and other optimization techniques. Python lists instead store references to individually allocated objects, which makes them less efficient than NumPy arrays for numerical data.
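A small sketch of the difference, assuming NumPy is installed. The variable names are illustrative only.

```python
import numpy as np

# Plain Python: the doubling happens one element at a time in a loop.
values = list(range(1_000))
doubled_list = [v * 2 for v in values]

# NumPy: a single vectorized operation over the whole array.
arr = np.arange(1_000)
doubled_arr = arr * 2

# The array's data sits in one contiguous (C-ordered) block of memory.
is_contiguous = arr.flags["C_CONTIGUOUS"]
```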
One useful but not well-understood Python tip for data science is the use of generator expressions. Generator expressions are similar to list comprehensions, but they are more memory efficient because they yield items one at a time instead of building the whole list in memory.
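To make the memory difference concrete, here is a minimal sketch comparing the two forms. The sizes are measured with sys.getsizeof, which reports only the size of the container object itself, not the items it refers to.

```python
import sys

# List comprehension: builds all 100,000 results up front.
squares_list = [n * n for n in range(100_000)]

# Generator expression: same syntax with parentheses, values produced lazily.
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# A generator can be consumed once, for example by an aggregate like sum().
total = sum(squares_gen)
```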
Caching is a technique for storing the results of expensive computations so that they can be quickly retrieved later. In Python, you can use the functools.lru_cache() decorator, where LRU stands for least recently used, to easily add caching to a function.
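A minimal sketch of the technique, using a recursive Fibonacci function as a stand-in for an expensive computation. The function fib is a hypothetical example, not necessarily the one used in the post.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def fib(n):
    """Naive recursive Fibonacci; caching avoids recomputing subproblems."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


result = fib(30)

# cache_info() reports hits, misses, and the current cache size.
info = fib.cache_info()
```

Without the decorator, fib(30) makes over a million recursive calls; with it, each value of n is computed only once.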
This handy tool allows you to efficiently add and remove items from the beginning or end of a list, making it a valuable addition to your Python toolkit.
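The summary above does not name the tool. One standard-library structure that fits the description is collections.deque, so the sketch below assumes that is what is meant.

```python
from collections import deque

# Assumption: the "handy tool" is collections.deque (a double-ended queue).
queue = deque([2, 3, 4])

queue.appendleft(1)  # add to the beginning
queue.append(5)      # add to the end

first = queue.popleft()  # remove from the beginning
last = queue.pop()       # remove from the end
```

Adding or removing at the front of a regular list requires shifting every other element, while a deque handles both ends in constant time.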
Fast Einblick Tools to make data manipulation faster. This first Tool series explores a sequence of Concat, Sort, and Join operations to manipulate and enrich customer data.
Here's a quick guide to using SQLite in Python: how to load pandas DataFrames into SQL, manipulate that data, and read it back.