If you’ve studied computer science or worked with a general-purpose programming language like Python, you’ve probably learned about for loops. For loops provide some of the most basic logic in programming and let you repeat a process without writing redundant code. They are a fundamental building block, but as your data grows, for loops in Python can become slow for computationally intensive operations because of how they work under the hood.
Thankfully, the popular Python library NumPy uses a process called vectorization, so you can wean yourself off for loops and make better use of your resources when working with numerical data. Vectorization takes an algorithm, or a set of operations, and instead of operating on one element or value at a time, works on multiple values at the same time.
Vectorized operations rely on NumPy’s core object, the NumPy array or ndarray, which stores elements of a single data type and is designed primarily for numerical data. Using NumPy’s native operations where applicable can save you a substantial amount of time.
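As a minimal sketch of what "working on multiple values at the same time" looks like in practice, the snippet below applies one arithmetic expression to a whole array and compares it with an element-by-element version. The variable names (`values`, `shifted`, `looped`) are illustrative and not part of the article's example.

```python
import numpy as np

# Illustrative data; these names are assumptions, not from the article.
values = np.array([1, 2, 3, 4])

# Vectorized: the addition is applied to every element
# without an explicit Python loop.
shifted = values + 10

# Element-by-element equivalent, shown only for comparison.
looped = np.array([v + 10 for v in values])

print(shifted)                          # [11 12 13 14]
print(np.array_equal(shifted, looped))  # True
```

Both versions give the same result; the difference is where the looping happens.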
In this article, we will look at an example of how to use vectorized operations instead of for loops in Python to save time. We'll walk you through the code line by line in the following section.
We start by importing numpy and time. The latter will allow us to compare the process time taken by the for loop versus the vectorized NumPy operation.
import numpy as np
import time
# create 2 random arrays of integers
arr = np.random.randint(low=0, high=10, size=(10, 10))
arr2 = np.random.randint(low=0, high=10, size=(10, 10))
Then, we create two random 2-D NumPy arrays, and for one of them, arr, we use a for loop to add 10 to each element in the array.
To calculate process time, we use the process_time() function from the time module.
# start timer
st = time.process_time()
# using for loops to add 10 to each element in the array
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        arr[i, j] += 10
# get the end time
et = time.process_time()
# get execution time
exec_time = et - st
print('CPU Execution time: {:.3e} seconds'.format(exec_time))
Output:
CPU Execution time: 1.529e-04 seconds
For the other array, arr2, we use NumPy’s vectorized + operator to add 10 to each element.
# start timer
st = time.process_time()
# using vectorized operations to add 10 to each element in the array
arr2 = arr2 + 10
# get the end time
et = time.process_time()
# get execution time
exec_time = et - st
print('CPU Execution time: {:.3e} seconds'.format(exec_time))
Output:
CPU Execution time: 7.816e-05 seconds
As you can see from the output, the vectorized operation is about twice as fast in this run (7.816e-05 versus 1.529e-04 seconds)! Imagine how much of a difference this will make on a much larger dataset.
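To get a rough sense of how the gap changes with array size, the sketch below repeats the same comparison on a 1000 x 1000 array. The helper names `add_ten_loop` and `add_ten_vectorized` and the array size are illustrative choices, not from the original example. Timings vary by machine, so no expected output is shown.

```python
import time

import numpy as np


def add_ten_loop(a):
    """Add 10 to every element in place using nested Python loops."""
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i, j] += 10
    return a


def add_ten_vectorized(a):
    """Add 10 to every element with a single NumPy expression."""
    return a + 10


# Assumed size for illustration: 1,000,000 elements instead of 100.
big = np.random.randint(low=0, high=10, size=(1000, 1000))

# Time the loop on a copy so both versions start from the same data.
st = time.process_time()
loop_result = add_ten_loop(big.copy())
loop_time = time.process_time() - st

# Time the vectorized version on the original array.
st = time.process_time()
vec_result = add_ten_vectorized(big)
vec_time = time.process_time() - st

print('Loop:       {:.3e} seconds'.format(loop_time))
print('Vectorized: {:.3e} seconds'.format(vec_time))
```

Running the loop on a copy keeps the comparison fair, since the loop modifies its input in place while the vectorized expression returns a new array.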
The advantage of vectorized operations is that they make use of how NumPy was built: the work runs in compiled code over the whole array rather than through the Python interpreter one element at a time. In the code snippet above, that made the vectorized operation faster than the for loop. NumPy also provides functions such as np.arange() for creating arrays without a Python loop. Note that np.vectorize(), despite its name, is documented as a convenience wrapper that is essentially a for loop, so it should not be expected to deliver the same speedup.
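The two functions named above can be tried side by side. The sketch below builds an array with np.arange() and then adds 10 both through np.vectorize() and through a native array expression, to show that the results agree. The `add_ten` helper is hypothetical and exists only for this illustration.

```python
import numpy as np


def add_ten(x):
    """Hypothetical scalar helper used only for this illustration."""
    return x + 10


# np.arange(5) creates the array [0, 1, 2, 3, 4] without a Python loop.
data = np.arange(5)

# np.vectorize wraps a scalar function so that it accepts arrays.
# NumPy's documentation notes it is provided for convenience,
# not for performance.
wrapped = np.vectorize(add_ten)
via_vectorize = wrapped(data)

# The native array expression does the same work inside NumPy.
via_native = data + 10

print(via_vectorize)  # [10 11 12 13 14]
print(via_native)     # [10 11 12 13 14]
```

When a native expression like `data + 10` is available, it is generally the better choice; np.vectorize() is mainly useful for functions that have no array equivalent.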
You can read more about vectorized computation and NumPy arrays via O’Reilly or dive into NumPy’s documentation.