Caching with lru_cache() in Python

Einblick Content Team - December 14th, 2022

Caching is a technique for storing the results of expensive computations so that they can be retrieved quickly later. In Python, functools.lru_cache() adds a least recently used (LRU) cache to a function with a single decorator. The functools module includes other helpers as well, which can add a lot of value to your code.

lru_cache basic syntax

from functools import lru_cache

# maxsize defaults to 128; it is the maximum number of distinct call results kept in the cache
@lru_cache(maxsize=128)
def expensive_function(x):
    # Do some expensive computation here
    result = x
    return result

# The first time this function is called, it will perform the expensive computation
# and store the result in a cache. Subsequent calls with the same argument will
# return the cached result, avoiding the need to perform the computation again.

result = expensive_function("hello world")
print(result)

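The @lru_cache decorator takes a maxsize argument that specifies the maximum number of recent results to store in the cache. Once the cache is full, the least recently used entry is discarded to make room for each new one. If maxsize is None, LRU eviction is disabled and the cache can grow without bound. The default is 128. Because results are keyed on the call's arguments, those arguments must be hashable.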
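To make the eviction behavior concrete, here is a minimal sketch with a deliberately tiny cache. The function name square and the calls list are made up for illustration; cache_info() is the statistics method that lru_cache attaches to the decorated function.

```python
from functools import lru_cache

calls = []  # hypothetical helper: records every argument that reaches the function body


@lru_cache(maxsize=2)
def square(x):
    calls.append(x)
    return x * x


square(1)  # miss: computed and cached
square(2)  # miss: computed and cached, cache is now full
square(1)  # hit: served from the cache, 1 becomes the most recently used entry
square(3)  # miss: the least recently used entry (2) is evicted to make room
square(2)  # miss: 2 was evicted, so it is computed again

info = square.cache_info()
print(calls)  # [1, 2, 3, 2]
print(info)   # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

Only one of the five calls is served from the cache, because the cache holds two entries while three distinct arguments are in play.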

Next we'll go through a more concrete example and compare performance with and without caching. The full code is available in the canvas and is reproduced below.

lru_cache example: calculate n!

Define non-cached and cached functions

from functools import lru_cache

# Naive factorial calculation
def fact_n(n):
    if n < 2:
        return 1
    else: 
        return n * fact_n(n-1)


# Cached factorial calculation
@lru_cache(maxsize=128)
def fact_n_cache(n):
    if n < 2:
        return 1
    else: 
        return n * fact_n_cache(n-1)

Compare performance using time.perf_counter()

from time import perf_counter

# Calculate factorial naively, no caching
start = perf_counter()
fact_n(50)
end = perf_counter()
print("No Caching")
print("Time elapsed: " + str(end - start))

# Calculate factorial, with caching
start = perf_counter()
fact_n_cache(50)
end = perf_counter()
print("Cached - 1st run")
print("Time elapsed: " + str(end - start))

# Calculate factorial, with caching again
start = perf_counter()
fact_n_cache(50)
end = perf_counter()
print("Cached - 2nd run")
print("Time elapsed: " + str(end - start))

Output:

No Caching
Time elapsed: 7.004104554653168e-05
Cached - 1st run
Time elapsed: 7.761106826364994e-05
Cached - 2nd run
Time elapsed: 4.1250837966799736e-05

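As you can see, the second call to the cached function took roughly half the time of the first. That second call is a single cache lookup rather than 50 recursive calls. Note that the first cached call is slightly slower than the uncached version, since it does the same work plus the bookkeeping of filling the cache. Timings this small (tens of microseconds, measured once) are noisy, so treat the exact ratio as indicative only.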
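For a steadier comparison, one option is to repeat each call many times with the standard library's timeit module and report the average. The sketch below assumes the same two factorial functions as above; RUNS is an arbitrary repeat count chosen for illustration, and the absolute numbers will vary by machine.

```python
from functools import lru_cache
from timeit import timeit


def fact_n(n):
    # Naive recursive factorial, no caching
    return 1 if n < 2 else n * fact_n(n - 1)


@lru_cache(maxsize=128)
def fact_n_cache(n):
    # Same recursion, but each result is stored in the LRU cache
    return 1 if n < 2 else n * fact_n_cache(n - 1)


RUNS = 10_000  # hypothetical repeat count; raise or lower it as needed

# timeit returns the total time in seconds for `number` executions
uncached_total = timeit(lambda: fact_n(50), number=RUNS)
cached_total = timeit(lambda: fact_n_cache(50), number=RUNS)

print(f"No caching: {uncached_total / RUNS:.2e} s per call")
print(f"Cached:     {cached_total / RUNS:.2e} s per call")

# Only the very first cached call recurses; every later call is a cache hit
info = fact_n_cache.cache_info()
print(info)
```

Averaging over many runs includes the one slow first call in the cached total, but its cost is spread across all repetitions, which mirrors how a cache pays off in practice.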

Additional notes

Caching can be an effective way to improve the performance of your code, but it's important to use it wisely. There is no specific amount of cache that is "too much." The amount of cache that is appropriate for your code will depend on several factors, including the size and complexity of your data, the amount of memory on your system, and the performance characteristics of your code.

A larger cache can help improve the performance of your program by allowing it to store more results and avoid repeating expensive computations. However, a very large cache can also consume a significant amount of memory, which can slow down your code or cause it to crash.

As a rule of thumb, try to balance the size of your cache and the amount of memory it uses. If you notice that your code is running slowly or using a lot of memory, you may need to reduce the size of your cache. If you have a lot of memory available and your code is not running as quickly as you would like, increase the size of your cache.
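One way to ground this rule of thumb is to measure how often the cache is actually hit. The sketch below is illustrative only: slow_square, hit_rate, and the cyclic workload are hypothetical stand-ins, while cache_info() and cache_clear() are the methods lru_cache provides on the wrapped function.

```python
from functools import lru_cache


def slow_square(x):
    # Stand-in for an expensive computation
    return x * x


def hit_rate(cached_func):
    # Fraction of calls answered from the cache (0.0 if nothing was called yet)
    info = cached_func.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


# Wrap the same function with two different cache sizes
small = lru_cache(maxsize=4)(slow_square)
large = lru_cache(maxsize=16)(slow_square)

# Hypothetical workload: cycle through 10 distinct arguments, 5 times over
workload = list(range(10)) * 5

for x in workload:
    small(x)
    large(x)

small_rate = hit_rate(small)
large_rate = hit_rate(large)
print(f"maxsize=4:  hit rate {small_rate:.0%}")
print(f"maxsize=16: hit rate {large_rate:.0%}")

# cache_clear() empties the cache and resets its statistics, releasing the memory
large.cache_clear()
```

With this access pattern the small cache never gets a hit, because each entry is evicted before it is requested again, while the larger cache misses only on the first pass. A low hit rate on a real workload suggests the cache is too small for the working set, or that the function's arguments rarely repeat and caching is not helping.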

BONUS: @lru_cache with generative AI

We recently launched our AI agent, called Einblick Prompt, which can create data workflows from as little as one sentence--now available in every Einblick canvas. In the below canvas, we used generative AI to create a function and cache it using lru_cache. Check out how we did it below:

Using Generative AI in Einblick

  1. Open the canvas
  2. Fork the canvas
  3. Right-click anywhere in the canvas > Prompt
  4. Type in: "Write a function to calculate n!. Use lru cache to cache the function."
  5. Run the code in Einblick's data notebook immediately

Give Prompt a whirl, and let us know what you think!
