Caching is a technique for storing the results of expensive computations so that they can be quickly retrieved later. In Python, you can use functools.lru_cache(), a least recently used (LRU) cache, to add caching to a function with a single decorator. The functools module has other useful functions as well.
lru_cache basic syntax
    from functools import lru_cache

    # maxsize default is 128, and is the max number of calls saved
    @lru_cache(maxsize=128)
    def expensive_function(x):
        # Do some expensive computation here
        result = x
        return result

    # The first time this function is called, it will perform the expensive computation
    # and store the result in a cache. Subsequent calls with the same argument will
    # return the cached result, avoiding the need to perform the computation again.
    result = expensive_function("hello world")
    print(result)
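The @lru_cache decorator takes a maxsize argument that specifies the maximum number of recent results to store in the cache. Once the cache is full, the least recently used result is discarded to make room for a new one. If maxsize is None, the cache can grow without bound. The default is 128.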
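To see how maxsize bounds the cache, here is a minimal sketch that uses a deliberately tiny cache and then inspects it with cache_info(), which lru_cache attaches to the decorated function. The square function and the size of 2 are illustrative assumptions, chosen only to make the behavior easy to follow.

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # deliberately tiny so the size limit is easy to observe
def square(x):
    return x * x

square(1)  # miss: computed and cached
square(2)  # miss: computed and cached
square(1)  # hit: 1 becomes the most recently used entry
square(3)  # miss: cache is full, so the least recently used entry (2) is dropped
square(2)  # miss again, because 2 was dropped in the previous step

info = square.cache_info()
print(info)  # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

Only one of the five calls is served from the cache here, because the cache never holds more than two results at a time.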
Next we'll go through a more concrete example and compare performance with and without caching.
lru_cache example: calculate n!
Define non-cached and cached functions
    from functools import lru_cache

    # Naive factorial calculation
    def fact_n(n):
        if n < 2:
            return 1
        else:
            return n * fact_n(n-1)

    # Cached factorial calculation
    @lru_cache(maxsize=128)
    def fact_n_cache(n):
        if n < 2:
            return 1
        else:
            return n * fact_n_cache(n-1)
Compare performance using perf_counter
    from time import perf_counter

    # Calculate factorial naively, no caching
    start = perf_counter()
    fact_n(50)
    end = perf_counter()
    print("No Caching")
    print("Time elapsed: " + str(end - start))

    # Calculate factorial, with caching
    start = perf_counter()
    fact_n_cache(50)
    end = perf_counter()
    print("Cached - 1st run")
    print("Time elapsed: " + str(end - start))

    # Calculate factorial, with caching again
    start = perf_counter()
    fact_n_cache(50)
    end = perf_counter()
    print("Cached - 2nd run")
    print("Time elapsed: " + str(end - start))
    No Caching
    Time elapsed: 7.004104554653168e-05
    Cached - 1st run
    Time elapsed: 7.761106826364994e-05
    Cached - 2nd run
    Time elapsed: 4.1250837966799736e-05
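As you can see, the first cached run took slightly longer than the uncached run, since it has to compute the result and populate the cache. The second cached run took a little over half the time of the first. Keep in mind that these are single measurements of a few tens of microseconds each, so the exact numbers will vary from run to run.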
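Because single timings this small are noisy, it can help to repeat each call many times. The following is a minimal sketch using the standard library's timeit module; the values number=10_000 and repeat=5 are arbitrary choices, and the functions are redefined compactly so the block runs on its own.

```python
import timeit
from functools import lru_cache

def fact_n(n):
    return 1 if n < 2 else n * fact_n(n - 1)

@lru_cache(maxsize=128)
def fact_n_cache(n):
    return 1 if n < 2 else n * fact_n_cache(n - 1)

fact_n_cache(50)  # warm the cache so every timed call below is a cache hit

calls = 10_000
# Time `calls` invocations, repeat the measurement 5 times, and keep the fastest
naive_time = min(timeit.repeat(lambda: fact_n(50), number=calls, repeat=5))
cached_time = min(timeit.repeat(lambda: fact_n_cache(50), number=calls, repeat=5))

print(f"No caching: {naive_time / calls:.2e} s per call")
print(f"Cached:     {cached_time / calls:.2e} s per call")
```

Taking the minimum over several repeats is a common way to reduce the influence of other activity on the machine. The absolute numbers will differ between systems.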
Caching can be an effective way to improve the performance of your code, but it's important to use it wisely. There is no specific amount of cache that is "too much." The amount of cache that is appropriate for your code will depend on several factors, including the size and complexity of your data, the amount of memory on your system, and the performance characteristics of your code.
A larger cache can help improve the performance of your program by allowing it to store more results and avoid repeating expensive computations. However, a very large cache can also consume a significant amount of memory, which can slow down your code or cause it to crash.
As a rule of thumb, try to balance the size of your cache against the amount of memory it uses. If you notice that your code is running slowly or using a lot of memory, you may need to reduce the size of your cache. If you have plenty of memory available and the cache is discarding results that are requested again soon after, try increasing its size.
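One way to ground that decision in data is to measure how often the cache is actually hit at different sizes. The sketch below is illustrative only: hit_rate is a hypothetical helper, and the workload (100 distinct keys requested in the same order 10 times) is an assumed access pattern standing in for your real one.

```python
from functools import lru_cache

def hit_rate(maxsize, workload):
    """Replay `workload` through a cache of the given size; return the fraction of hits."""
    @lru_cache(maxsize=maxsize)
    def lookup(key):
        return key  # stand-in for an expensive computation

    for key in workload:
        lookup(key)
    info = lookup.cache_info()
    return info.hits / (info.hits + info.misses)

# Assumed workload: cycle through 100 distinct keys, 10 times over
workload = list(range(100)) * 10

for size in (10, 50, 100, None):
    print(f"maxsize={size}: hit rate {hit_rate(size, workload):.2f}")
```

For this particular cyclic pattern, any cache smaller than 100 entries has a hit rate of 0.00, because each key is discarded before it is requested again, while sizes of 100 and None both reach 0.90. Real workloads are rarely this extreme, but replaying a representative sample of calls this way can show whether a larger cache would help.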