2025-02-27 Update From: SLTechnology News & Howtos

Shulou (Shulou.com) 06/02 Report --
This article introduces methods for implementing caching in Python applications. It should be a useful reference for anyone who needs to add caching to their code.
The importance of caching
Caching is an important concept for every Python programmer to understand.
In short, caching means using programming techniques to store data in a temporary location rather than retrieving it from the source every time.
Caching improves application performance because accessing data in that temporary location is faster than repeatedly fetching it from a source such as a database or a web service.
This article aims to explain how caching works in Python.
Why do we need to implement caching?
To understand what caching is and why it is needed, consider the following scenario.
We are building an application in Python that displays a list of products to end users. The app is accessed several times a day by more than 100 users. The application will be hosted on an application server and accessible over the internet, and the products will be stored in a database installed on a database server. The application server therefore queries the database for the relevant records.
The following illustration demonstrates how our target application is set up:
The problem
Getting data from a database is an io-bound operation. Therefore, its nature is slow. If requests are sent frequently and responses are updated infrequently, we can cache responses in the application's memory.
Instead of querying the database every time, we can cache the results. Each request for data must travel over the wire, and the response must travel back over the wire; this is inherently slow, and that is why caching was introduced.
We can cache results to reduce computation time and save computer resources.
A cache is a temporary storage location. It works in a lazy loading mode.
Initially, the cache is empty. When the application server fetches data from the database server, it populates the cache with the desired dataset. From that point on, subsequent requests are served from the cache rather than going all the way to the database server.
We also need to invalidate the cache in a timely manner to ensure that up-to-date information is displayed to the end user.
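As a minimal sketch of this lazy-loading flow (the TTLCache class, product names, and TTL value are made up for illustration), the cache starts empty, is populated on the first miss, and is invalidated after a time-to-live so stale data is not served forever:

```python
import time

class TTLCache:
    """Lazy-loading cache: empty at first, populated on the first miss,
    and invalidated after ttl_seconds so up-to-date data is re-fetched."""

    def __init__(self, loader, ttl_seconds=60):
        self.loader = loader      # function that fetches from the source
        self.ttl = ttl_seconds
        self._value = None
        self._loaded_at = None

    def get(self):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self.ttl:
            self._value = self.loader()   # cache miss: go to the source
            self._loaded_at = now
        return self._value                # cache hit: no round trip

calls = []
def fetch_products():
    calls.append(1)               # stand-in for a slow database query
    return ["laptop", "phone"]

cache = TTLCache(fetch_products, ttl_seconds=60)
cache.get()
cache.get()
print(len(calls))  # the loader ran only once
```

The second `get()` never touches the "database"; after 60 seconds the next call would reload it.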
This leads to the next section of this article: Cache rules.
Caching rules
In my opinion, caching has three rules.
Before enabling caching, the critical first step is to profile the application. Only then do we know how long each function takes and how often it is called. Once profiling is complete, we can determine what needs to be cached.
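As a sketch of that profiling step, the standard-library cProfile module can report how long each function takes and how many times it is called; the slow_lookup function below is a made-up stand-in for a slow database call:

```python
import cProfile
import io
import pstats
import time

def slow_lookup(n):
    # Made-up stand-in for a slow, IO-bound database query.
    time.sleep(0.01)
    return n * 2

profiler = cProfile.Profile()
profiler.enable()
for i in range(5):
    slow_lookup(i)
profiler.disable()

# Render the profile into a string, slowest cumulative time first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print("slow_lookup" in report)  # the profile names our hot function
```

The report shows call counts and cumulative times per function, which is exactly the evidence needed to pick caching candidates.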
We then need a mechanism that associates a function's inputs with its outputs and stores them in memory. This leads to the first rule of caching.
1. The first rule of caching:
The first rule is to ensure that the target function takes a long time to return its output, that it is executed frequently, and that its output does not change frequently.
We don't want to introduce caching for functions that complete quickly, functions that are rarely called in the application, or functions whose results change frequently at the source.
This is an important rule to remember.
Caching candidates: frequently called functions, output that doesn't change often, and execution that takes a long time
As an example, if a function is executed 100 times, and the function takes a long time to return a result, and it returns the same result for a given input, then we can cache the result.
However, if the value returned by a function is updated at the source every second while requests arrive only every minute, we have to consider whether caching would end up serving stale data to the user. This helps us decide whether we need caching at all, or whether we instead need a different communication channel, data structure, or serialization mechanism to retrieve data faster, for example sending data with a binary serializer over a socket rather than XML serialization over HTTP.
It is also important to know when to invalidate the cache and when to reload the cache with new data.
2. The second rule of caching:
The second rule is to ensure that fetching data through the introduced caching mechanism is faster than executing the target function.
We should introduce caches only if the results are retrieved from the cache faster than the data is retrieved from the data source.
Caching should be faster than fetching data from the current data source
Therefore, it is critical to choose an appropriate data structure, such as a dictionary or an LRU cache, for the job.
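A quick way to check this rule is to measure both paths with the standard-library timeit module; the expensive function and the 10,000 input here are arbitrary stand-ins for a slow data source:

```python
import timeit

def expensive(n):
    # CPU-bound stand-in for a slow data-source call.
    return sum(i * i for i in range(n))

# Pre-populated cache: key is the input, value is the computed result.
cache = {10_000: expensive(10_000)}

direct = timeit.timeit(lambda: expensive(10_000), number=50)
cached = timeit.timeit(lambda: cache[10_000], number=50)
print(cached < direct)  # the dictionary lookup should win comfortably
```

If the cached path were not clearly faster, introducing the cache would add complexity for no benefit.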
3. The third rule of caching:
The third important rule concerns memory usage, which is often overlooked. Are you performing IO-bound operations (such as querying databases or web services) or CPU-intensive operations (such as crunching numbers or performing in-memory calculations)?
When we cache results, the memory footprint of the application increases, so it is critical to choose the appropriate data structure and cache only the data attributes that need to be cached.
Sometimes we query multiple tables to create objects of a class. However, we only need to cache basic properties in the application.
Cache affects memory footprint
As an example, consider that we built a report dashboard that queries the database and retrieves a list of orders. For illustrative purposes, consider that the dashboard displays only the order name.
Therefore, instead of caching the entire order object, we can cache only the name of each order. In general, architects recommend creating a lean data transfer object (DTO) with the __slots__ attribute to reduce the memory footprint. Named tuples or Python data classes are also used.
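A minimal sketch of such a lean DTO (the OrderDTO and OrderPlain class names are made up for illustration) shows that a slotted class carries no per-instance __dict__, which is where the memory saving comes from:

```python
class OrderDTO:
    """Lean DTO: only the attribute the dashboard needs, and
    __slots__ suppresses the per-instance __dict__ allocation."""
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

class OrderPlain:
    """Ordinary class: every instance carries a full __dict__."""
    def __init__(self, name):
        self.name = name

slim = OrderDTO("order-1")
fat = OrderPlain("order-1")
print(hasattr(slim, "__dict__"))  # False: no dict allocated per instance
print(hasattr(fat, "__dict__"))   # True: dict overhead on every object
```

When thousands of such objects sit in a cache, dropping the per-instance dictionary adds up quickly.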
This leads to the final section of this article, which outlines the details of how caching is implemented.
How to implement caching?
There are several ways to implement caching.
We can create local data structures in Python processes to build caches, or we can use caches as servers, acting as proxies and servicing requests.
Python ships with built-in tools, such as the cached_property decorator in the functools module. Let me introduce cache implementations with an overview of this decorator.
The following code snippet illustrates how cached_property works.
from functools import cached_property

class FinTech:
    @cached_property
    def run(self):
        return list(range(1, 100))
As a result, run is now cached per instance: the list from range(1, 100) is generated only once for a given FinTech object. However, in real scenarios, we rarely need to cache attributes.
Let's review other methods.
1. Dictionary method
For simple use cases, we can create/use mapping data structures such as dictionaries that we can keep in memory and make accessible on the global framework.
There are many ways to achieve this. The easiest is to create a singleton-style module, such as config.py, containing a dictionary field that is populated once at startup. From then on, that dictionary can be used to retrieve results.
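A minimal sketch of this singleton-module approach (the function and variable names are made up; in a real project the cache dictionary would live in its own config.py module, which Python imports only once per process):

```python
# In a real project this would live in config.py; a module is imported
# only once per process, so its globals act as a simple singleton cache.
_products_cache = {}

def get_products(category, loader):
    """Return cached products for a category, loading on the first miss."""
    if category not in _products_cache:
        _products_cache[category] = loader(category)
    return _products_cache[category]

db_hits = []
def query_db(category):
    db_hits.append(category)      # stand-in for a real database query
    return [f"{category}-item"]

get_products("books", query_db)
get_products("books", query_db)
print(db_hits)  # ['books'] -- the database was queried only once
```

This works well for a single process; its limits in multi-process deployments are exactly what the later sections address.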
2. Least recently used (LRU) algorithm
We can use Python's built-in lru_cache from the functools module.
LRU stands for least recently used: when the cache is full, the entries that have gone unused the longest are evicted. lru_cache caches a function's return values, keyed by the arguments passed to the function.
It is particularly useful in recursive CPU-bound operations.
It is essentially a decorator, @lru_cache(maxsize, typed), that we can apply to functions.
maxsize tells the decorator the maximum size of the cache. If we don't want to set the size, then just set it to None.
typed indicates whether arguments of different types should be cached separately. For example, with typed=True, f(3) and f(3.0) are treated as distinct calls with distinct cache entries.
This is valid when we expect the same input to produce the same output.
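A classic illustration, assuming nothing beyond the standard library, is memoizing a recursive Fibonacci function, exactly the kind of CPU-bound recursive workload lru_cache helps with:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # None lets the cache grow without bound
def fib(n):
    # Without the cache this recursion recomputes each value
    # exponentially many times; with it, each n is computed once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                  # 832040
print(fib.cache_info().misses)  # 31: each of 0..30 computed exactly once
```

The decorator also exposes cache_info() and cache_clear(), which are handy for verifying hit rates and for invalidation.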
Keeping all the data in the application's memory can be troublesome.
In distributed applications with multiple processes, this becomes a problem: it is not practical for every process to cache all results in its own memory.
A good use case is when an application runs on a cluster of machines. We can host caching as a service.
3. Cache as a Service
A third option is to host cached data as an external service. The service can be responsible for storing all requests and responses.
All applications can retrieve data through caching services. It's like a proxy.
Suppose we are building an application the size of Wikipedia that will serve 1000 requests simultaneously or in parallel.
We need a caching mechanism and want to distribute caching across servers.
We can use Memcached to cache the data.
Memcached is popular in Linux and Windows because:
It can be used to implement a memory cache with state.
It can even be distributed across servers.
It is simple to use, fast, and widely used in many large organizations.
It supports automatic expiration of cached data
To talk to Memcached from Python, we need to install a client library such as pymemcache.
Memcache requires data to be stored in string or binary form. Therefore, we must serialize cached objects and deserialize them when we need to retrieve them.
The following snippet shows how to connect to a Memcached server and use it (the localhost address and the pickle-based serde are illustrative choices):
from pymemcache.client.base import Client
from pymemcache import serde

# pickle_serde serializes Python objects to bytes on set and back on get
client = Client(("localhost", 11211), serde=serde.pickle_serde)
client.set("blog", {"name": "caching", "publication": "fintechexplained"})
blog = client.get("blog")
The above are the details of how to implement caching in Python applications. I hope you gained something from reading it.