2025-04-04 Update From: SLTechnology News&Howtos — Shulou (Shulou.com) 06/02 Report
This article looks at the principles of Java cache updating: why a common update pattern produces dirty data, and the standard design patterns for keeping a cache consistent with its backing store.
Cache update routines
I see many people, when writing code that updates cached data, delete the cache first and then update the database, expecting a later read to reload the cache. This logic is wrong. Imagine two concurrent operations, an update and a query. The update deletes the cache; the query then misses the cache, reads the old value from the database, and puts it back into the cache; only after that does the update write the database. The result: the cache still holds the old value, and it stays dirty until it expires (if it ever does).
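To make the race concrete, here is a minimal, hypothetical sketch of the flawed order in Java. A ConcurrentHashMap stands in for both the cache and the database, and all names are made up for the illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A sketch of the flawed "delete cache, then update database" order.
// ConcurrentHashMap stands in for both the cache and the database;
// every name here is illustrative, not a real API.
public class FlawedUpdate {
    final Map<String, String> cache = new ConcurrentHashMap<>();
    final Map<String, String> database = new ConcurrentHashMap<>();

    // Reader: on a cache miss, load from the database and re-cache.
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    // Flawed writer: evicts first, writes the database second.
    void update(String key, String value) {
        cache.remove(key);           // step 1: delete the cache entry
        // A concurrent read() landing here re-caches the OLD value...
        database.put(key, value);    // step 2: write the new value
        // ...and nothing ever corrects it: the cache stays dirty.
    }
}
```

Interleaving a read between the two steps of `update` shows the cache ending up permanently stale.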
I don't know why so many people use this logic, but when I posted about it on Weibo, many people offered complicated and strange fixes. So I want to use this article to walk through a few well-established Design Patterns for cache updates.
For now, let's set aside whether updating the cache and updating the database form a transaction, and the possibility of failure; assume both the database update and the cache update succeed (get the happy-path logic right first).
There are four Design Patterns for updating a cache: Cache Aside, Read Through, Write Through, and Write Behind Caching. Let's look at these four patterns one by one.
Cache Aside Pattern
This is the most commonly used pattern. The specific logic is as follows:
Miss (invalidation): the application looks in the cache first; if the data is not there, it reads it from the database and, on success, puts it into the cache.
Hit: the application reads the data from the cache and returns it.
Update: the application writes the data to the database first and, on success, invalidates the cache entry.
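The three cases above can be sketched in a few lines of Java. This is a minimal illustration under simplifying assumptions, not a production implementation: a ConcurrentHashMap stands in for both the cache and the database, and all names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal Cache Aside sketch. ConcurrentHashMap stands in for both
// the cache and the database; names are illustrative, not a real API.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();

    // Read path: a hit returns from the cache; a miss loads from the
    // database and, on success, fills the cache.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {                 // miss
            value = database.get(key);       // fetch from the store
            if (value != null) {
                cache.put(key, value);       // fill the cache on success
            }
        }
        return value;
    }

    // Write path: update the database first, then invalidate
    // (not update) the cache entry.
    public void write(String key, String value) {
        database.put(key, value);
        cache.remove(key);                   // invalidate, per the pattern
    }
}
```

Note that `write` removes the cache entry rather than overwriting it; the next `read` repopulates it from the database.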
Note that here the update writes the database first and invalidates the cache only after that succeeds. Does this ordering avoid the problem described at the start of the article? Let's reason it through.
Take a concurrent query and update. There is no step that deletes cached data before the database write; the database is updated first, while the cache entry is still valid, so concurrent queries read the (not yet updated) cached value. The update then immediately invalidates the cache, and subsequent queries pull the fresh data from the database. Unlike the logic at the start of the article, later queries do not keep returning stale data.
This is the standard design pattern; Facebook's paper "Scaling Memcache at Facebook" uses the same strategy. Why invalidate the cache rather than update it after writing the database? See the Quora question "Why does Facebook use delete to remove the key-value pair in Memcached instead of updating the Memcached during write request to the backend?" — chiefly because two concurrent writes could leave dirty data in the cache.
So is Cache Aside free of concurrency problems? No. For example: a read misses the cache and goes to the database for the data; meanwhile a write updates the database and invalidates the cache; the earlier read then puts its now-stale value into the cache, producing dirty data.
This can happen in theory, but in practice the probability is very low: it requires a read that misses the cache, a concurrent write, and the read's cache fill landing after the write's invalidation. Database writes are usually much slower than reads and may lock the table, so a read that entered the database before the write would have to complete its cache fill after the write finishes; the odds of all these conditions lining up are small.
So, as the Quora answer puts it: either guarantee consistency with 2PC or the Paxos protocol, or work to minimize the probability of dirty data under concurrency. Facebook chose the probability-reduction approach, because 2PC is too slow and Paxos is too complex. And in any case, it is best to set an expiration time on cached entries.
Read/Write Through Pattern
In the Cache Aside routine above, the application code maintains two stores itself, the Cache and the Repository, which makes the application more verbose. In the Read/Write Through routine, the cache itself proxies the database (Repository) updates, which simplifies the application layer considerably: the application sees a single backing store, and that store maintains its own Cache.
Read Through
In the Read Through routine, the cache is refreshed during the query path: when an entry is missing (expired or evicted by LRU), Cache Aside makes the caller load the data into the cache, whereas Read Through has the cache service load it itself, transparently to the application.
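The idea can be sketched as a cache that owns its own loader, so the application only ever talks to the cache. This is a minimal, assumed design (the loader function stands in for a database query; all names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read Through sketch: the cache owns the loader, so misses are
// filled transparently. Names here are illustrative, not a real API.
public class ReadThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;  // e.g. a DB query

    public ReadThroughCache(Function<String, String> loader) {
        this.loader = loader;
    }

    public String get(String key) {
        // computeIfAbsent loads and caches transparently on a miss.
        return cache.computeIfAbsent(key, loader);
    }
}
```

The application calls `get` and never sees whether the value came from the cache or the backing store.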
Write Through
The Write Through routine is similar to Read Through, except it happens on the update path. When data is updated and the cache is not hit, the database is updated directly and the call returns. If the cache is hit, the cache is updated, and the Cache then updates the database itself (this is a synchronous operation).
There is a figure illustrating this in the Cache entry on Wikipedia; the Memory there corresponds to the database in our example.
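The update path described above can be sketched as follows. Again this is a minimal illustration under simplifying assumptions (map-backed "database", invented names), not a real cache API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write Through sketch: on a cache hit the cache is updated and the
// cache layer synchronously writes the backing store in the same call;
// on a miss the database is updated directly. Names are illustrative.
public class WriteThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        if (cache.containsKey(key)) {
            cache.put(key, value);     // hit: refresh the cache...
        }
        database.put(key, value);      // ...and synchronously write the store
    }

    public String get(String key) {
        // read path shown only so the sketch is usable end to end
        return cache.computeIfAbsent(key, database::get);
    }

    // test hook: what has actually been persisted
    public String persisted(String key) { return database.get(key); }
}
```

The key property is that `put` returns only after the database write has completed, so cache and store never drift apart on the write path.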
Write Behind Caching Pattern
Write Behind is also known as Write Back. Readers familiar with the Linux kernel will recognize write back: it is exactly the Page Cache policy of the Linux file system. Yes, the fundamentals are all connected, which is why the fundamentals matter; I have said more than once that they do.
In the Write Back routine, an update touches only the cache, not the database; the cache flushes updates to the database asynchronously, in batches. The advantage of this design is that data operations become very fast (they work directly on memory), and because the flush is asynchronous, write back can also merge multiple operations on the same data, so the performance gain is considerable.
The problem is that the data is not strongly consistent and can be lost (as we know, an abnormal shutdown of Unix/Linux can lose data). In software design it is essentially impossible to produce a defect-free design; just as algorithms trade time for space or space for time, strong consistency sometimes conflicts with high performance, and high availability with high performance. Software design is always a trade-off.
In addition, Write Back is more complex to implement, because it must track which data has been updated and needs to be flushed to the persistence layer. The operating system's write back persists only when the cache entry is about to go away, for example under memory pressure or when the process exits; this is also called lazy write.
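The mechanics just described (dirty tracking, batched flush, merging of repeated writes) can be sketched like this. It is a simplified illustration: a real implementation flushes on a background thread or timer, while here `flush()` is called manually, and all names are invented:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Write Behind sketch: writes touch only the cache and mark the key
// dirty; flush() batches dirty entries into the "database". The manual
// flush and map-backed store are simplifications for the example.
public class WriteBehindCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Set<String> dirty = ConcurrentHashMap.newKeySet();

    public void put(String key, String value) {
        cache.put(key, value);   // fast path: memory only
        dirty.add(key);          // remember what must be persisted
    }

    public String get(String key) { return cache.get(key); }

    // In a real system this runs asynchronously; note how repeated
    // put()s to one key collapse into a single database write.
    public void flush() {
        for (String key : dirty) {
            database.put(key, cache.get(key));
            dirty.remove(key);
        }
    }
}
```

If the process dies before `flush()` runs, the updates in `cache` are lost, which is exactly the consistency risk discussed above.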
Wikipedia's Cache entry also has a flow chart of write back that shows this basic logic.
A little more nagging
1) The Design Patterns above are not specific to updating a MySQL database alongside Memcached/Redis in a software architecture; they come from computer architecture itself: the CPU cache, the file system cache, the on-disk cache, the database cache. These cache update patterns are old, time-tested strategies, which is exactly what Best Practice means in engineering, so just follow them.
2) We sometimes assume that people who design macro system architecture must rely purely on experience. In fact, many macro designs come from these micro mechanisms. For example, the principles behind many virtualization techniques in cloud computing resemble traditional virtual memory; the Unix I/O models scale up into the synchronous and asynchronous models of system architecture; the pipe invented by Unix prefigures data streaming architectures; and many of TCP's designs reappear in communication between systems. Look closely at these micro levels and you will find many subtle designs. So allow me to state a view clearly: if you want to do architecture well, you must get through computer architecture and a lot of old-fashioned fundamental technology.
3) In software development and design, I strongly recommend that before inventing something you study the existing designs and ideas, read the relevant guidelines, best practices, or design patterns, digest them, and only then decide whether to reinvent the wheel. Do not take software design for granted.
4) Above, we did not consider making the Cache and Repository updates a single transaction. What if the cache update succeeds but the database update fails, or vice versa? If you need strong consistency here, you need the two-phase commit protocol (prepare, commit/rollback), such as XAResource in Java 7 and XA Transactions in MySQL 5.7; some caches, such as EhCache, also support XA. Of course, strong consistency like XA costs performance. For topics related to distributed transactions, see the article "Transaction Processing in Distributed Systems".