2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article is all practical material. It mainly discusses the following questions:
(1) when the data in the database and the cache become inconsistent
(2) ideas for reducing the inconsistency
(3) how to ensure consistency between the database and the cache
I. The origin of the requirement
When data changes, the most discussed point is the recommendation to "evict the cache first, then modify the database".
That recommendation rests on the fact that operating on the cache and operating on the database are two separate steps rather than one atomic operation, so the second step may fail after the first has succeeded.
Suppose you write to the database first and then evict the cache: if the database write succeeds and the cache eviction fails, the DB holds new data while the cache holds old data, and the two are inconsistent (as shown in the figure above: new data in the DB, old data in the cache).
Suppose you evict the cache first and then write to the database: if the eviction succeeds and the database write fails, the only consequence is one cache miss (as shown in the figure above: no data in the cache, old data in the DB).
Conclusion: evict the cache before writing to the database.
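The two failure cases above can be replayed in a few lines. This is only a sketch: plain dicts stand in for the database and the cache, and the StepFailed exception and the fail_second_step flag are hypothetical names introduced here to force the second step to fail.

```python
class StepFailed(Exception):
    """Raised to simulate a failed cache or database operation."""


def write_db_then_evict(db, cache, key, value, fail_second_step):
    """Order 1: write the database first, then evict the cache."""
    db[key] = value                      # step 1 succeeds
    if fail_second_step:
        raise StepFailed("cache eviction failed")
    cache.pop(key, None)                 # step 2


def evict_then_write_db(db, cache, key, value, fail_second_step):
    """Order 2: evict the cache first, then write the database."""
    cache.pop(key, None)                 # step 1 succeeds
    if fail_second_step:
        raise StepFailed("database write failed")
    db[key] = value                      # step 2


def run(order):
    """Start from a consistent state and let the second step fail."""
    db = {"uid:1": "old"}
    cache = {"uid:1": "old"}
    try:
        order(db, cache, "uid:1", "new", fail_second_step=True)
    except StepFailed:
        pass
    return db, cache


db1, cache1 = run(write_db_then_evict)   # DB is new, cache is still old
db2, cache2 = run(evict_then_write_db)   # DB is old, cache is empty
```

With the first order the cache keeps serving the old value while the DB holds the new one. With the second order the cache is simply empty, so the next read falls through to the DB.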
The objection discussed here is: "if the cache is operated on first and a read request arrives before the database write, old data may be loaded back into the cache, leaving the cache and the database inconsistent". That scenario is the topic of this article.
II. Why the data becomes inconsistent
Recall the process of reading and writing the cache and the database.
Write process:
(1) evict the cache first
(2) then write the DB
Read process:
(1) read the cache first, and return the data on a cache hit
(2) on a cache miss, read the DB
(3) put the data read from the DB into the cache
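The write and read processes above can be sketched as a small class. The class name CacheAside and the db_reads counter are assumptions made for illustration, a dict stands in for the database, and the key is assumed to exist in the DB.

```python
class CacheAside:
    """Minimal single-threaded sketch of the read and write processes."""

    def __init__(self, db):
        self.db = db          # stands in for the database
        self.cache = {}       # stands in for the cache
        self.db_reads = 0     # counts how often a read reaches the DB

    def write(self, key, value):
        self.cache.pop(key, None)   # (1) evict the cache first
        self.db[key] = value        # (2) then write the DB

    def read(self, key):
        if key in self.cache:       # (1) cache hit: return directly
            return self.cache[key]
        self.db_reads += 1
        value = self.db[key]        # (2) cache miss: read the DB
        self.cache[key] = value     # (3) put the value into the cache
        return value


store = CacheAside({"uid:1": 100})
first = store.read("uid:1")     # miss, goes to the DB
second = store.read("uid:1")    # hit, served from the cache
store.write("uid:1", 80)        # evicts the cache, then updates the DB
third = store.read("uid:1")     # miss again, sees the new value
```

Run by a single thread, this flow stays consistent. The problem only appears when reads and writes interleave.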
Under what circumstances can the cache and the database become inconsistent?
In a distributed environment, data is read and written concurrently. There are multiple applications upstream, and they read and write the same data through multiple deployed instances of a service (several instances must be deployed to ensure availability). Concurrent reads and writes at the database level are not guaranteed to complete in the order they were issued, so a read request issued later may well complete first and read dirty (stale) data:
(a) write request A arrives, and its first step evicts the cache (step 1 in the figure above)
(b) A's second step writes to the database by issuing a modify request (step 2 in the figure above)
(c) read request B arrives, and its first step reads the cache and finds it empty (step 3 in the figure above)
(d) B's second step reads the database by issuing a read request. A's write has not yet taken effect at this point, so B reads dirty data and puts it into the cache (step 4 in the figure above)
That is, at the database level the later request 4 completes before the earlier request 2, dirty data is read out and put into the cache, and the cache ends up inconsistent with the database.
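Steps (a) to (d) can be replayed deterministically. The sketch below does not use real threads. It performs the operations in the problematic order by hand, with dicts standing in for the DB and the cache, and an illustrative balance of 100 that A wants to change to 80.

```python
db = {"uid:1": 100}
cache = {"uid:1": 100}

# (a) write request A, step 1: evict the cache
cache.pop("uid:1", None)

# (b) A, step 2: the modify request is issued but has not completed yet
pending_write = ("uid:1", 80)

# (c) read request B, step 1: the cache is empty, so this is a miss
b_hit = "uid:1" in cache

# (d) B, step 2: B reads the DB before A's write lands and caches the result
b_value = db["uid:1"]
cache["uid:1"] = b_value

# A's write finally completes at the database
key, value = pending_write
db[key] = value

# The DB now holds 80 while the cache holds the stale 100
inconsistent = db["uid:1"] != cache["uid:1"]
```

Nothing will correct the cached value until the next write or an expiry evicts it.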
III. Ideas for reducing the inconsistency
Can we make sure that the request issued first is also executed first? The common idea is "serialization", which is what we discuss today.
First, let's take a closer look at how multiple concurrent read and write SQL statements are executed inside a service.
The figure above shows the upstream and downstream of a service and its internal deployment. The details are as follows:
(1) upstream of the service are multiple business applications. They issue concurrent read and write requests for the same data. In the example above, a balance modification (write) for uid=1 and a balance query (read) for uid=1 are issued concurrently.
(2) downstream of the service is the database (DB). Assume that all reads and writes go to a single DB.
(3) in the middle is the service itself, which is divided into several parts:
(3.1) at the top is the task queue
(3.2) in the middle are the worker threads. Each worker thread carries out the actual work task, and a typical task reads and writes the database through the database connection pool.
(3.3) at the bottom is the database connection pool. All SQL statements are sent to the database through it.
The typical workflow of a worker thread is as follows:
void work_thread_routine() {
    Task t = TaskQueue.pop();                    // get the next task from the task queue
    // process the task logic and build the SQL statement, sql
    DBConnection c = CPool.GetDBConnection();    // take a DB connection from the DB connection pool
    c.execSQL(sql);                              // execute the SQL statement over that connection
    CPool.PutDBConnection(c);                    // return the DB connection to the pool
}
Question: the task queue already serializes the tasks, so does that ensure tasks are not executed concurrently?
A: No, because
(1) a service has multiple worker threads, so tasks popped serially are still executed in parallel
(2) a service has multiple database connections. Each worker thread obtains a different connection, so the statements execute concurrently at the DB level.
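The answer above can be checked with a small experiment using Python's standard queue and threading modules. Two tasks leave the queue one at a time, yet both worker threads are in the middle of a task at the same moment. The barrier is only a test device added here to prove the overlap, and the task names are illustrative.

```python
import queue
import threading

task_queue = queue.Queue()
for task_name in ("write uid=1", "read uid=1"):
    task_queue.put(task_name)

# The barrier releases only when two workers are inside a task at the same time.
in_flight = threading.Barrier(2)
overlapped = []


def work_thread_routine():
    task = task_queue.get()             # tasks are popped from the queue serially
    try:
        in_flight.wait(timeout=5)       # blocks until the other worker is also mid-task
        overlapped.append(task)         # reached only if both tasks were in flight together
    except threading.BrokenBarrierError:
        pass                            # the other worker never overlapped with this one


workers = [threading.Thread(target=work_thread_routine) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Both task names end up in overlapped, which shows that popping from the queue in order says nothing about the order in which the work itself runs.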
Question: if only one service instance is deployed, does that ensure tasks are not executed concurrently?
A: No, for the same reasons as above.
Question: if each service instance has only one database connection, does that ensure tasks are not executed concurrently?
A: No, because
(1) a single database connection per service only guarantees that requests from that one server are executed serially at the database level
(2) the service is distributed, so requests from different service instances may still be executed concurrently at the database level
Question: if only one service instance is deployed and it has only one database connection, does that ensure tasks are not executed concurrently?
A: Yes. All requests are then executed serially, but throughput is very low and the availability of the service cannot be guaranteed.
So far, it seems hopeless:
1) the task queue cannot guarantee serialization
2) a single service with multiple database connections cannot guarantee serialization
3) multiple services with a single database connection each cannot guarantee serialization
4) a single service with a single database connection can guarantee serialization, but throughput is very low and availability cannot be guaranteed, so it is hardly feasible
So is there a solution?
Taking a step back, there is no need to serialize all requests globally. It is enough to "serialize access to the same data".
Within one service, how do we "serialize access to the same data"? It is enough to "make all access to the same data go through the same DB connection".
How do we "make all access to the same data go through the same DB connection"? It is enough to "modify the DB connection pool slightly so that the connection is chosen according to the data".
Change the call that obtains a DB connection from CPool.GetDBConnection() [returns any available DB connection] to
CPool.GetDBConnection(long id) [returns the DB connection selected by taking id modulo the number of connections]
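A minimal sketch of such a pool follows. It is not the author's implementation: the DBConnection class is a hypothetical stand-in for a real driver connection, and the per-connection lock is an assumption added here so that statements sharing one connection run one at a time.

```python
import threading


class DBConnection:
    """Hypothetical stand-in for a real DB connection."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()    # assumed: one statement at a time per connection
        self.executed = []

    def exec_sql(self, sql):
        with self.lock:
            self.executed.append(sql)   # a real connection would send sql to the DB here


class CPool:
    def __init__(self, size):
        self.connections = [DBConnection(i) for i in range(size)]

    def get_db_connection(self, data_id):
        # Same id -> same remainder -> same connection, every time.
        return self.connections[data_id % len(self.connections)]


pool = CPool(4)
write_conn = pool.get_db_connection(1)      # balance update for uid=1
read_conn = pool.get_db_connection(1)       # balance query for uid=1
other_conn = pool.get_db_connection(2)      # a different uid uses another connection

write_conn.exec_sql("UPDATE account SET balance = 80 WHERE uid = 1")
read_conn.exec_sql("SELECT balance FROM account WHERE uid = 1")
```

Because the read and the write for uid=1 share one connection, the database receives them in the order they were submitted on that connection. Requests for other ids still run in parallel on the other connections.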
The advantages of this change are:
(1) it is simple: only the DB connection pool implementation and the calls that obtain a DB connection need to change
(2) the connection pool does not need to know anything about the business. It does not care what the id means and simply returns a DB connection by id modulo.
(3) it applies to many business scenarios: a user data service passes a user-id to obtain a connection, and an order data service passes an order-id.
In this way, access to the same data, for example the same uid, is guaranteed to execute serially at the database level.
But wait: the service is deployed as many instances. The scheme above only guarantees that access to the same data is serialized at the DB level within one service instance. Because the service is distributed, access is still parallel globally. How can this be solved? Can access to the same data be made to always land on the same service instance?
IV. Can access to the same data land on the same service?
Above we analyzed the upstream, downstream and internal structure of the service layer. Now let's look at the same things for the application layer.
The figure above shows the upstream and downstream of a business application and its internal deployment. The details are as follows:
(1) the upstream of a business application is not fixed: it may be a direct HTTP request or a call from an upstream service
(2) downstream of the business application are multiple instances of the service
(3) in the middle is the business application itself, which is divided into several parts:
(3.1) at the top is the task queue [perhaps a web server such as Tomcat handles this for you]
(3.2) in the middle are the worker threads [perhaps the web server's worker threads or CGI worker threads handle this for you]. Each worker thread carries out the actual business task, and a typical task makes RPC calls through the service connection pool.
(3.3) at the bottom is the service connection pool. All RPC calls are sent to the downstream services through it.
The typical workflow of a worker thread is as follows:
void work_thread_routine() {
    Task t = TaskQueue.pop();                              // get the next task from the task queue
    // process the task logic, build the network packet, packet, and call the downstream RPC API
    ServiceConnection c = CPool.GetServiceConnection();    // take a Service connection from the Service connection pool
    c.Send(packet);                                        // send the packet over that connection to execute the RPC request
    CPool.PutServiceConnection(c);                         // return the Service connection to the pool
}
Looks familiar? Yes, only a small change to the service connection pool is needed:
Change the call that obtains a Service connection from CPool.GetServiceConnection() [returns any available Service connection] to
CPool.GetServiceConnection(long id) [returns the Service connection selected by taking id modulo the number of connections]
In this way, requests for the same data, for example the same uid, are guaranteed to land on the same service instance.
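Putting the two levels together, a request can be traced from the application to a service instance and then to a DB connection inside that instance. The sketch below only computes the route. The Service and ServicePool classes and the instance names are hypothetical, and it assumes every application instance holds the service list in the same order.

```python
class Service:
    """Hypothetical service instance that owns a fixed number of DB connections."""

    def __init__(self, name, db_connection_count):
        self.name = name
        self.db_connection_count = db_connection_count

    def route(self, data_id):
        # Inside the service: choose the DB connection by id modulo.
        return (self.name, data_id % self.db_connection_count)


class ServicePool:
    def __init__(self, services):
        self.services = services

    def get_service_connection(self, data_id):
        # In the application: choose the service instance by id modulo.
        return self.services[data_id % len(self.services)]


pool = ServicePool([
    Service("service-0", 4),
    Service("service-1", 4),
    Service("service-2", 4),
])


def path_for(data_id):
    """Return the (service, DB connection index) a request for data_id goes through."""
    return pool.get_service_connection(data_id).route(data_id)


write_path = path_for(1)    # balance update for uid=1
read_path = path_for(1)     # balance query for uid=1
other_path = path_for(7)    # a different uid
```

Both requests for uid=1 take the path ("service-1", 1), so they meet on the same DB connection of the same service instance. uid=7 also lands on service-1 but uses connection 3.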
V. Summary
Because reads and writes are concurrent at the database level, the database and the cache can become inconsistent (in essence, a read request issued later completes first). Two small changes may solve the problem:
(1) modify the service connection pool to select the service connection by id modulo, which ensures that reads and writes of the same data land on the same back-end service
(2) modify the DB connection pool to select the DB connection by id modulo, which ensures that reads and writes of the same data are executed serially at the database level
VI. Remaining problems
Question: will selecting the service connection by id modulo affect the availability of the service?
A: No. When a downstream service instance goes down, the service connection pool can detect which connections are unavailable and exclude them when taking the modulo.
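One possible reading of this answer is sketched below: the pool takes the modulo over the connections that are currently alive. The alive flag stands in for whatever health check a real pool would use, and all names here are hypothetical.

```python
class ServiceConnection:
    def __init__(self, name):
        self.name = name
        self.alive = True    # assumed to be maintained by a health check


class ServicePool:
    def __init__(self, names):
        self.connections = [ServiceConnection(n) for n in names]

    def get_service_connection(self, data_id):
        # Exclude unavailable connections, then take the modulo over the rest.
        alive = [c for c in self.connections if c.alive]
        if not alive:
            raise RuntimeError("no service connection available")
        return alive[data_id % len(alive)]


pool = ServicePool(["s0", "s1", "s2"])
before = pool.get_service_connection(4).name    # 4 % 3 == 1 -> "s1"

pool.connections[1].alive = False               # "s1" goes down
after = pool.get_service_connection(4).name     # 4 % 2 == 0 -> "s0"
```

Requests for id 4 keep being served after s1 fails. Note that in this sketch some ids move to a different instance when the set of live connections changes (id 3 moves from s0 to s2), so the same-data-same-service property may not hold during the switch.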
Question: will selecting service and DB connections by id modulo affect how evenly requests are balanced across the connections?
A: No. As long as the ids of the data being accessed are evenly distributed, each connection is equally likely to be selected by id modulo, so the load is balanced.
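The condition in this answer is easy to check by counting. The sketch assumes 4 connections and consecutive ids from 1 to 10000. The skewed id set is a made-up counterexample showing what happens when the ids are not balanced.

```python
from collections import Counter

connection_count = 4

# Evenly spread ids: every connection receives the same share.
even_ids = range(1, 10001)
even_load = Counter(data_id % connection_count for data_id in even_ids)

# Skewed ids (all multiples of 4): every request lands on connection 0.
skewed_ids = range(0, 400, 4)
skewed_load = Counter(data_id % connection_count for data_id in skewed_ids)
```

With consecutive ids each of the 4 connections gets 2500 requests. With the skewed set, connection 0 gets all 100, which is why the answer depends on the ids being balanced.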