This article mainly presents "Nginx+Redis+Ehcache: an analysis of a large-scale, high-concurrency, high-availability three-tier cache architecture". The content is easy to follow and well organized, and I hope it helps resolve your doubts; now let the editor lead you through it.
Nginx
As middleware, nginx is often used to distribute traffic, and nginx also has its own (capacity-limited) cache. We can use it to cache hot data so that user requests hit the cache and return directly, reducing the traffic that reaches the backend servers.
Template engine
Usually we can use template engines such as FreeMarker/Velocity to absorb a large number of requests.
A small system may render all pages on the server in advance and put them in the cache; subsequent requests for the same page can then be answered directly, without querying the data source or running the data logic again.
For systems with a large number of pages, re-rendering every page whenever a template changes is clearly not desirable. With nginx+lua (OpenResty), the templates are stored separately in the nginx cache, and the data used for rendering is also stored in the nginx cache, with a cache expiration time set to keep the templates as fresh as possible.
Double-layer nginx to improve the cache hit ratio
When multiple nginx instances are deployed, the cache hit rate of each instance may be very low unless some data routing strategy is added, so a double-layer nginx deployment can be used.
The distribution-layer nginx is responsible for the traffic-distribution logic and policy. Based on rules it defines itself, for example hashing by productId and taking the result modulo the number of backend nginx instances, it routes every request for a given product to the same backend nginx server.
The backend (application-layer) nginx instances cache the hot data in their own caches. (The distribution layer can consist of a single nginx instance.)
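The routing rule itself normally lives in lua on the distribution-layer nginx; purely as an illustration, here is a minimal Java sketch of the same hash-then-modulo idea, with a hypothetical list of backend nginx addresses, to make the routing behaviour concrete.

```java
import java.util.Arrays;
import java.util.List;

public class ProductRouter {

    // Hypothetical application-layer nginx instances behind the distribution layer.
    private static final List<String> BACKENDS =
            Arrays.asList("192.168.0.11:80", "192.168.0.12:80", "192.168.0.13:80");

    // Hash the productId and take it modulo the backend count, so requests for the
    // same product always land on the same backend nginx and its cache stays hot.
    public static String routeByProductId(String productId) {
        int hash = productId.hashCode() & 0x7fffffff; // force a non-negative hash
        return BACKENDS.get(hash % BACKENDS.size());
    }

    public static void main(String[] args) {
        System.out.println(routeByProductId("product-1001"));
    }
}
```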
Redis
If the requested data is not cached in nginx, the request falls through to the redis cache. Redis can cache the full data set, and its concurrency and high availability can be improved through horizontal scaling.
I. Persistence mechanism
Persistence mechanism: persist the data in redis memory to disk, and then periodically upload the disk files to a cloud storage service such as S3 (AWS) or ODPS (Alibaba Cloud).
If both the RDB and AOF persistence mechanisms are enabled, redis uses AOF to rebuild the data when it restarts, because the AOF data is more complete. It is recommended to enable both: use AOF as the first choice for data recovery to ensure no data is lost, and use RDB as a cold backup for fast recovery when the AOF file is lost or corrupted.
A pitfall from practice: if you want to recover data from RDB while AOF is also enabled, the recovery will not work, because redis always loads from AOF first (temporarily disabling AOF allows a normal recovery). In that case, first stop redis and disable AOF, copy the RDB file to the appropriate directory, start redis, hot-modify the configuration with "config set appendonly yes" so that an AOF file is generated from the current in-memory data, then stop redis again, enable AOF in the configuration file, and start redis once more.
RDB
RDB periodically persists the data in redis; each persistence run produces a snapshot of the full data set. It has little impact on redis performance and allows a fast recovery from the snapshot after a crash.
AOF
AOF writes to a log file in append-only mode; when redis restarts, the whole data set can be reconstructed by replaying the write commands in the AOF log. (In practice each write first goes into the linux OS cache, and redis calls fsync once per second to flush the OS cache to disk.) AOF has some impact on redis performance but preserves the data as completely as possible. Redis uses a rewrite mechanism to keep the AOF file from growing too large, rebuilding a compact set of commands from the current in-memory data.
II. Redis cluster
Replication
A one-master, multi-slave architecture: the master node handles writes and replicates data to the slave nodes asynchronously, while the slave nodes handle reads. It is mainly used for read/write splitting and horizontal read scaling. The master node in this architecture must enable persistence; otherwise, when the master crashes and restarts with empty memory, the empty data set is replicated to the slaves and all data disappears.
Sentinel
Sentinel is a very important component in the redis architecture. It monitors whether the redis master and slave processes are working properly, sends alarm notifications to the administrator when an instance fails, automatically fails over to a slave when the master goes down, and notifies clients of the new master address after a failover. Sentinel needs at least three instances to be robust, so that a quorum majority can be reached in the vote that triggers a failover.
The biggest characteristic of the two architectures above is that every node holds the same data, so they cannot handle massive data sets. The sentinel-based setup is therefore suited to scenarios with relatively small data volumes.
Redis cluster
Redis cluster supports multiple master nodes, and each master node can have multiple slave nodes attached. If a master goes down, one of its slaves is automatically promoted to master. Note that under the redis cluster architecture, slave nodes are mainly used for high availability and master/slave failover; if you want slaves to serve reads, you need to change the configuration (and in that case also modify the jedis source to support read/write splitting). Under redis cluster, masters can be scaled out arbitrarily, and scaling out masters directly improves read/write throughput. Slave nodes can also be migrated automatically (so that each master has a slave wherever possible), which moves redundant slaves away from overloaded masters and keeps the overall system more highly available.
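As a hedged illustration of a Java client against such a cluster, the sketch below uses jedis's JedisCluster; the node addresses and key are placeholders.

```java
import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class ClusterClientExample {
    public static void main(String[] args) {
        // Seed nodes of the redis cluster (placeholder addresses); the client discovers the rest.
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("192.168.0.21", 7000));
        nodes.add(new HostAndPort("192.168.0.22", 7001));

        JedisCluster cluster = new JedisCluster(nodes);

        // Keys are hashed to slots and slots are spread over the master nodes,
        // so adding masters directly raises read/write throughput.
        cluster.set("product:1001", "{\"id\":1001,\"name\":\"demo\"}");
        System.out.println(cluster.get("product:1001"));
    }
}
```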
Ehcache
Ehcache serves as the tomcat JVM heap cache, mainly as a defense against a large-scale redis failure. If redis suffers a large-scale outage and a large amount of nginx traffic pours directly into the data production service, this final layer of tomcat heap cache can still absorb some requests and prevent every request from going straight to the DB.
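A minimal sketch of such an in-heap cache using the classic Ehcache 2.x API; the cache name, sizes and TTLs are illustrative assumptions.

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class LocalHeapCache {
    private final CacheManager cacheManager = CacheManager.create();
    private final Cache productCache;

    public LocalHeapCache() {
        // name, maxElementsInMemory, overflowToDisk, eternal, timeToLiveSeconds, timeToIdleSeconds
        productCache = new Cache("productCache", 10000, false, false, 300, 60);
        cacheManager.addCache(productCache);
    }

    public void put(String key, Object value) {
        productCache.put(new Element(key, value));
    }

    public Object get(String key) {
        // Returns null on a miss; the caller then falls back to redis and, as a last resort, to DB.
        Element element = productCache.get(key);
        return element == null ? null : element.getObjectValue();
    }
}
```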
Cache data update strategy
For cached data with high timeliness requirements, when a change occurs, write directly to both the database and the redis cache (double write) so that the cache stays as fresh as possible.
For data with lower timeliness requirements, when a change occurs, use asynchronous MQ notification: the data production service listens for the MQ message and then asynchronously pulls the latest data from the owning service to update the tomcat JVM cache and the redis cache. When the nginx local cache expires, fresh data can be pulled from redis and written back into the nginx local cache.
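A hedged sketch of that listener, assuming Kafka as the MQ; the broker address, topic name, message format and the loadProductFromService call are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import redis.clients.jedis.Jedis;

public class CacheUpdateListener {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "cache-refresh");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("product-changed"));  // placeholder topic

        Jedis jedis = new Jedis("localhost", 6379);  // placeholder redis address
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                String productId = record.value();
                // Pull the latest data from the owning service (placeholder call) and
                // refresh the redis cache; the JVM-level cache would be refreshed the same way.
                String productJson = loadProductFromService(productId);
                jedis.setex("product:" + productId, 300, productJson);
            }
        }
    }

    private static String loadProductFromService(String productId) {
        return "{\"id\":\"" + productId + "\"}";  // stand-in for the real service call
    }
}
```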
The classic cache + database read/write pattern: cache aside pattern
On read: read the cache first; if the cache misses, read the database, put the result into the cache, and return the response.
On update: delete the cache first, then update the database.
The reason an update only deletes the cache is that, for cached data with complex computation logic, recomputing the cache on every data change adds an unnecessary burden; simply delete the cache and let the next read re-populate it. This is a lazy-loading strategy. For example, if a field of a table involved in a cache is modified 20 or even 100 times within a minute, the cache would be recomputed 20 or 100 times, yet it might be read only once in that minute, so most of the recomputed entries are cold data. Cached data follows the 80/20 rule: 20% of the data accounts for 80% of the accesses.
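A minimal Java sketch of the cache-aside read and update paths described above, using jedis and a placeholder DAO:

```java
import redis.clients.jedis.Jedis;

public class ProductCacheAside {
    private final Jedis jedis = new Jedis("localhost", 6379);  // placeholder address
    private final ProductDao productDao = new ProductDao();    // placeholder DAO

    // Read path: cache first, fall back to DB on a miss and lazily re-populate the cache.
    public String getProduct(String productId) {
        String key = "product:" + productId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = productDao.findById(productId);
        if (fromDb != null) {
            jedis.setex(key, 300, fromDb);  // back-fill with a TTL
        }
        return fromDb;
    }

    // Update path: delete the cache first, then update the database.
    public void updateProduct(String productId, String newValue) {
        jedis.del("product:" + productId);
        productDao.update(productId, newValue);
    }

    // Placeholder DAO standing in for the real data source.
    static class ProductDao {
        String findById(String id) { return "{\"id\":\"" + id + "\"}"; }
        void update(String id, String value) { /* write to DB */ }
    }
}
```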
The problem of inconsistent double writes between database and redis cache
The most basic cache inconsistency and its solution
Problem: if you modify the database first and then delete the cache, a failed cache deletion leaves the latest data in the database and stale data in the cache, causing inconsistency.
Solution: delete the cache first, then modify the database. If the cache deletion succeeds but the database modification fails, the database still holds the old data and the cache is empty, so no inconsistency arises.
Analysis of complex data inconsistencies
Problem: the data changes, the cache is deleted first, and then the database is modified. Before the database modification completes, a concurrent read request finds the cache empty, queries the database (which still holds the old data), and puts that old data back into the cache; then the database modification completes, and the cache is once again inconsistent with the database.
Solution: asynchronously serialize the database/cache update and read operations. When data is updated, the update operation is routed by the data's unique identifier to a queue inside the JVM; each queue has one dedicated worker thread that executes the queued operations one by one. An update operation in the queue first deletes the cache and then updates the database. If a read request arrives before the update completes and finds the cache empty, it sends a cache-refresh request into the same queue (after the same routing) and then waits synchronously for the cache refresh to finish. Queuing multiple identical cache-refresh requests for the same data is pointless, so they can be deduplicated. Once the preceding update operation finishes its database write, the next cache-refresh operation reads the latest data from the database and writes it to the cache. A waiting read keeps polling within its time limit; as soon as the value appears in the cache it returns it directly (several requests for the same cached data may be served this way). If a read waits longer than the time limit, it reads the old value directly from the database.
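A simplified sketch of the in-JVM per-key queue idea; the queue count and routing are assumptions, and a real implementation also needs the deduplication and timeout handling noted below.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SerializedCacheUpdater {
    private static final int QUEUE_COUNT = 8;  // illustrative number of queues
    private final BlockingQueue<Runnable>[] queues;

    @SuppressWarnings("unchecked")
    public SerializedCacheUpdater() {
        queues = new BlockingQueue[QUEUE_COUNT];
        for (int i = 0; i < QUEUE_COUNT; i++) {
            queues[i] = new ArrayBlockingQueue<>(1000);
            final BlockingQueue<Runnable> queue = queues[i];
            // One worker thread per queue: operations in the same queue run strictly one by one.
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        queue.take().run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Route by the data's unique identifier so that the "delete cache + update DB" operation
    // and any subsequent cache-refresh request for the same productId are serialized.
    public void submit(String productId, Runnable operation) throws InterruptedException {
        int index = (productId.hashCode() & 0x7fffffff) % QUEUE_COUNT;
        queues[index].put(operation);
    }
}
```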
There are some issues that need to be noted for this approach:
Long blocking of read requests: because read requests are slightly asynchronized, pay special attention to timeouts. When a read times out it queries DB directly, and handling this badly puts pressure on DB. You therefore need to test the system's peak QPS and adjust the number of machines and the number of queues per machine to arrive at a reasonable read-wait timeout.
Request routing with multiple instances deployed: this service may be deployed as multiple instances, so you must ensure that requests for the same data are routed to the same service instance through the nginx server.
Routing skew caused by hot data: because the cache is only cleared when the product data is updated, read/write concurrency on a key only occurs around updates, so if the update frequency is not very high the impact of this problem is small; still, some machines may end up with a higher load.
Concurrent conflict resolution scheme for distributed cache reconstruction
The cache production service may be deployed on multiple machines. When the cache entry in redis and ehcache has expired or does not exist, a request coming from nginx and a request triggered by a kafka message may arrive at the same time, and both may end up pulling the data and writing it to redis, so concurrency conflicts can occur. This can be solved with a distributed lock based on redis or zookeeper: the passive cache rebuild triggered by a request and the active cache rebuild triggered by the listener both have to take the lock, avoiding the conflict; and when writing to the cache, compare a time field so that older data is discarded and only the latest data is stored.
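A hedged sketch of the redis-based lock plus time-field comparison when rebuilding a cache entry; the Jedis 3.x-style SET NX PX lock below is simplified and the key names are placeholders.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class CacheRebuilder {
    private final Jedis jedis = new Jedis("localhost", 6379);  // placeholder address

    public void rebuild(String productId, String newJson, long newModifiedTime) {
        String lockKey = "lock:product:" + productId;
        // Simplified distributed lock: only one rebuilder (passive or active) proceeds at a time.
        String locked = jedis.set(lockKey, "1", SetParams.setParams().nx().px(3000));
        if (!"OK".equals(locked)) {
            return;  // another instance is rebuilding this entry right now
        }
        try {
            String timeKey = "product:time:" + productId;
            String existing = jedis.get(timeKey);
            // Compare the time field so older data is discarded and only the latest version is kept.
            if (existing == null || Long.parseLong(existing) < newModifiedTime) {
                jedis.setex("product:" + productId, 300, newJson);
                jedis.set(timeKey, String.valueOf(newModifiedTime));
            }
        } finally {
            jedis.del(lockKey);
        }
    }
}
```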
Solution for cache cold start: cache warm-up
When the system starts for the first time and a large number of requests pour in while the cache is still empty, DB may be crushed and the system becomes unavailable; the same problem occurs when all redis cache data is lost abnormally. Data can therefore be loaded into redis in advance to avoid the cold-start problem. Of course the full data set cannot be preloaded; hot data with a high access frequency can be identified by real-time statistics on recent access patterns (for example, the current day's traffic). Since there may still be a lot of hot data, multiple services need to read and write it into redis in parallel, in a distributed fashion (hence distributed locks based on zk).
Access traffic is reported to kafka through nginx+lua, and storm consumes the data from kafka, counting the accesses to each product in real time. The counts are kept in an in-memory LRU data structure (LRUMap from apache commons collections), chosen because it is fast, purely in-memory and has no external dependency. When each storm task starts, it writes its own id into the same zk node under a zk-based distributed lock. Each storm task is responsible for the hot-data statistics of its share of traffic: at regular intervals it traverses the map and maintains an up-to-date top-1000 list, and a background thread synchronizes that top-1000 hot-data list to zk at a fixed interval, for example every minute, storing it in a znode corresponding to that storm task.
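A small sketch of the in-memory counting part inside a storm task, using LRUMap from apache commons collections as mentioned above; the map size and top-N size are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.commons.collections4.map.LRUMap;

public class HotProductCounter {
    // Bounded LRU map: cold keys fall out automatically, hot keys stay and accumulate counts.
    private final Map<String, Long> counts = new LRUMap<>(1000);

    public synchronized void record(String productId) {
        counts.merge(productId, 1L, Long::sum);
    }

    // Called periodically to produce the top-N list that the background thread writes to zk.
    public synchronized List<Map.Entry<String, Long>> topN(int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```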
The cache production service is deployed as multiple instances. On startup, each instance reads the node holding the storm taskid list maintained above, and for each taskid tries to acquire the zk distributed lock on that taskid's znode. If the lock is acquired, the instance takes the lock on the taskid's status node and checks the warm-up status; if that taskid has not been warmed up yet, the instance takes its hot-data list out of zk and preloads it. If acquiring a taskid's distributed lock fails, the instance quickly moves on and tries the next taskid's lock in the next loop iteration. In this way, multiple service instances coordinate through zk distributed locks to warm up the cache in parallel.
Solution for cache hotspots making the system unavailable
An instantaneous influx of a huge number of requests for the same piece of data may, under the hash-based routing policy, crush the single application-layer nginx responsible for that data; if the requests keep coming, other nginx instances are affected in turn, until all nginx instances fail and the whole system becomes unavailable.
An automatic degradation of the traffic-distribution policy, based on nginx+lua+storm hotspot detection, solves the problem above. For example, data whose access count is greater than n times the average of the remaining 95% can be flagged as a hotspot; storm then sends an http request directly to the traffic-distribution nginx so that the hotspot flag is stored in its local cache, and storm also pushes the complete cached data for that hotspot to all application-layer nginx servers, where it is stored directly in their local caches.
When the traffic-distribution nginx receives a request for some data and finds the hotspot flag, it immediately degrades the traffic-distribution policy for that data: instead of hashing to a single application-layer nginx, requests for that data are spread across all application-layer nginx instances. Storm also needs to keep the hotspot list it identified last time and compare it with the currently computed list; for data that is no longer hot, it sends an http request to the traffic-distribution nginx to cancel the hotspot flag.
Cache avalanche solution
If the redis cluster crashes completely, the cache service accumulates a large number of requests waiting on redis, which ties up resources; a large number of requests from the cache service then fall through to the source service to query DB, crushing DB under the pressure; requests to the source service pile up and occupy resources as well; and a large amount of the cache service's resources is spent on fruitless access to redis and the source service, until the cache service itself can no longer serve, and finally the whole site collapses.
Beforehand: build a highly available redis cluster architecture with master/slave replication, so that when a master node goes down a slave node takes over automatically; ideally deploy the cluster across two data centers.
During the event: deploy a layer of ehcache that can absorb part of the pressure when redis is completely down; isolate access to the redis cluster with resource isolation so that requests do not all block waiting on it, and apply a circuit-breaker policy plus a degradation policy when access to the redis cluster fails; and rate-limit and resource-isolate access to the source service.
Afterwards: if the redis data was backed up, restore it and restart redis; if the redis data is completely lost or too stale, quickly warm the cache up and then bring redis back. Then, because the half-open state of the resource-isolation circuit breaker detects that redis can be accessed normally again, all requests automatically resume.
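The resource isolation, circuit breaking and degradation mentioned above can be implemented with a library such as Hystrix; the rough, hedged sketch below wraps a redis read, with the group name and fallback behaviour as placeholders.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

import redis.clients.jedis.Jedis;

// Runs the redis read in its own isolated thread pool; when redis is down the circuit opens,
// calls fail fast, and the fallback (for example ehcache or an empty value) is used instead.
public class RedisGetCommand extends HystrixCommand<String> {
    private final Jedis jedis;
    private final String key;

    public RedisGetCommand(Jedis jedis, String key) {
        super(HystrixCommandGroupKey.Factory.asKey("redis"));  // placeholder group name
        this.jedis = jedis;
        this.key = key;
    }

    @Override
    protected String run() {
        return jedis.get(key);
    }

    @Override
    protected String getFallback() {
        // Degradation path while the circuit is open; the half-open state later lets a trial
        // request through, and normal access resumes automatically once redis recovers.
        return null;
    }
}
```

A caller would invoke it as `new RedisGetCommand(jedis, "product:1001").execute()`.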
Cache penetration solution
When the requested data exists in none of the cache levels and DB does not have it either, a large number of such requests go straight through to DB, producing a high-concurrency load on DB. To solve cache penetration, return an empty marker for data that does not exist in DB and save that marker into every cache level. Because data modifications are listened to asynchronously, once the data is actually created or updated, the new value will be pushed into the caches.
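A minimal sketch of the empty-marker idea at the redis level; the marker value, TTLs and DAO are illustrative.

```java
import redis.clients.jedis.Jedis;

public class PenetrationSafeReader {
    private static final String EMPTY_MARKER = "__EMPTY__";   // illustrative sentinel value
    private final Jedis jedis = new Jedis("localhost", 6379); // placeholder address
    private final ProductDao productDao = new ProductDao();   // placeholder DAO

    public String getProduct(String productId) {
        String key = "product:" + productId;
        String cached = jedis.get(key);
        if (EMPTY_MARKER.equals(cached)) {
            return null;  // known-missing data: do not hit DB again
        }
        if (cached != null) {
            return cached;
        }
        String fromDb = productDao.findById(productId);
        if (fromDb == null) {
            // DB has no such row: cache the empty marker so repeated requests stop reaching DB.
            jedis.setex(key, 60, EMPTY_MARKER);
            return null;
        }
        jedis.setex(key, 300, fromDb);
        return fromDb;
    }

    static class ProductDao {
        String findById(String id) { return null; /* placeholder DB lookup */ }
    }
}
```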
Simultaneous nginx cache invalidation multiplying the pressure on redis
When caching data in the nginx local cache, set a randomized expiration time so that a large number of entries do not expire at the same moment and send a flood of requests straight into redis.
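The article applies the random expiry in the nginx local cache (lua); the same idea at the Java/redis layer looks roughly like this, with the base TTL and jitter range as assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

import redis.clients.jedis.Jedis;

public class JitteredCacheWriter {
    private final Jedis jedis = new Jedis("localhost", 6379);  // placeholder address

    public void cacheProduct(String productId, String json) {
        // Base TTL of 300s plus up to 60s of random jitter, so entries written at the
        // same moment do not all expire (and flood the next layer) at the same moment.
        int ttlSeconds = 300 + ThreadLocalRandom.current().nextInt(60);
        jedis.setex("product:" + productId, ttlSeconds, json);
    }
}
```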
This is all the content of "Nginx+Redis+Ehcache: an analysis of a large-scale, high-concurrency, high-availability three-tier cache architecture". Thank you for reading! I hope the content shared here has given you a clearer understanding; if you want to learn more, welcome to follow the industry information channel!