2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
In this article, the editor explains in detail how to solve common Redis-related problems. The content is detailed, the steps are clear, and the details are handled carefully; I hope this article helps you resolve your doubts.
Redis persistence mechanism
Redis is an in-memory database that supports persistence: the data in memory is synchronized to files on the hard disk through a persistence mechanism so that it survives restarts. When Redis is rebooted, the data can be recovered by reloading those files into memory.
Implementation (RDB): Redis creates a child process with fork(), which shares a copy-on-write view of the parent process's dataset. The child writes the data to a temporary file; when it finishes, the temporary file replaces the previous snapshot file, the child exits, and its memory is freed.
RDB is the default persistence method for Redis. On a configurable schedule, the in-memory data is saved as a snapshot to a binary file on the hard disk; this is Snapshot storage, and the corresponding data file is dump.rdb. The snapshot period is defined by the save directives in the configuration file. A snapshot is a full copy of the dataset at that point in time.
AOF: Redis appends every write command it receives to the end of a file, similar to MySQL's binlog. When Redis restarts, it rebuilds the entire dataset in memory by re-executing the write commands saved in that file.
When both methods are enabled at the same time, Redis gives priority to the AOF file when recovering data.
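As a minimal illustration, the relevant redis.conf directives look like this (the values are examples, not recommendations):

```conf
# RDB: take a snapshot if at least N writes occurred within M seconds
save 900 1
save 300 10
save 60 10000

# AOF: append every write command to appendonly.aof, fsync once per second
appendonly yes
appendfsync everysec
```

With both enabled as above, a restarted Redis will rebuild its dataset from the AOF file.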
Cache avalanche, cache penetration, cache warm-up, cache update, cache degradation, and related problems
First, cache avalanche
A cache avalanche happens when the original cache entries expire before new ones are in place. For example, if we set the same expiration time when populating the cache, a large portion of the cache expires at the same moment; every request that should have hit the cache queries the database instead, putting enormous CPU and memory pressure on the database, possibly taking it down and triggering a chain reaction that brings down the whole system.
Solution:
Most system designers use locking (the most common solution) or queues to guarantee that a large number of threads cannot read and write the database at the same time, so that massive concurrent requests do not all land on the underlying storage when the cache fails. A simpler option is to spread out the cache expiration times.
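The "spread out the expiration times" idea can be sketched in a few lines. This is an illustrative simulation with a plain dict standing in for Redis (the function name and TTL values are my own, not from the article):

```python
import random
import time

cache = {}  # stand-in for Redis: key -> (value, expires_at)

def set_with_jitter(key, value, base_ttl=300, jitter=60):
    """Store a value with a randomized TTL so keys don't all expire at once."""
    ttl = base_ttl + random.randint(0, jitter)
    cache[key] = (value, time.time() + ttl)
    return ttl

# A thousand keys written at the same instant still expire over a 60 s window.
ttls = [set_with_jitter(f"item:{i}", i) for i in range(1000)]
```

Because expirations are staggered, only a fraction of the requests miss the cache at any given moment instead of all of them at once.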
Second, cache penetration
Cache penetration occurs when a user queries data that does not exist in the database and therefore naturally does not exist in the cache. Every such query misses the cache, goes to the database, and returns null (two useless lookups each time). The requests bypass the cache and hit the database directly, which is also a frequently discussed cache hit-rate problem.
Solutions:
The most common is a Bloom filter: hash all data that could possibly exist into a sufficiently large bitmap; a key that definitely does not exist is intercepted by the bitmap, sparing the underlying storage from the query.
There is also a simpler, cruder way: if a query returns an empty result (whether because the data does not exist or because of a system failure), cache the empty result anyway, but with a very short expiration time, at most five minutes. The second request for that key is then answered from the cache instead of going back to the database.
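The "cache the empty result" trick can be sketched as follows. A dict stands in for Redis and the function names are illustrative, not from the article:

```python
import time

NULL = object()          # sentinel marking "known to be absent"
cache = {}               # key -> (value, expires_at)

def db_get(key):
    """Stand-in for the real database lookup."""
    return {"user:1": "alice"}.get(key)

def get_with_null_caching(key, ttl=300, null_ttl=60):
    hit = cache.get(key)
    if hit and hit[1] > time.time():
        return None if hit[0] is NULL else hit[0]   # served from cache
    value = db_get(key)
    if value is None:
        cache[key] = (NULL, time.time() + null_ttl)  # cache the miss, short TTL
    else:
        cache[key] = (value, time.time() + ttl)
    return value
```

The second query for a missing key is answered by the cached sentinel, so the database is spared the repeated useless lookup.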
A 5 TB hard drive is full of data; write an algorithm to deduplicate it. What if the records are 32-bit values? What if they are 64-bit?
The answer is to push space efficiency to the extreme: Bitmap and Bloom filter.
Bitmap: essentially a hash table.
Its disadvantage is that a Bitmap can record only 1 bit of information per element; any additional functionality costs more space and time.
Bloom filter (recommended)
It introduces k (k > 1) independent hash functions to guarantee that element deduplication can be completed within a given space and false-positive rate.
Its advantage is that its space efficiency and query time far exceed those of ordinary algorithms; its disadvantages are a certain false-positive rate and difficulty with deletion.
The core idea of the Bloom filter is to use several different hash functions to resolve "conflicts".
Hashing has a collision problem: two different URLs may produce the same hash value. To reduce collisions, we introduce several more hash functions. If any one of them says an element is not in the set, the element is definitely not in the set; only when all of the hash functions say the element is in the set can we conclude (probabilistically) that it is. This is the basic idea of the Bloom filter.
Bloom filters are generally used to determine whether an element exists in a very large dataset.
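A minimal Bloom filter along the lines described above can be sketched like this. It derives its k hash functions by salting SHA-256 and uses a Python integer as the bitmap; the class name and parameters are illustrative choices, not a standard implementation:

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1 << 20, k=5):
        self.m = m          # number of bits in the bitmap
        self.k = k          # number of independent hash functions
        self.bits = 0       # a big int used as the bitmap

    def _positions(self, item):
        # Derive k hash positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False => definitely absent; True => probably present.
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

A "no" from `might_contain` is authoritative, so queries for nonexistent keys never reach the database; a "yes" may occasionally be a false positive.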
Added by reminder: the difference between cache penetration and cache breakdown
Cache breakdown: a key is extremely hot and receives heavy concurrent access. The moment that key expires, the sustained high-concurrency traffic breaks through the cache and requests the database directly.
Solution: before accessing the key, use SETNX (set if not exists) to set a short-lived lock key to serialize access to the hot key, then delete the lock key after the access.
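The SETNX-mutex idea can be sketched single-threaded like this. A dict simulates Redis, and the TTL on the lock (so a crashed rebuilder cannot block everyone forever) plus all names are illustrative assumptions:

```python
import time

store = {}   # stand-in for Redis: key -> (value, expires_at)

def setnx_ex(key, value, ttl):
    """SET key value NX EX ttl: succeeds only if key is absent or expired."""
    now = time.time()
    cur = store.get(key)
    if cur and cur[1] > now:
        return False
    store[key] = (value, now + ttl)
    return True

def get_hot_key(key, rebuild):
    cur = store.get(key)
    if cur and cur[1] > time.time():
        return cur[0]                          # cache hit
    if setnx_ex("lock:" + key, 1, ttl=3):      # only one caller rebuilds
        store[key] = (rebuild(), time.time() + 300)
        store.pop("lock:" + key, None)         # release the lock
        return store[key][0]
    time.sleep(0.05)                           # others wait briefly and retry
    return get_hot_key(key, rebuild)
```

Only the lock holder hits the database; everyone else retries and is served from the freshly rebuilt cache entry.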
Third, cache warm-up
Cache warm-up (preheating) is a common concept that most readers will find easy to understand: load the relevant data into the cache right after the system goes online. This avoids the pattern where the first user requests must query the database and then populate the cache; users query pre-warmed cache data directly.
The solution is as follows:
1. Write an admin page that refreshes the cache, and trigger it manually at launch.
2. If the data volume is small, load it automatically when the project starts.
3. Refresh the cache on a schedule.
IV. Cache updates
In addition to the cache invalidation policies the cache server provides (Redis has six eviction policies to choose from by default), we can customize cache eviction for specific business needs. There are two common strategies:
(1) regularly clean up expired caches
(2) when a user requests, determine whether the cache used in the request has expired. If it expires, go to the underlying system to get new data and update the cache.
Both have pros and cons. The first is troublesome because a large number of cached keys must be maintained; the second makes every user request check for cache expiry, so the logic is relatively complex. Weigh the options against your own application scenario.
Fifth, cache degradation
When traffic spikes sharply, when a service misbehaves (slow response times or no response), or when non-core services threaten the performance of the core flow, we still need to keep the service available, even in a degraded, lossy form. The system can degrade itself automatically based on key metrics, or switches can be configured for manual degradation.
The ultimate goal of degradation is to keep core services available even if lossy. Some services cannot be degraded at all (such as add-to-cart and checkout).
A degradation plan can be set with reference to log levels:
(1) General: for example, some services occasionally time out because of network jitter or a service coming online; they can be degraded automatically.
(2) Warning: the success rate of some services fluctuates over a period (for example, between 95% and 100%); they can be degraded automatically or manually, and an alarm sent.
(3) Error: for example, availability drops below 90%, the database connection pool is exhausted, or traffic suddenly soars to the maximum threshold the system can bear; degrade automatically or manually depending on the situation.
(4) Serious error: for example, the data is wrong for some special reason; an emergency manual degradation is required.
The purpose of service degradation is to prevent a Redis failure from causing an avalanche onto the database. So for unimportant cached data, a degradation strategy can be adopted; a common practice is to return a default value to the user directly, instead of querying the database, when Redis has a problem.
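The "return a default value when Redis is down" practice can be sketched as a small decorator. The decorator name, the default, and the simulated outage are all illustrative:

```python
def degradable(default):
    """Wrap a cached read so a Redis failure degrades to a default value."""
    def wrap(fn):
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except ConnectionError:       # Redis down: degrade, don't cascade
                return default
        return inner
    return wrap

@degradable(default=0)
def get_like_count(post_id):
    # In real code this would be a Redis GET; here we simulate an outage.
    raise ConnectionError("redis unreachable")
```

Users see a slightly stale or default number instead of an error page, and the database is never stampeded.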
What are hot data and cold data?
Caching is valuable only for hot data.
For cold data, most entries are squeezed out of memory before they are ever accessed again: they occupy memory for little value. Frequently modified data may or may not be worth caching, depending on the situation.
For hot data, take the birthday-greeting module of one of our IM products: the birthday list for the day is cached once and then read hundreds of thousands of times. Or take a navigation product: we cache the navigation information once and it is read millions of times afterwards.
Both examples share a trait: the information is modified infrequently and read very frequently.
Caching is meaningful only if the data is read at least twice before it is updated. That is the most basic rule; if the cache expires before it is ever hit, it delivers little value.
Are there scenarios where the data changes frequently but caching still has to be considered? Yes! When a read interface puts heavy pressure on the database and the data is hot, we still need caching to relieve the database. For example, the like, favorite, and share counts in one of our assistant products are typical hot data that change constantly; we synchronize them into the Redis cache to reduce database pressure.
What are the differences between Memcache and Redis?
1) Storage: Memcached keeps all data in memory; after a power failure everything is lost, and the dataset cannot exceed the memory size. Redis can keep part of its data on the hard disk and can persist its data.
2) Data types: in Memcached, every value is a simple string. Redis supports richer data types and provides structures such as list, set, zset, and hash.
3) Underlying model: their underlying implementations, and the protocols they use to communicate with clients, are different. Redis built its own VM mechanism, because ordinary system calls waste a certain amount of time moving and requesting memory.
4) Value size: a Redis value can be up to 512 MB, while Memcached values are limited to 1 MB.
5) Redis is much faster than Memcached.
6) Redis supports data backup, i.e., backup in master-slave mode.
Why is single-threaded redis so fast?
(1) Pure memory operation
(2) single-thread operation to avoid frequent context switching
(3) a non-blocking I/O multiplexing mechanism is used.
The data types of redis and the usage scenarios for each data type
Answer: there are five kinds.
(1) String
Nothing special to say here: the most conventional set/get operations, where value can be a string or a number. Typically used for ordinary caching and various counters.
(2) hash
Here value stores structured objects, and it is convenient to manipulate an individual field. When building single sign-on, I used this data structure to store user information, with cookieId as the key and a 30-minute cache expiration time, which nicely simulates a session-like effect.
(3) list
With the list data structure you can build a simple message queue. You can also use the lrange command to do Redis-based pagination, with excellent performance and a good user experience. Another scene where I use it, and where it fits very well, is fetching market data: a producer/consumer scenario in which a list neatly provides first-in, first-out queuing.
(4) set
Because a set holds a collection of values with no duplicates, it can provide global deduplication. Why not use the Set that comes with the JVM? Because our systems are generally deployed as clusters, a JVM-local Set is awkward to use, and standing up a separate shared service just for global deduplication is too much trouble.
In addition, with intersection, union, and difference operations, you can compute common preferences, combined preferences, each user's unique preferences, and similar features.
(5) sorted set
A sorted set has an extra weight parameter, score, and the elements in the collection are ordered by score. It is well suited to ranking applications and TOP N operations.
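The ranking use case can be sketched with plain Python standing in for ZADD/ZREVRANGE; the function names mirror the Redis commands but the implementation is just a dict:

```python
def zadd(zset, score, member):
    """Like ZADD: insert the member or update its score."""
    zset[member] = score

def top_n(zset, n):
    """Like ZREVRANGE 0 n-1: members ordered by score, highest first."""
    return sorted(zset, key=zset.get, reverse=True)[:n]

board = {}
zadd(board, 120, "alice")
zadd(board, 300, "bob")
zadd(board, 210, "carol")
```

In real Redis the sorted set keeps this ordering incrementally (via a skiplist), so TOP N queries don't re-sort on every read the way this sketch does.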
Redis internal structure
Dict: a data structure that maintains the mapping between keys and values, similar to a Map or dictionary in many languages. Essentially, it solves the searching problem from algorithms.
Sds: the equivalent of char*, but able to store arbitrary binary data. Unlike a C string it cannot mark the end of a string with the '\0' character, so it must carry a length field.
Skiplist (jump table): a multi-level linked list with multiple forward pointers per node. Its search efficiency is comparable to an optimized balanced binary tree, while being much simpler to implement than a balanced tree.
Quicklist: a doubly linked list whose nodes are ziplists, combining the two structures.
Ziplist (compressed list): an encoded list; a sequential data structure made up of a series of specially encoded contiguous memory blocks.
Expiration Policy and memory obsolescence Mechanism of redis
Redis uses a periodic delete + lazy delete strategy.
Why not a timed-deletion policy?
Timed deletion uses a timer to watch each key and deletes it automatically on expiry. Although memory is freed promptly, it consumes CPU resources: under heavy concurrent requests the CPU should be serving requests, not deleting keys, so this strategy is not adopted.
How do periodic deletion + lazy deletion work?
Periodic deletion: by default, Redis checks every 100 ms for expired keys and deletes any it finds. Note that Redis does not scan all keys every 100 ms; it samples them randomly (if every key were checked every 100 ms, Redis would grind to a halt). As a result, periodic deletion alone leaves many expired keys undeleted.
This is where lazy deletion comes in handy: when you access a key, Redis checks whether the key has an expiration time set and has expired; if so, it is deleted.
Are periodic deletion + lazy deletion enough?
No. If periodic deletion misses a key, and you never request that key again, lazy deletion never kicks in either. Redis memory keeps growing, so a memory-eviction mechanism must be adopted as well.
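The periodic + lazy pair can be sketched as follows. The sample size and function names are illustrative, and two dicts stand in for Redis's keyspace and expires table:

```python
import random
import time

store = {}    # key -> value
expires = {}  # key -> expires_at (only keys that have a TTL)

def active_expire_cycle(sample_size=20):
    """Periodic deletion: sample a few keys with TTLs, drop the expired ones."""
    keys = random.sample(list(expires), min(sample_size, len(expires)))
    now = time.time()
    for k in keys:
        if expires[k] <= now:
            store.pop(k, None)
            del expires[k]

def lazy_get(key):
    """Lazy deletion: an expired key is removed the moment it is accessed."""
    if key in expires and expires[key] <= time.time():
        store.pop(key, None)
        del expires[key]
        return None
    return store.get(key)
```

Keys that are both missed by the sampler and never accessed again are exactly the leak that the eviction policies below exist to plug.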
There is a line configuration in redis.conf
maxmemory-policy volatile-lru
This line configures the memory eviction strategy (what, you haven't configured it? Reflect on yourself).
volatile-lru: evict the least recently used keys from the set of keys with an expiration time (server.db[i].expires).
volatile-ttl: evict the keys closest to expiring from the set of keys with an expiration time (server.db[i].expires).
volatile-random: evict random keys from the set of keys with an expiration time (server.db[i].expires).
allkeys-lru: evict the least recently used keys from the whole dataset (server.db[i].dict).
allkeys-random: evict random keys from the whole dataset (server.db[i].dict).
noeviction: forbid evicting data; new writes report an error.
Ps: if no key has an expire set, the prerequisite is not met, and the volatile-lru, volatile-random, and volatile-ttl policies behave essentially like noeviction (nothing is deleted).
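The LRU idea behind volatile-lru/allkeys-lru can be sketched with an OrderedDict. This is a toy model of the policy, not how Redis implements it (Redis uses an approximate sampled LRU):

```python
from collections import OrderedDict

class LRUCache:
    """A sketch of allkeys-lru: evict the least recently used key when full."""
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.maxsize:
            self.data.popitem(last=False)   # evict the oldest entry
```

Every read refreshes a key's position, so when the cache is full it is always the coldest key that gets evicted.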
Why Redis is single-threaded
According to the official FAQ, Redis is a memory-based system, so CPU is not its bottleneck; the most likely bottlenecks are machine memory and network bandwidth. Since single-threading is simple to implement and the CPU will not be the bottleneck, adopting a single-threaded solution makes sense (after all, multithreading is troublesome!). Redis uses queuing to turn concurrent access into serial access.
1) The vast majority of requests are pure memory operations (very fast).
2) A single thread avoids unnecessary context switching and race conditions.
3) Non-blocking I/O is used.
Advantages of Redis:
1. Fast: the data lives in memory, so, much like a HashMap, lookups and updates have O(1) time complexity.
2. Rich data types: string, list, set, sorted set, and hash.
3. Transactions are supported, and operations are atomic: a change to the data either happens entirely or not at all.
4. Rich features: usable for caching and messaging; an expiration time can be set per key, after which the key is deleted automatically.
How to solve Redis's concurrent key-competition problem
Multiple subsystems are setting the same key at the same time; what should we watch out for? Redis's transaction mechanism is not recommended here, because our production environment is basically a Redis cluster with data sharding: when a transaction involves operations on multiple keys, those keys may not be stored on the same redis-server, so Redis's transaction mechanism is of little practical use.
(1) if no order is required for this key operation: prepare a distributed lock, everyone grab the lock, and then do the set operation when you get the lock.
(2) for this key operation, the order is required: distributed lock + timestamp. Assuming that system B grabs the lock first, set key1 to {valueB 3:05}. Then system A grabs the lock and finds that the timestamp of its valueA is earlier than the timestamp in the cache, so it does not do the set operation. and so on.
(3) Use a queue to turn the set operations into serial access. This also lets Redis cope with high concurrency, provided the read/write consistency of the key is guaranteed.
Individual Redis operations are atomic and thread-safe, so for a single operation you do not need to worry about concurrency; Redis handles that for you.
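Option (2) above, the distributed lock plus timestamp, can be sketched as a compare-before-write. A dict stands in for Redis and the scenario mirrors the systems A/B example; the names are illustrative:

```python
cache = {}   # key -> (value, timestamp)

def set_if_newer(key, value, ts):
    """Write only if this update's timestamp is not older than the cached one."""
    cur = cache.get(key)
    if cur and cur[1] > ts:
        return False          # a later writer already won; drop the stale set
    cache[key] = (value, ts)
    return True
```

In real deployments the check-and-set itself must run under the distributed lock (or as a Lua script on the Redis server) so that the compare and the write are one atomic step.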
What should the Redis cluster scheme do? What are the plans?
1. twemproxy: works roughly like a proxy. Clients that need Redis connect to twemproxy instead of connecting to Redis directly; it receives requests as an agent, uses a consistent hashing algorithm to forward each request to a specific Redis instance, and returns the result.
Disadvantages: the single-port twemproxy instance itself becomes a pressure point, and with consistent hashing, changing the number of Redis nodes changes the computed hash values, so data cannot be migrated to new nodes automatically.
2. codis: currently the most widely used cluster scheme. It has the same effect as twemproxy, but it supports migrating old node data to the new hash nodes when the number of nodes changes.
3. Redis Cluster (built in since Redis 3.0): its distributed algorithm is not consistent hashing but the concept of hash slots, and it natively supports giving nodes replica (slave) nodes. See the official documentation for an introduction.
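The hash-slot concept can be sketched in a few lines. Real Redis Cluster uses CRC16(key) mod 16384; this sketch substitutes zlib's crc32 as an illustrative stand-in, and the even slot-range split is an assumption:

```python
import zlib

NUM_SLOTS = 16384   # Redis Cluster's fixed slot count

def slot_for(key: str) -> int:
    # Redis actually uses CRC16; crc32 stands in for illustration.
    return zlib.crc32(key.encode()) % NUM_SLOTS

def node_for(key, nodes):
    """Each node owns a contiguous range of the 16384 slots."""
    slot = slot_for(key)
    span = NUM_SLOTS // len(nodes)
    return nodes[min(slot // span, len(nodes) - 1)]
```

Because keys map to fixed slots rather than to nodes, resharding means reassigning slot ranges between nodes; the key-to-slot mapping never changes, which is exactly what plain consistent hashing couldn't guarantee for twemproxy.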
Have you tried to deploy multi-machine redis? How to ensure that the data are consistent?
Master-slave replication, read-write separation
One node is the master database (master) and the others are slave databases (slaves). When a write occurs on the master, it automatically synchronizes the data to the slaves; the slaves are generally read-only and accept the data synchronized from the master. A master can have multiple slaves, but a slave can have only one master.
How to deal with a large number of requests
Redis is a single-threaded program, that is, it can only handle one client request at a time.
Redis handles multiple client requests through I/O multiplexing (select, epoll, kqueue, and so on, with different implementations on different platforms).
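The idea of one thread serving whichever client is ready can be demonstrated with Python's selectors module (which picks epoll/kqueue/select per platform, just as described). Two socketpairs stand in for two clients; the PING/PONG protocol is an illustrative toy, not the Redis protocol:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()          # "client 1" <-> server-side socket b
c, d = socket.socketpair()          # "client 2" <-> server-side socket d
for s in (b, d):
    s.setblocking(False)            # the server never blocks on one client
    sel.register(s, selectors.EVENT_READ)

a.sendall(b"PING")                  # both clients send a request...
c.sendall(b"PING")

replies = 0
while replies < 2:                  # ...one thread serves whichever is ready
    for key, _ in sel.select(timeout=1):
        data = key.fileobj.recv(64)
        if data == b"PING":
            key.fileobj.sendall(b"PONG")
            replies += 1
```

A single loop iteration may serve several ready sockets, which is why one thread can keep up with many concurrent clients as long as each request is cheap.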
Redis common performance problems and solutions?
(1) It is best for the Master not to do any persistence work, such as RDB memory snapshots or AOF log files.
(2) If the data is important, enable AOF on one Slave to back up the data, with the policy set to fsync once per second.
(3) For the speed and connection stability of master-slave replication, the Master and Slaves should be in the same local area network.
(4) Try to avoid adding slaves to a master that is already under pressure.
(5) Do not use a graph structure for master-slave replication; a one-way linked-list structure is more stable, i.e., Master <- Slave1 <- Slave2 <- Slave3 ... This chain also makes single-point failures easier to handle: a Slave can take over from the Master while the downstream replicas keep following it.
© 2024 shulou.com SLNews company. All rights reserved.