This article goes through common Redis interview questions in detail. I think it is very practical, so I am sharing it as a reference; I hope you get something out of it.
Cache knowledge points
What are the types of caches?
Caching is an effective means of improving access performance for hot data in high-concurrency scenarios, and it comes up constantly in real projects.
There are three types of caches: local cache, distributed cache, and multi-level cache.
Local cache:
A local cache lives in the process's own memory, for example in the JVM heap, and can be implemented with an LRUMap or a tool such as Ehcache.
A local cache is pure memory access with no remote-call overhead, so its performance is the best. However, it is limited by single-machine capacity: the cache is generally small and cannot be scaled out.
Distributed caching:
Distributed caching solves the capacity problem well.
A distributed cache generally scales horizontally and can handle large data volumes. The downside is that every access is a remote request, so its performance is not as good as a local cache's.
Multi-level caching:
To balance the two, real businesses generally use a multi-level cache: the local cache holds only the most frequently accessed portion of the hot data, while the rest of the hot data lives in the distributed cache.
This is also the most common caching scheme at today's top-tier tech companies; a single cache layer on its own is often unable to support many high-concurrency scenarios.
Eviction strategies
Whether local or distributed, a cache keeps its data in memory to guarantee performance. Because of cost and memory limits, once the stored data exceeds the cache capacity, some cached data must be evicted.
Common eviction strategies include FIFO (evict the oldest data), LRU (evict the least recently used data), and LFU (evict the least frequently used data).
noeviction: return an error when the memory limit is reached and the client tries to execute a command that could use more memory (most write commands; DEL and a few others are exceptions).
allkeys-lru: evict the least recently used keys (LRU) so that newly added data has room to be stored.
volatile-lru: evict the least recently used keys (LRU), but only among keys that have an expiration set, so that newly added data has room to be stored.
allkeys-random: evict random keys to make room for newly added data.
volatile-random: evict random keys to make room for newly added data, but only among keys that have an expiration set.
volatile-ttl: evict keys with an expiration set, preferring those with the shortest time to live (TTL), so that newly added data can be stored.
If no key satisfies the eviction precondition, volatile-lru, volatile-random, and volatile-ttl behave almost the same as noeviction.
In fact, the familiar LinkedHashMap also implements the LRU algorithm: construct it with accessOrder set to true and override removeEldestEntry, and once the capacity limit is exceeded it evicts the least recently used entry.
In real interviews you may well be asked to hand-write an LRU cache. Don't write a raw implementation from scratch; there is far too much of it to finish in the time you have. Either build on LinkedHashMap as above or find another suitable data structure to implement a Java version of LRU. What matters is understanding the principle.
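A minimal sketch of the LinkedHashMap approach (class name and capacity are illustrative, not from the original article):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap: accessOrder=true moves each
// accessed entry to the tail, and removeEldestEntry evicts the head
// (the least recently used entry) once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // initialCapacity, loadFactor, accessOrder
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // "a" is now the most recently used
        cache.put("c", 3);   // evicts "b", the least recently used
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

This is about as much LRU code as you can reasonably be expected to produce on a whiteboard; the point is knowing that the access-ordered linked list is what makes eviction O(1).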
Memcache
Note that Memcache will be referred to as MC for short.
Let's first take a look at the features of MC:
MC uses multi-thread asynchronous IO when processing requests, which can make reasonable use of the advantages of CPU multi-core, and the performance is very excellent.
MC is simple and uses memory to store data.
I won't elaborate on MC's memory structure or slab calcification here; you can check the official docs for details.
MC can set the expiration period for cached data, and the expired data will be cleared.
Expiration uses a lazy strategy: whether a piece of data has expired is only checked when it is accessed again.
When capacity is full, data is evicted from the cache: besides cleaning up expired keys, data is also deleted according to the LRU policy.
In addition, MC has some limitations that are fatal in today's Internet scenarios and have become an important reason people choose Redis or MongoDB instead:
A key cannot exceed 250 bytes.
A value cannot exceed 1 MB.
The maximum expiration time of a key is 30 days.
Only the K-V structure is supported; there is no persistence and no master-slave replication.
Redis
First of all, let's briefly talk about the characteristics of Redis, which is convenient to compare with MC.
Unlike MC, Redis handles requests in a single-threaded mode, for two reasons: first, it uses a non-blocking asynchronous event-handling mechanism; second, cached data lives in memory, so the time spent per operation is short and a single thread avoids the cost of thread context switching.
Redis supports persistence, so it can serve not only as a cache but also as a NoSQL database.
Compared with MC, Redis has another big advantage: besides K-V, it supports a variety of data structures, such as list, set, sorted set, hash, and so on.
Redis provides a master-slave replication mechanism, as well as Cluster deployment, so it can deliver a highly available service.
Detailed explanation of Redis
The knowledge point structure of Redis is shown in the following figure.
Function
Let's take a look at what features Redis provides!
Let's first look at the basic types:
String:
String is the most commonly used type in Redis. Internally it is stored as an SDS (Simple Dynamic String), which, much like ArrayList in Java, pre-allocates extra space to reduce frequent memory allocations.
This is the simplest type: plain set and get, simple K-V caching.
But in real development, many people convert complex structures into Strings for storage; for example, some like to serialize objects or Lists into JSON strings, store those, and deserialize on the way out.
I won't debate here whether that is right or wrong, but I do hope everyone uses the most appropriate data structure in the most appropriate scenario. Even when an object has no perfect fit, you can still pick the most suitable type; and when someone else takes over your code, well-chosen types make it look properly engineered, whereas String-for-everything does not.
That was a bit of a digression, but the point stands: keep at it until the habit becomes second nature, and small habits are what set you apart.
The practical application scenarios of String are as follows:
Caching: String is the most commonly used data type, and not just in Redis; it is the most basic type in every language. Using Redis as a cache in front of another database as the storage layer, and leveraging Redis's support for high concurrency, can greatly speed up system reads and writes and reduce the load on the backend database.
Counter: many systems use Redis as a real-time counter, which makes counting and querying fast, and the final figures can be persisted to a database or other storage at specific intervals.
Shared user Session: without it, when a user refreshes a page the service might have to re-fetch login state or rely on Cookies; instead, Redis can centrally manage user Sessions. You then only need to keep Redis highly available, and every Session update and lookup completes quickly, which greatly improves efficiency.
Hash:
This is a Map-like structure. It generally suits caching structured data, such as an object (provided the object does not nest other objects), in Redis, so that each read or write of the cache can operate on a single field of the Hash.
But that scenario is actually fairly narrow, because many objects these days are complex; for example, a product object may contain many attributes, some of which are themselves objects. I don't use this scenario much myself.
List:
List is an ordered list, and you can do a lot of tricks with it.
For example, you can use a List to store list-shaped data, such as fan lists, article comment lists, and so on.
For example, you can use the lrange command to read elements within a closed interval, which makes List-based paging queries possible. On top of Redis you can build simple, high-performance paging, supporting the Weibo-style drop-down pagination that keeps loading page after page.
For example, you can build a simple message queue: push in at the head of the List and pop out at the tail.
List itself is a common data structure in everyday development, to say nothing of how often it holds hot data.
Message queue: with the Redis linked-list structure you can easily implement a blocking queue, using left-in, right-out commands to complete the design. For example, producers insert data from the left with the LPUSH command, and multiple consumers "grab" data at the end of the list with the BRPOP command.
Article lists and paged data display.
Take the article lists on the blog sites we all use: as users grow, each user has their own article list, and when there are many articles they need to be displayed in pages. You can use Redis's List here: it is not only ordered, it also supports fetching elements by range, which solves paging queries perfectly and greatly improves query efficiency.
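The LPUSH/BRPOP queue pattern described above can be sketched in-process with a blocking deque; this is an illustrative analogue only (with real Redis you would issue the LPUSH and BRPOP commands against the server):

```java
import java.util.concurrent.LinkedBlockingDeque;

// In-process analogue of the Redis List queue pattern: LPUSH inserts at
// the head, BRPOP blocks and pops from the tail, giving FIFO delivery.
public class RedisListQueue {
    private final LinkedBlockingDeque<String> deque = new LinkedBlockingDeque<>();

    // producer side: insert from the left, like LPUSH
    public void lpush(String msg) {
        deque.addFirst(msg);
    }

    // consumer side: blocking pop from the right, like BRPOP
    public String brpop() throws InterruptedException {
        return deque.takeLast();
    }

    public static void main(String[] args) throws InterruptedException {
        RedisListQueue queue = new RedisListQueue();
        queue.lpush("msg1");
        queue.lpush("msg2");
        System.out.println(queue.brpop()); // msg1: the oldest message comes out first
    }
}
```

Because BRPOP blocks, multiple consumers can wait on the same list and whichever wakes first grabs the message, which is exactly the "grab" behavior described above.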
Set:
Set is an unordered set that automatically removes duplicates.
Throw data that needs deduplication into a Set and it is deduplicated automatically. If you need fast global deduplication, you could also use a HashSet in JVM memory, but what if your system is deployed on multiple machines? Global Set deduplication has to be done on Redis.
Based on Set we can perform intersection, union, and difference operations. For example, intersect two people's friend lists to find their mutual friends.
There are plenty of scenarios like these, because the operations are fast and simple: two queries and one intersection and you're done.
Sorted Set:
A Sorted Set is a Set that sorts as it deduplicates: each element is written with a score, and the set automatically orders elements by score.
Usage scenarios for sorted sets resemble those for sets, but a plain Set is unordered, while a Sorted Set uses scores to order its members, set at insertion time. So when you need an ordered, duplicate-free collection, the Sorted Set is the structure to reach for.
Rankings: the classic use case for sorted sets. For example, a video site needs to rank user-uploaded videos, and the list may be maintained along many dimensions: by time, by play count, by likes, and so on.
Weighted queues with Sorted Sets: give ordinary messages a score of 1 and important messages a score of 2, and have worker threads fetch tasks in descending score order, so important tasks are handled first.
Weibo's trending list works this way: the topic name in front, with a heat score behind it used for ordering.
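A toy model of the leaderboard idea above (names and sizes are illustrative; real Redis would use ZADD and ZREVRANGE, and its skiplist keeps the ordering incrementally rather than sorting on read as this sketch does):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of a Sorted Set leaderboard: each member has one score,
// re-adding a member overwrites its score (deduplication), and ranking
// is by score descending, like ZADD + ZREVRANGE in real Redis.
public class Leaderboard {
    private final Map<String, Double> scores = new HashMap<>();

    public void zadd(String member, double score) {
        scores.put(member, score); // duplicate members are merged, score updated
    }

    // Top-n members by descending score, like ZREVRANGE 0 n-1
    public List<String> zrevrange(int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Leaderboard hot = new Leaderboard();
        hot.zadd("topicA", 120);
        hot.zadd("topicB", 300);
        hot.zadd("topicC", 80);
        System.out.println(hot.zrevrange(2)); // [topicB, topicA]
    }
}
```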
Advanced usage:
Bitmap:
Bitmap stores information at the level of individual bits and can be used to implement a Bloom filter (BloomFilter).
HyperLogLog:
Provides approximate deduplication, well suited to large-scale dedup statistics such as counting UV (unique visitors).
Geospatial:
It can store geographic locations and compute distances between them or find locations within a given radius. Ever thought of implementing "people nearby" with Redis? Or computing an optimal route on a map?
These three can actually be counted as data structures in their own right. I don't know how many of you remember: I mentioned in my Redis basics article (where the dream began) that if you only know the five basic types, you only get a passing 60 points; if you can also talk about the advanced usage, the interviewer will see that you have something.
Pub/sub:
This is the subscribe/publish feature, which can serve as a simple message queue.
Pipeline:
You can execute a batch of commands and get all the results back in one go, cutting down the round trips of frequent request-response cycles.
Lua:
Redis supports submitting Lua scripts to execute a series of operations.
At my previous employer, an e-commerce company, I often used this in flash-sale scenarios, taking advantage of its atomicity; it worked rather nicely.
Want to see a flash-sale design? I seem to remember asking about it in every interview I give. If you want it, like this post and comment "flash sale".
Transaction:
The last feature is transactions, though Redis does not provide strict transactions. Redis only guarantees that the commands in a transaction execute serially and that all of them are executed; when one command fails it does not roll back, it just carries on with the rest.
Persistence
Redis offers two persistence methods, RDB and AOF. RDB writes the in-memory dataset to disk as a snapshot: the actual work is done in a forked child process and stored with binary compression. AOF records every write and delete operation Redis handles as a text log.
RDB saves all of Redis's data in a single file and is well suited to disaster recovery. The downside is that if Redis goes down before the next snapshot is saved, the data written in that interval is lost; in addition, taking a snapshot can make the service briefly unavailable.
AOF appends to its log file and has flexible sync policies: sync every second, sync on every modification, or no sync. The downside is that for a dataset of the same size the AOF file is larger than the RDB file, and AOF usually recovers more slowly than RDB.
For the details, especially the pros and cons of the two and how to choose between them, see this earlier piece:
"hanging interviewer" series-Redis Sentinel, persistence, Master-Slave, hand-torn LRU
High availability
Now for Redis high availability. Redis supports master-slave replication and provides a Cluster deployment mode, and Sentinel monitors the state of the Redis master server. When the master goes down, a new master is chosen from the slave nodes according to certain rules, and the other slaves are re-pointed (slaveof) to the new master.
Put simply, there are three rules for choosing the new master:
The lower a slave's priority setting (slave-priority), the higher its priority for promotion.
With equal priority, the slave that has replicated more data ranks higher.
All else being equal, the slave with the smaller runid is more likely to be chosen.
In a Redis cluster, Sentinel itself is deployed as multiple instances, and the sentinels guarantee their own high availability through the Raft protocol.
Redis Cluster uses a sharding mechanism: internally there are 16384 slots, spread across all the master nodes, each master being responsible for a portion of the slots. When operating on data, a CRC16 of the key determines which slot it falls in and therefore which master handles it; data redundancy is provided by the slave nodes.
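The slot calculation can be sketched as slot = CRC16(key) mod 16384, using the CRC-16/XMODEM variant (polynomial 0x1021, initial value 0) that the Redis Cluster specification defines; note that real Redis also honors hash tags (`{...}` in the key), which this sketch omits:

```java
// Hash-slot calculation: slot = CRC16(key) mod 16384.
public class ClusterSlot {
    // CRC-16/XMODEM: poly 0x1021, init 0, no reflection, no final XOR
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes()) % 16384;
    }

    public static void main(String[] args) {
        // The spec's reference check value: CRC16("123456789") == 0x31C3
        System.out.println(Integer.toHexString(crc16("123456789".getBytes())));
        System.out.println(slot("user:1000")); // one of the 16384 slots
    }
}
```

Because the key alone determines the slot, any node (or smart client) can compute where a key lives without consulting a central directory.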
Sentinel
Sentinel must be deployed with three instances to guarantee its own robustness. Sentinel plus master-slave does not guarantee zero data loss, but it does keep the cluster highly available.
Why three instances? Let's look at what goes wrong with two sentinels.
The master goes down. As long as one of the two sentinels, S1 or S2, considers it down, the switchover is triggered and one sentinel is elected to perform the failover, but this also requires that a majority of sentinels be up and running.
So what's the problem? If only M1 crashes and S1 is still alive, things are fine. But what if the whole machine goes down? That leaves S2 as the lone sentinel, and with no majority of sentinels to authorize the failover, failover will not happen even though R1 is alive on the other machine.
A classic Sentinel cluster looks like this:
The machine hosting M1 goes down, but two sentinels remain on other machines. They both see that M1 is dead, so they elect one of themselves and perform the failover.
To summarize, the main functions of the Sentinel component are:
Cluster monitoring: monitor whether the Redis master and slave processes are working properly.
Message notification: if a Redis instance fails, Sentinel sends alarm notifications to the administrator.
Failover: if the master node dies, automatically fail over to a slave node.
Configuration center: after a failover, notify clients of the new master's address.
Master-slave replication
This topic is closely tied to the RDB and AOF persistence I described earlier.
First, why use a master-slave architecture at all? As I said earlier, a single machine has a QPS ceiling, and Redis must support high read concurrency. If one machine serves both reads and writes, who could withstand that? But have the master machine handle writes and synchronize the data to the slave machines, let all of them serve reads, and a huge number of requests gets spread out; moreover, when you need to expand, horizontal scaling comes easily.
When a slave starts, it sends a psync command to the master. If this is the slave's first connection to the master, a full resynchronization is triggered: the master starts a background process to generate an RDB snapshot while buffering all new write requests in memory. Once the RDB file is ready, the master sends it to the slave; the slave first writes it to local disk, then loads it into memory. Finally the master sends the slave all the write commands it buffered in memory.
After I published this, a CSDN user, Jian_Shen_Zer, asked a question:
During master-slave synchronization, new slaves get the initial data via RDB. What about the data after that? How does new data written to the master get synchronized to the slaves?
Ao Bing's answer: the same way MySQL's binlog works, incrementally; after the full sync, the master streams subsequent write operations to the slave as increments.
Key expiration mechanism
A Redis key can be given an expiration time. After it expires, Redis uses a combination of active and passive expiration: one path, like MC, triggers passive (lazy) deletion when the key is accessed; the other actively deletes expired keys on a periodic schedule.
Periodic deletion + lazy deletion + memory eviction.
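A toy sketch of the first two mechanisms (real Redis samples random keys on a timer rather than sweeping everything, and its data structures are far more involved; names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of combined expiration: lazy deletion checks the expiry only
// when a key is read; periodic deletion sweeps proactively so keys that
// are never read again still get purged.
public class ExpiringCache {
    private static class Entry {
        final String value;
        final long expireAt; // absolute deadline in millis
        Entry(String value, long ttlMillis) {
            this.value = value;
            this.expireAt = System.currentTimeMillis() + ttlMillis;
        }
    }

    private final Map<String, Entry> map = new HashMap<>();

    public void set(String key, String value, long ttlMillis) {
        map.put(key, new Entry(value, ttlMillis));
    }

    // Lazy deletion: an expired key is removed the moment it is accessed.
    public String get(String key) {
        Entry e = map.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expireAt) {
            map.remove(key);
            return null;
        }
        return e.value;
    }

    // Periodic deletion: call this on a timer to purge expired keys.
    public void sweep() {
        long now = System.currentTimeMillis();
        map.values().removeIf(e -> now >= e.expireAt);
    }

    public int size() { return map.size(); }
}
```

Neither mechanism alone is enough: lazy deletion leaks keys nobody reads again, and a full periodic sweep would be too expensive, which is why Redis samples and also falls back on memory eviction.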
Cache FAQ
Cache update mode
This is an issue that you should consider when you decide to use caching.
Cached data needs to be updated when the data source changes; the source may be a DB or a remote service. One way is active update: when the source is a DB, update the cache directly after updating the DB.
When the data source is not a DB but some other remote service, you may not be able to detect data changes promptly. In that case you generally set an expiration time on cached data, which is the maximum data inconsistency you are willing to tolerate.
In this scenario you can use expiry-driven update: when the key is missing or expired, request the latest data from the source, cache it again, and reset the expiration time.
But there is a problem: if the remote service you depend on fails during the update, the data becomes unavailable. The improvement is asynchronous update: on expiry, do not clear the data; keep serving the old value while an asynchronous thread performs the refresh, avoiding the window of unavailability at the moment of expiry. There is also a purely asynchronous variant that refreshes data in batches on a schedule. In practice, choose the update mode to match the business scenario.
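The "keep serving the old value while refreshing asynchronously" idea can be sketched like this (class names, the single-threaded refresher, and the duplicate-refresh guard are all illustrative choices, not a definitive implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// On expiry, readers keep getting the stale value while a background
// task re-queries the data source, so there is no unavailable window.
public class StaleWhileRefreshCache {
    private static class Entry {
        volatile String value;
        volatile long refreshAt;
        Entry(String v, long ttl) { value = v; refreshAt = System.currentTimeMillis() + ttl; }
    }

    private final Map<String, Entry> map = new ConcurrentHashMap<>();
    private final ExecutorService refresher = Executors.newSingleThreadExecutor();
    private final long ttlMillis;

    public StaleWhileRefreshCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public String get(String key, Supplier<String> loader) {
        Entry e = map.get(key);
        if (e == null) { // first access: load synchronously
            e = new Entry(loader.get(), ttlMillis);
            map.put(key, e);
        } else if (System.currentTimeMillis() >= e.refreshAt) {
            Entry stale = e;
            stale.refreshAt = Long.MAX_VALUE; // crude guard against duplicate refreshes
            refresher.submit(() -> {
                stale.value = loader.get();   // old value stays readable meanwhile
                stale.refreshAt = System.currentTimeMillis() + ttlMillis;
            });
        }
        return e.value; // possibly stale, never missing
    }

    public void shutdown() { refresher.shutdown(); }
}
```

If the loader throws, the stale value simply keeps being served, which is exactly the failure behavior the text argues for.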
Data inconsistency
The second problem is data inconsistency; it is fair to say that whenever you use a cache you must consider how to handle it. Cache inconsistency is usually caused by a failed active update, for example a timeout when updating Redis after the DB write due to network issues, or a failed asynchronous refresh.
The remedies: if the service is not particularly latency-sensitive, add retries; if it is latency-sensitive, handle failed updates with asynchronous compensation tasks, or accept short-lived inconsistency if it does not harm the business, since as long as the next update succeeds, eventual consistency is preserved.
Cache penetration
Cache penetration. This is often caused by malicious external attacks. For example, user info is cached, but an attacker frequently requests the interface with non-existent user IDs: the cache query misses, the request penetrates to the DB, and the DB query misses too. The result is a large volume of requests reaching the DB straight through the cache.
The solution is as follows.
For users that do not exist, keep an empty object in the cache as a marker so the same ID does not reach the DB again. However, this sometimes doesn't solve the problem well and can fill the cache with a large amount of useless data.
Use a BloomFilter. Its defining feature is existence testing: if the BloomFilter says a value does not exist, it definitely does not exist; if it says the value exists, the actual data may still be absent. That makes it a perfect fit for this problem.
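A minimal Bloom filter sketch for this defense: load all valid IDs at startup, and a lookup the filter rejects can be refused before it ever touches the cache or the DB. The sizes and the double-hashing scheme here are illustrative, not tuned (production code would use something like Guava's BloomFilter or Redis bitmaps):

```java
import java.util.BitSet;

public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive k positions from two base hashes (Kirsch-Mitzenmacher trick).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;        // force the stride odd so positions vary
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(position(key, i));
    }

    // false => definitely absent (safe to reject without touching the DB)
    // true  => possibly present (small false-positive rate)
    public boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++)
            if (!bits.get(position(key, i))) return false;
        return true;
    }

    public static void main(String[] args) {
        BloomFilter f = new BloomFilter(1 << 16, 4);
        f.add("user:1");
        f.add("user:2");
        System.out.println(f.mightContain("user:1")); // true
    }
}
```

Note the asymmetry the text describes: "no" is authoritative, "yes" is only probable, which is fine here because a rare false positive merely costs one harmless DB miss.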
Cache breakdown
Cache breakdown means that when one hot key expires, the large volume of requests for that key penetrates to the data source.
There are the following ways to solve this problem.
Use mutex-guarded updates to ensure that, within a process, there are no concurrent requests to the DB for the same piece of data, reducing DB pressure.
Use random backoff: on a miss, sleep for a short random interval, then query again, and update if it still fails.
To avoid many hot keys expiring at the same moment, set the cache expiry to a fixed duration plus a small random offset.
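The "fixed time plus a small random number" idea is one line of arithmetic; a sketch with illustrative numbers:

```java
import java.util.concurrent.ThreadLocalRandom;

// TTL jitter for the hot-key problem: a fixed base expiry plus a small
// random offset, so keys written at the same moment do not all expire
// in the same instant.
public class TtlJitter {
    // base TTL in seconds, plus up to maxJitterSeconds extra
    public static long jitteredTtl(long baseSeconds, long maxJitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(maxJitterSeconds + 1);
    }

    public static void main(String[] args) {
        // e.g. cache hot keys for 10 minutes plus 0-60 seconds of jitter
        for (int i = 0; i < 3; i++) {
            System.out.println(jitteredTtl(600, 60));
        }
    }
}
```

The jitter spreads a thundering herd of simultaneous expirations into a window, so the DB sees a trickle of refreshes instead of a spike.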
Cache avalanche
The cause of a cache avalanche is the cache service itself going down, so all requests penetrate to the DB.
Solution:
Use a fast-fail circuit-breaker strategy to reduce the instantaneous pressure on the DB.
Use master-slave and cluster modes to keep the cache service as highly available as possible.
In a real-world scenario, these two methods are combined.
Old friends know why I don't spend much space on these points: my earlier articles cover them in such detail that I can't help liking them myself, so I won't repeat it all here.
"hanging the interviewer" series-Redis Foundation
"hanging interviewer" series-cache avalanche, breakdown, penetration
"hanging interviewer" series-Redis Sentinel, persistence, Master-Slave, hand-torn LRU
"hanging the interviewer" series-Redis final chapter-winter is coming, FPX- new Wang ascends the throne
Test points and additional items
Take notes!
Test points
Interviewers who ask about caching are mainly probing your understanding of cache characteristics and your grasp of the features and usage of MC and Redis.
To know the usage scenarios of caching, how different types of caches are used, for example:
-caching DB hotspot data to reduce DB pressure; caching dependent services to improve concurrency performance
-MC fits scenarios that cache only K-V; when special data formats such as list and set must be cached, use Redis.
-when you need to cache a list of a user's recently played videos, use a Redis list to store it; when you need to compute ranking data, use Redis's zset structure.
Understand the common commands of MC and Redis, such as atomic increment/decrement and the commands that operate on each data structure.
Understanding how MC and Redis lay out storage in memory is helpful when estimating usage capacity.
Understand the data expiration modes and eviction strategies of MC and Redis, such as actively triggered periodic eviction and passively triggered lazy eviction.
Understand the principles of Redis persistence, master-slave synchronization, and Cluster deployment, such as the implementation and differences between RDB and AOF.
Know the similarities and differences of cache penetration, breakdown, and avalanche, as well as solutions.
Whether or not you have e-commerce experience, I think you should know how a flash sale is concretely implemented, and its details.
Bonus points
If you want to perform even better in the interview, you should also know the following bonus points.
Tie your answers to real application scenarios when discussing cache usage. For example, when calling a backend service API for information, you can use a local + remote multi-level cache; for dynamic ranking scenarios, consider implementing them with Redis's Sorted Set; and so on.
Ideally you have hands-on distributed cache design and usage experience: which scenarios in your projects used Redis, which data structures, and what problems they solved; when using MC, tuning slab allocation parameters based on estimated sizes; and so on.
Ideally you understand the problems that can arise when using caches. For example, Redis handles requests on a single thread, so avoid expensive single-request tasks that would block other requests; avoid deploying Redis on the same machine as other CPU-intensive processes; disable Swap so Redis's cached data is not paged out to disk, hurting performance; the MC slab calcification mentioned earlier; and so on.
To understand the typical Redis application scenarios, for example, using Redis to implement distributed locks, using Bitmap to implement BloomFilter, using HyperLogLog for UV statistics, and so on.
Know the new features in Redis 4.0 and 5.0, for example Stream, a persistent message queue that supports multicast, and the ability to customize functionality through the Module system.
That concludes "What are the interview questions about Redis". I hope the content above helps you and lets you learn more; if you think the article is good, please share it for more people to see.