

What are the high-frequency interview questions in Redis

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains in detail the high-frequency Redis interview questions. The editor thinks it is very practical, so it is shared here for your reference. I hope you get something out of it.

Why is Redis so fast?

Many people only know that it is a single-threaded, in-memory NoSQL database. It is this lack of a comprehensive understanding of Redis that leaves them unable to answer follow-up questions.

This question tests basic understanding. We can answer it from several angles: the underlying data structures behind Redis's different data types, the fully memory-based design, the I/O multiplexing network model, the thread model, progressive rehash, and so on.

How fast is it?

We can first say how fast it is: according to official data, Redis's QPS can reach about 100,000 (requests per second). Those interested can refer to the official benchmark "How fast is Redis?" at redis.io/topics/benc...

The horizontal axis is the number of connections and the vertical axis is QPS.

This chart conveys an order of magnitude; quantifying your answer makes the interviewer feel that you have read the official documentation and are rigorous.

Memory-based implementation

Redis is a memory-based database; in speed, it completely outclasses disk-based databases.

Both read and write operations are done in memory, so let's compare the differences between memory operations and disk operations.

Disk access

Each disk read goes through a system call and a device that is orders of magnitude slower than RAM.

Memory access

Memory is controlled directly by the CPU through the memory controller integrated into the CPU, so memory is directly connected to the CPU and communicates with it at optimal bandwidth.

Finally, a chart quantifies the system's latency at each level (some figures are taken from Brendan Gregg).

Efficient data structure

From learning MySQL, we know it uses the B+Tree data structure to speed up retrieval, so Redis's speed must likewise be related to its data structures.

There are five data types in Redis-String, List, Hash, Set and SortedSet.

Different data types are supported by one or more data structures in order to pursue faster speed.

A tip from the author: we can separately explain the advantages of the underlying data structure behind each data type. Many people only know the data types; being able to name the underlying data structures makes your answer stand out.

Advantages of SDS simple dynamic string

The len field in SDS stores the string's length, so querying the length takes O(1) time.

Space pre-allocation: after an SDS is modified, the program allocates not only the space the SDS needs but also additional unused space.

Lazy space release: when an SDS is shortened, the program does not reclaim the excess memory; instead it records the spare bytes in the free field without releasing them. If an append operation is needed later, the unused space recorded in free is used directly, reducing memory allocations.
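These two tricks can be sketched in C (a minimal toy, not Redis's actual sds.c; the struct layout, the names sds_cat and sds_trim_to, and the doubling pre-allocation policy are simplifications for illustration):

```c
#include <string.h>
#include <stdlib.h>

/* Simplified SDS header: len gives O(1) length queries, free records
 * pre-allocated spare bytes so appends rarely need a realloc. */
typedef struct sds {
    size_t len;   /* bytes in use */
    size_t free;  /* spare bytes already allocated */
    char  *buf;   /* NUL-terminated for C interop */
} sds;

static sds sds_new(const char *init) {
    sds s;
    s.len = strlen(init);
    s.free = 0;
    s.buf = malloc(s.len + 1);
    memcpy(s.buf, init, s.len + 1);
    return s;
}

/* Append: if spare space is short, over-allocate (here: double the new
 * length), so repeated appends amortize the allocation cost. */
static void sds_cat(sds *s, const char *t) {
    size_t tlen = strlen(t);
    if (s->free < tlen) {
        size_t newlen = (s->len + tlen) * 2;
        s->buf = realloc(s->buf, newlen + 1);
        s->free = newlen - s->len;
    }
    memcpy(s->buf + s->len, t, tlen + 1);
    s->len += tlen;
    s->free -= tlen;
}

/* Shorten lazily: keep the allocation, just move bytes into `free`. */
static void sds_trim_to(sds *s, size_t n) {
    if (n < s->len) {
        s->free += s->len - n;
        s->len = n;
        s->buf[n] = '\0';
    }
}
```

After an append the spare space sits in free, and a later shrink simply grows free instead of calling the allocator.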

ZipList compressed list

The compressed list is one of the underlying implementations of three data types: List, Hash, and Sorted Set.

When a list has only a small amount of data, and each list item is either a small integer value or a short string, Redis uses a compressed list as the underlying implementation of the list key.

In this way, the memory is compact and memory is saved.

Quicklist

The list data structure has been modified in subsequent versions, using quicklist instead of ziplist and linkedlist.

Quicklist is a hybrid of ziplist and linkedlist: it splits the linked list into segments, stores each segment compactly as a ziplist, and links the multiple ziplists together with bidirectional pointers.

SkipList jump table

The sorting function of the sorted set type is achieved through the "jump list" data structure.

Jump table (skiplist) is an ordered data structure, which can quickly access nodes by maintaining multiple pointers to other nodes in each node.

On the basis of the linked list, the jump table adds a multi-level index to quickly locate the data through several jumps of the index position, as shown in the following figure:
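The multi-level index can be sketched in C as follows (a simplified skiplist, not Redis's zskiplist, which additionally stores scores, level spans, and backward pointers; MAX_LEVEL and the coin-flip level policy are illustrative):

```c
#include <stdlib.h>

#define MAX_LEVEL 4

/* A skiplist node: forward[i] points to the next node at level i. */
typedef struct sl_node {
    int value;
    struct sl_node *forward[MAX_LEVEL];
} sl_node;

typedef struct {
    sl_node head;   /* sentinel node, value unused */
    int level;      /* highest level currently in use */
} skiplist;

static void sl_init(skiplist *sl) {
    sl->level = 1;
    for (int i = 0; i < MAX_LEVEL; i++) sl->head.forward[i] = NULL;
}

/* Each extra level with probability 1/2 (Redis uses 1/4). */
static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;
    return lvl;
}

static void sl_insert(skiplist *sl, int value) {
    sl_node *update[MAX_LEVEL];
    sl_node *x = &sl->head;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->value < value)
            x = x->forward[i];
        update[i] = x;   /* last node before the insert point, per level */
    }
    int lvl = random_level();
    if (lvl > sl->level) {
        for (int i = sl->level; i < lvl; i++) update[i] = &sl->head;
        sl->level = lvl;
    }
    sl_node *n = malloc(sizeof(sl_node));
    n->value = value;
    for (int i = 0; i < lvl; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    for (int i = lvl; i < MAX_LEVEL; i++) n->forward[i] = NULL;
}

/* Search descends from the top level, skipping whole runs per hop. */
static int sl_contains(skiplist *sl, int value) {
    sl_node *x = &sl->head;
    for (int i = sl->level - 1; i >= 0; i--)
        while (x->forward[i] && x->forward[i]->value < value)
            x = x->forward[i];
    x = x->forward[0];
    return x && x->value == value;
}
```

The top-level pointers let a lookup jump over many nodes before dropping down a level, which is what gives the skiplist its expected O(log N) search.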

Integer array (intset)

When a collection contains only integer-valued elements, and the number of elements in the collection is small, Redis uses the collection of integers as the underlying implementation of the set key, saving memory.

Single thread model

We need to note that Redis's single thread refers to Redis's network IO (after version 6.x, network IO uses multithreading) and key-value pair instructions are read and written by a single thread. Redis persistence, cluster data synchronization, asynchronous deletion, and so on are all performed by other threads.

Never say that Redis has only one thread.

Single thread means that the Redis key value is a single thread for the execution of read and write instructions.

Let's lead with the official answer; it sounds rigorous, rather than parroting what the blogs all recite.

Official answer: because Redis is a memory-based operation, CPU is not the bottleneck of Redis, and the bottleneck of Redis is most likely to be the size of machine memory or network bandwidth. Since single-threading is easy to implement and CPU will not be a bottleneck, it makes sense to adopt a single-threaded solution. Original address: redis.io/topics/faq.

Why not use multithreaded execution to take full advantage of CPU?

Before running each task, the CPU needs to know where the task was loaded and where it should start running; that is, the system must preset the CPU registers and the program counter for it. This is called the CPU context.

When we switch contexts, we need to complete a series of work, which is a very resource-consuming operation.

The introduction of multithreaded development requires the use of synchronization primitives to protect the concurrent reading and writing of shared resources and increase the complexity of code and the difficulty of debugging.

What are the benefits of single threading?

No performance consumption due to thread creation

Avoid CPU consumption caused by context switching, and there is no overhead of multi-thread switching.

It avoids the competition between threads, such as adding locks, releasing locks, deadlocks, etc., and does not need to consider all kinds of lock problems.

The code is clearer and the processing logic is simple.

I/O multiplexing model

Redis adopts I/O multiplexing technology to handle connections concurrently, using a simple event framework implemented on top of epoll.

Read, write, close, and connect in epoll are all converted into events, and then take advantage of the multiplexing feature of epoll to never waste any time on IO.

The Redis thread does not block on a particular listening or connected socket, that is, it does not block on a particular client request processing. Because of this, Redis can connect to multiple clients and process requests at the same time, thereby improving concurrency.

Redis Global hash Dictionary

Redis as a whole is a hash table that holds all key-value pairs, regardless of whether the data type is any of five. A hash table is essentially an array in which each element is called a hash bucket, and regardless of the data type, the entry in each bucket holds a pointer to the actual specific value.

The time complexity of the hash table is O (1). You only need to calculate the hash value of each key to know the location of the corresponding hash bucket, and locate the entry in the bucket to find the corresponding data, which is also one of the reasons why Redis is fast.

Redis uses objects (redisObject) to represent keys in the database. When we create a key-value pair in Redis, at least two objects are created, one is the key object used as the key-value pair, and the other is the value object of the key-value pair.

That is, each entry holds the redisObject object of the key-value pair, and the corresponding data is found through the pointer of the redisObject.

typedef struct redisObject {
    unsigned type:4;      /* type */
    unsigned encoding:4;  /* encoding */
    void *ptr;            /* pointer to the underlying data structure */
    /* ... */
} robj;

Hash conflict?

Redis resolves conflicts through chained hashing: elements that fall into the same bucket are stored in a linked list. However, when a linked list grows too long, lookup performance degrades, so in pursuit of speed Redis uses two global hash tables, which are used in rehash operations to increase the number of hash buckets and reduce hash conflicts.

Start using "hash Table 1" by default to save key-value pair data, "hash Table 2" has no space allocated at this time. When more and more data triggers the rehash operation, do the following:

Allocate more space to hash Table 2

Remap the data from "hash Table 1" to "hash Table 2"

Free up space in hash Table 1.

It is worth noting that remapping the data from hash Table 1 to hash Table 2 is not done in one pass; doing it all at once would block Redis and leave it unable to serve requests.

Instead, it uses a progressive rehash. Each time you process a client request, start with the first index in "hash Table 1" and copy all the data in this location to "hash Table 2". In this way, the rehash is distributed to multiple requests to avoid time-consuming blocking.

How does Redis persist? How to recover data after downtime?

Redis's data persistence uses RDB data snapshots to achieve fast recovery after downtime. However, if full snapshots are taken too frequently, there are two serious performance costs:

RDB files are generated and written to disk frequently, putting too much pressure on the disk. The next RDB may start before the previous one has finished, falling into a vicious cycle.

Forking the bgsave child process blocks the main thread, and the larger the main thread's memory, the longer the blocking lasts.

So Redis also designed AOF write-after logging, which records the instructions that modify memory.

Interviewer: what is a RDB memory snapshot?

As Redis executes the write instruction, the in-memory data changes all the time. The so-called memory snapshot refers to the state data of the data in Redis memory at some point.

For example, when the time is fixed at a certain moment, when we take a picture, we can completely record the instantaneous picture of a certain moment through the photo.

Redis is similar to this in that it takes pictures of the data at a certain moment in the form of a file and writes it to disk. This snapshot file is called the RDB file, and RDB is the abbreviation of Redis DataBase.

When doing data recovery, read the RDB file directly into memory to complete the recovery.

Interviewer: can Redis process write requests at the same time during the generation of RDB?

Yes. Redis uses the operating system's copy-on-write (COW) mechanism across processes to achieve snapshot persistence while ensuring data consistency.

During persistence, Redis calls glibc's fork function to create a child process; snapshot persistence is handed entirely to the child, while the parent process continues to handle client requests.

When the main thread executes a write instruction that modifies data, the affected memory is copied first, and the bgsave child process reads the copy and writes it to the RDB file.

This not only ensures the integrity of the snapshot, but also allows the main thread to modify the data at the same time, avoiding the impact on normal business.
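The fork-time snapshot semantics can be demonstrated with a small POSIX C program (a generic illustration, not Redis code; child_snapshot_value is a made-up name, and a pipe stands in for request ordering):

```c
#include <unistd.h>
#include <sys/wait.h>

/* After fork(), the child keeps seeing memory as it was at fork time,
 * even while the parent keeps modifying it -- just as the bgsave child
 * sees a stable snapshot of the dataset. */
static int child_snapshot_value(void) {
    int data = 1;              /* stand-in for the in-memory dataset */
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    pid_t pid = fork();
    if (pid == 0) {                          /* child: the "bgsave" side */
        char go = 0;
        close(pipefd[1]);
        if (read(pipefd[0], &go, 1) != 1)    /* wait for parent's write */
            _exit(255);
        _exit(data);           /* still sees the fork-time value */
    }
    close(pipefd[0]);
    data = 2;                  /* parent handles a "write request" */
    if (write(pipefd[1], "g", 1) != 1) return -1;
    int status = 0;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status);
}
```

The parent's modification triggers a page copy under the hood, so the child's view stays consistent with the moment of the fork.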

Interviewer: what is AOF?

The AOF log records all the modification instructions since the creation of the Redis instance, so you can restore the state of the in-memory data structure of the current Redis instance by executing all the instructions sequentially on an empty Redis instance, that is, "replay".

The AOF configuration item appendfsync writeback policy provided by Redis directly determines the efficiency and security of AOF persistence.

Always: write back synchronously, and write the contents of the aof_buf buffer to the AOF file immediately after the write instruction is executed.

Everysec: write back every second, when the write instruction is executed, the log is only written to the AOF file buffer, and the contents of the buffer are synchronized to disk every other second.

No: operating system control, write execution is completed, write the log to the AOF file memory buffer, and it is up to the operating system to decide when to write to disk.
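The three policies map directly onto the appendfsync directive in redis.conf. A minimal configuration sketch (the directives and values are real; which one you pick is the trade-off discussed below):

```conf
appendonly yes          # turn on AOF persistence
appendfsync everysec    # fsync once per second: the common middle ground
# appendfsync always    # safest: fsync after every write, slowest
# appendfsync no        # fastest: the OS decides when to flush
```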

There is no strategy of having the best of both worlds, and we need to make a trade-off between performance and reliability.

Interviewer: since RDB has two performance issues, why not use AOF?

The AOF write-after log records every write instruction. It does not incur the performance cost of a full RDB snapshot, but replaying it is not as fast as loading an RDB, and an oversized log file also causes performance problems.

So, Redis designed a killer "AOF rewriting mechanism", and Redis provided bgrewriteaof instructions to slim down AOF logs.

The principle: fork a child process that traverses memory and converts it into a series of Redis operation instructions serialized into a new AOF log file. After serialization completes, the incremental AOF log accumulated during the rewrite is appended to the new file, which then immediately replaces the old AOF log file, completing the slimming.

Interviewer: how to achieve as little data loss as possible while taking into account performance?

When restarting Redis, we rarely use rdb to restore the memory state because a large amount of data is lost. We usually use AOF log playback, but the performance of replaying AOF logs is much slower than rdb, so it takes a long time to start when the Redis instance is large.

To solve this problem, Redis 4.0 introduced a new persistence option: hybrid persistence, which stores the RDB file contents together with an incremental AOF log file. Here the AOF log is no longer the full log, but only the incremental AOF log produced between the start and end of persistence, which is usually very small.

Therefore, when Redis is restarted, the contents of rdb can be loaded first, and then the incremental AOF logs can be replayed, which can completely replace the previous AOF full file replay, thus greatly improving the restart efficiency.

Redis master-slave architecture data synchronization

Redis provides a master-slave mode in which a redundant copy of the data is replicated to another Redis server through master-slave replication.

Interviewer: how to ensure the consistency of data between master and slave?

In order to ensure the consistency of the replica data, the master-slave architecture adopts the method of separation of read and write.

Read operation: both master and slave libraries can be executed

Write operation: the master library executes first, and then synchronizes the write operation to the slave library

Interviewer: is there any other function of master-slave replication?

Fault recovery: when the primary node is down, other nodes can still provide services

Load balancing: Master nodes provide write services, and Slave nodes provide read services to share the pressure.

High-availability cornerstone: it is the foundation of the Sentinel and Cluster implementations, the cornerstone of high availability.

Interviewer: how is master-slave replication realized?

Synchronization is divided into three situations:

The first full copy of the master-slave library

Synchronization during normal operation of master and slave

Disconnect and reconnect synchronization between master and slave libraries.

Interviewer: how to achieve synchronization for the first time?

The first replication process of the master-slave library can be divided into three stages: the connection establishment stage (that is, the preparation stage), the master library synchronization data to the slave library stage, and the sending of new write commands during synchronization to the slave library stage.

Establish a connection: the slave establishes a connection with the master, executes replicaof, and sends the psync command to the master to announce that synchronization is about to begin. After the master confirms, the master and slave begin synchronizing.

The master synchronizes data to the slave: master executes the bgsave command to generate an RDB file and sends it to the slave. At the same time, the master opens a replication buffer for each slave to record all write commands received after the RDB file was generated. The slave saves the RDB file, empties its database, and loads the RDB data into memory.

Send the slave the new write commands received after the RDB: write operations performed after the RDB file was generated are not recorded in that file, so to keep the master and slave consistent, the master records all write operations after RDB generation in the in-memory replication buffer and sends them to the slave.

Interviewer: what if the network between the master and slave libraries is cut off? Do you want to copy all of it again after disconnection?

Before Redis 2.8, if the network between master and slave flickered during command propagation, the slave would perform another full copy from the master, which was very expensive.

Starting from Redis 2.8, after a network interruption the master and slave continue synchronizing by means of incremental replication.

Incremental replication: for replication after a network interruption, only the write commands executed by the master node during the interruption are sent to the slave node, which is more efficient than full replication.

The secret behind incremental replication after a disconnect and reconnect is the repl_backlog_buffer. The master always records write instructions in repl_backlog_buffer; because memory is limited, repl_backlog_buffer is a fixed-length circular buffer, and when it fills up it overwrites the earlier content from the beginning.

Master uses master_repl_offset to record the position offset it writes to, and slave uses slave_repl_offset to record the offset it has read.

When the master and slave reconnect after a disconnection, the slave first sends a psync command to the master, passing along its own runID and slave_repl_offset.

Master only needs to synchronize the commands between master_repl_offset and slave_repl_offset to the slave library.

The incremental copy execution process is shown below:
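The ring-buffer bookkeeping can be sketched in C (a toy with a 16-byte window for illustration; the real default backlog is 1 MB, and the function names here are made up):

```c
#include <string.h>

#define BACKLOG_SIZE 16     /* tiny for illustration; real default is 1 MB */

typedef struct {
    char buf[BACKLOG_SIZE];
    long long master_repl_offset;   /* total bytes ever written */
} backlog;

/* Master appends every write command's bytes; once the fixed-length
 * ring wraps around, the oldest data is overwritten. */
static void backlog_write(backlog *b, const char *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        b->buf[(b->master_repl_offset + i) % BACKLOG_SIZE] = data[i];
    b->master_repl_offset += n;
}

/* On psync, a partial resync is possible only while the slave's offset
 * is still inside the window the ring currently covers. */
static int backlog_can_partial_sync(const backlog *b, long long slave_offset) {
    long long oldest = b->master_repl_offset - BACKLOG_SIZE;
    if (oldest < 0) oldest = 0;
    return slave_offset >= oldest && slave_offset <= b->master_repl_offset;
}

/* Copy the missing bytes [slave_offset, master_repl_offset) out of the ring. */
static size_t backlog_read_from(const backlog *b, long long slave_offset,
                                char *out) {
    size_t n = (size_t)(b->master_repl_offset - slave_offset);
    for (size_t i = 0; i < n; i++)
        out[i] = b->buf[(slave_offset + i) % BACKLOG_SIZE];
    return n;
}
```

If the slave falls further behind than the ring covers, the check fails and a full resynchronization is required instead.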

Interviewer: after completing the full synchronization, how to synchronize the data during normal operation?

When the master and slave libraries complete full replication, a network connection is maintained between them, and the master library synchronizes subsequent command operations to the slave database through this connection. This process is also known as command propagation based on persistent connections. The purpose of using persistent connections is to avoid the overhead caused by frequent connections.

A series of questions about the Sentinel principle

Interviewer: you clearly know quite a lot. Do you know how a Sentinel cluster works?

Sentinel is an operating mode of Redis. It focuses on monitoring the running state of Redis instances (master and slave nodes), and when the master node fails it can elect a new master and perform a master-slave switchover through a series of mechanisms, thereby ensuring the availability of the whole Redis system.

Its architecture is as follows:

The Redis Sentinel has the following abilities:

Monitoring: continuously monitor whether master and slave are in the expected working state.

Automatically switch the main library: when the Master fails, the Sentinel starts the automatic recovery process: select one from the slave as the new master.

Notification: let slave execute the replicaof, synchronize with the new master, and notify the client to establish a connection with the new master.

Interviewer: how do sentinels know each other?

The Sentinel establishes communication with the master and uses the publish/subscribe mechanism the master provides to publish its own information, such as its IP and port.

The master has a dedicated channel, __sentinel__:hello, for publishing and subscribing to messages between Sentinels. It works like a __sentinel__:hello group chat: Sentinels use this master-hosted group to publish their own messages while following the messages posted by the other Sentinels.

Interviewer: although the Sentinels have established a connection, they still need to establish a connection with slave, otherwise they cannot be monitored. How do you know about slave and monitor them?

The key is, again, to use the master: the Sentinel sends the INFO command to the master, and the master naturally knows all the slaves under it, so when it receives the command it returns the slave list to the Sentinel.

Based on the slave list in the master's response, the Sentinel establishes a connection with each slave and continuously monitors the slaves over these connections.

A rapid-fire chain of Cluster questions

Interviewer: are there any other highly available means besides sentinels?

There is also the Cluster cluster for high availability. A Redis deployment monitored by a Sentinel cluster is still a master-slave architecture, which is hard to scale out. Redis Cluster mainly solves the various slowdowns caused by storing large amounts of data, and it also makes horizontal scaling convenient.

When facing millions and tens of millions of users, a scale-out Redis slicing cluster will be a very good choice.

Interviewer: what is a Cluster cluster?

Redis cluster is a distributed database scheme, which manages data through sharding (a practice of divide-and-conquer idea) and provides replication and failover functions.

The data is divided into 16384 slots, and each node is responsible for part of the slot. The information of the slot is stored in each node.

It is decentralized, as shown in the figure, the cluster consists of three Redis nodes, each node is responsible for part of the data of the entire cluster, and each node may be responsible for more or less different data.

The three nodes are connected to each other to form a peer-to-peer cluster, and they exchange cluster information with each other through the Gossip protocol. Finally, each node stores the slots allocation of other nodes.

Interviewer: how do hash slots map to Redis instances?

According to the key of the key-value pair, use the CRC16 algorithm to compute a 16-bit value.

Take the 16-bit value modulo 16384, yielding a hash slot number from 0 to 16383.

Locate the corresponding instance according to the slot information.

The mapping of key values to data, hash slot and Redis instances is as follows:
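The two steps can be written out in C. This is the bitwise form of CRC16-CCITT (XModem), the variant the Redis Cluster specification names; the real implementation uses a lookup table, and key_to_slot here omits the {hash tag} rule that real Redis applies before hashing:

```c
#include <string.h>
#include <stdint.h>

/* CRC16-CCITT (XModem): polynomial 0x1021, initial value 0,
 * no bit reflection, no final XOR. */
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* A key maps to one of 16384 slots; the instance owning the slot serves it. */
static int key_to_slot(const char *key) {
    return crc16(key, strlen(key)) % 16384;
}
```

The Redis Cluster spec gives 0x31C3 as the CRC16 of "123456789", which is a handy check value for this routine.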

Interviewer: how does Cluster fail over?

Redis Cluster nodes use the Gossip protocol to broadcast their own state and their view of state changes to the whole cluster. For example, if a node finds that another node has lost contact (PFail), it broadcasts this message to the entire cluster, and the other nodes receive this missing-node report.

If a node observes that the number of missing-node reports (PFail count) for some node has reached a majority of the cluster, it can mark that node as definitively offline (Fail), then broadcast that to the whole cluster, forcing the other nodes to accept that the node is offline and to immediately perform a master-slave switchover for it.

Interviewer: how does the client determine which instance the accessed data is distributed on?

Redis instances send their own hash slot information to other instances in the cluster through Gossip protocol, thus realizing the diffusion of hash slot allocation information.

In this way, each instance in the cluster has all the mapping information between the hash slot and the instance.

When the client connects to any instance, the instance responds to the mapping between the hash slot and the instance to the client, and the client caches the hash slot and instance mapping information locally.

When the client requests, it calculates the hash slot corresponding to the key, locates the mapping information to the data instance through the locally cached hash slot instance, and then sends the request to the corresponding instance.

Interviewer: what is the Redis redirection mechanism?

The mapping between hash slots and instances changes because of instance additions or load rebalancing. If the client sends a request to an instance that no longer holds the corresponding data, that Redis instance will tell the client to send the request to another instance.

Redis tells the client through MOVED errors and ASK errors.

MOVED

MOVED error (the data has been fully migrated to another instance): when the client sends an operation on a key-value pair to an instance, and the slot the key belongs to is not that instance's responsibility, the instance returns a MOVED error directing the client to the node in charge of that slot.

At the same time, the client updates the local cache to update the correspondence between the slot and the Redis instance correctly.

ASK

If a slot has a lot of data, some of it will be migrated to the new instance, and some of it will not be migrated.

If the requested key is found on the current node, the command is executed directly; otherwise the node responds with an ASK error.

If the slot migration is not yet complete, and the slot of the key being accessed is in the middle of migrating from instance 1 to instance 2 (so the key is no longer on instance 1), instance 1 returns an ASK error telling the client: the hash slot of the requested key is migrating to instance 2; first send an ASKING command to instance 2, then send the actual operation command.

For example, if the client requests to locate slot 16330 of key = "official account: codebyte" on instance 172.17.18.1, node 1 executes the command directly if it can find it, otherwise it responds to the ASK error message and directs the client to the target node 172.17.18.2 being migrated.

Note: the ASK error instruction does not update the hash slot allocation information cached by the client.

This is the end of the article on "What are the high-frequency interview questions in Redis". I hope the content above is of some help and lets you learn more. If you think the article is good, please share it so more people can see it.





© 2024 shulou.com SLNews company. All rights reserved.
