What are the interview questions of Redis distributed technology?

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article introduces common interview questions on Redis and distributed technology. Each topic is explained through practical examples, and the methods described are simple and quick to apply. I hope this article helps you.

1. Distributed cache

1.1. What data types does Redis have? What scenarios are they used for?

STRING - stores strings, integers, or floating-point numbers. Operations: act on the whole string or a part of it; increment or decrement integers and floating-point numbers.

LIST - a list. Operations: push or pop elements from both ends; read single or multiple elements; trim, keeping only a range of elements.

SET - an unordered collection. Operations: add, get, or remove a single element; check whether an element exists in the collection; compute intersection, union, and difference; get a random element from the collection.

HASH - an unordered hash table of key-value pairs. Operations: add, get, or remove a single key-value pair; get all key-value pairs; check whether a key exists.

ZSET - an ordered collection. Operations: add, get, or delete elements; get elements by score range or by member; compute the rank of a member.

(Figure: what Redis data structures look like)

1.2. How is the master-slave replication of Redis implemented?

1. The slave server connects to the master server and sends the SYNC command.

2. After receiving the SYNC command, the master server starts executing the BGSAVE command to generate an RDB file, and uses a buffer to record all write commands executed thereafter.

3. After BGSAVE finishes, the master server sends the snapshot file to all slave servers, and continues to record the write commands executed while sending.

4. After receiving the snapshot file, the slave server discards all of its old data and loads the snapshot.

5. After the snapshot has been sent, the master server begins to send the write commands in the buffer to the slave server.

6. The slave server finishes loading the snapshot, starts accepting command requests, and executes the write commands from the master server's buffer.
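The six steps above can be sketched as a toy, in-memory simulation (illustrative only; real Redis streams RDB bytes and command traffic over a socket):

```python
# Toy simulation of the SYNC flow: the master takes a point-in-time snapshot,
# buffers writes that arrive during BGSAVE, then the slave loads the snapshot
# and replays the buffered writes.

def sync(master_data, writes_during_bgsave):
    snapshot = dict(master_data)          # BGSAVE: point-in-time copy
    buffer = []                           # write commands executed afterwards
    for key, value in writes_during_bgsave:
        master_data[key] = value          # master keeps serving writes...
        buffer.append((key, value))       # ...and records them in the buffer
    slave = dict(snapshot)                # slave discards old data, loads snapshot
    for key, value in buffer:             # master streams the buffered writes
        slave[key] = value
    return slave

master = {"a": 1}
slave = sync(master, [("b", 2), ("a", 3)])
assert slave == master                    # slave converges to the master's state
```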

1.3. How is Redis's key addressed? Background

(1) Each database in redis is stored in a redisDb structure, where:

RedisDb.id stores the redis database's number as an integer.

RedisDb.dict stores all the key-value pair data in the library.

RedisDb.expires holds the expiration time of each key.

(2) When the redis server is initialized, 16 databases are pre-allocated (this number can be changed in the configuration file), and all of them are saved in redisServer.db, an array member of the redisServer structure. When we run "select number", the program switches databases directly through redisServer.db[number]. When a program needs to know which database it is in, it just reads redisDb.id.

(3) The redis dictionary uses a hash table as its underlying implementation. The dict type holds two pointers to hash tables: hash table 0 (ht[0]) stores all the keys of the database, while hash table 1 (ht[1]) is used when the program rehashes hash table 0. Rehash is usually triggered when a new value is added, which will not be expanded on here. So finding a key in redis is actually a lookup on ht[0] in the dict structure.

(4) Since this is a hash table, there will be hash collisions, so what happens when multiple keys hash to the same value? Redis uses a linked list to store colliding keys. That is, when the list is found from the key's hash value, if the list's length is greater than 1, the list is traversed to find the key we are looking for. The length of the linked list is generally 1, so the time complexity can be regarded as O(1).

Steps to address key

After getting a key, redis first determines whether hash table 0 of the current library is empty, i.e. if (dict->ht[0].size == 0); if true, it returns NULL directly.

It then determines whether hash table 0 is being rehashed, because if a rehash is in progress the key may be stored in either of the two tables. If so, the _dictRehashStep method is called once; _dictRehashStep passively rehashes the database dictionary, as well as the dictionary of the hash key, which is not discussed here.

The hash value is calculated from the current dictionary and the key.

The index into the hash table is calculated from the hash value and the current dictionary.

The linked list at that index is taken out of the hash table and traversed to find the key. In general, the linked list has length 1.

Once ht[0] has been searched, the rehash judgment is made again: if no rehash is in progress, the search ends; otherwise steps 3-5 are repeated for ht[1].
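A minimal pure-Python sketch of the lookup logic above, with a two-table dict, chained buckets, and an ht[1] probe during rehash (an illustration, not Redis source code):

```python
# Simplified model of redis's dict lookup: buckets chain colliding keys,
# and while rehashing is in progress both hash tables are probed.
class MiniDict:
    def __init__(self, size=4):
        # ht[0] holds the data; ht[1] only exists during a rehash
        self.ht = [[[] for _ in range(size)], [[] for _ in range(size)]]
        self.rehashing = False

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]   # index = hash mod table size

    def put(self, key, value):
        self._bucket(self.ht[0], key).append((key, value))  # chain on collision

    def get(self, key):
        tables = [self.ht[0]] + ([self.ht[1]] if self.rehashing else [])
        for table in tables:                   # probe ht[1] only while rehashing
            for k, v in self._bucket(table, key):  # walk the collision chain
                if k == key:
                    return v
        return None                            # key absent -> NULL

d = MiniDict()
d.put("user:1", "alice")
assert d.get("user:1") == "alice"
assert d.get("user:2") is None
```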

1.4. How is the cluster mode of Redis implemented?

Redis Cluster is Redis's distributed solution, which was officially launched in Redis version 3.0.

Redis Cluster is decentralized, each node holds data and the entire cluster state, and each node is connected to all other nodes.

Redis Cluster node assignment

Redis Cluster features:

All redis nodes are interconnected with each other (PING-PONG mechanism), and binary protocols are used internally to optimize transmission speed and bandwidth.

The fail of a node takes effect only when it is detected by more than half of the nodes in the cluster.

The client is directly connected to the redis node and does not require an intermediate proxy layer. The client does not need to connect to all the nodes in the cluster, but to any of the available nodes in the cluster.

Redis Cluster maps all physical nodes onto the hash slots numbered [0, 16383] (not necessarily evenly distributed), and the cluster is responsible for maintaining the node, slot, and value mappings.

The Redis cluster is pre-divided into 16384 buckets (slots). When a key-value pair needs to be placed in the cluster, the bucket for the key is decided by CRC16(key) mod 16384.
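The slot formula above can be reproduced directly. Redis Cluster uses the CRC-16/XMODEM variant (polynomial 0x1021); this sketch omits the {hash tag} rule, under which only the tagged substring of the key is hashed:

```python
# slot = CRC16(key) mod 16384, using CRC-16/XMODEM as Redis Cluster does.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF                       # keep it a 16-bit value
    return crc

def key_slot(key: bytes) -> int:
    return crc16(key) % 16384

assert crc16(b"123456789") == 0x31C3            # standard CRC-16/XMODEM check value
assert 0 <= key_slot(b"user:1000") < 16384      # every key lands in a valid slot
```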

Redis Cluster master-slave mode

In order to ensure the high availability of data, Redis Cluster adds the master-slave mode.

A master node corresponds to one or more slave nodes; the master provides data access, and the slaves pull data backups from the master. When the master fails, one of the slaves is chosen to act as the master so that the cluster does not go down. Therefore, when building the cluster, be sure to add a slave node for each master node.

Redis Sentinel

Redis Sentinel is used to manage multiple Redis servers, and it has three functions:

Monitoring-Sentinel will constantly check whether your master server and slave server are working properly.

Notification-Sentinel can send notifications to administrators or other applications through API when there is a problem with a monitored Redis server.

Automatic failover - when a master server does not work properly, Sentinel starts an automatic failover operation: it promotes one of the slaves of the failed master to be the new master, and makes the other slaves of the failed master replicate the new master. When a client tries to connect to the failed master, the cluster returns the address of the new master, so the cluster can use the new master in place of the failed one.

There should be an odd number of nodes in the Redis cluster, so there are at least three nodes.

When the master server monitored by Sentinel fails, a sentinel must be elected (subject to the quorum) to perform the failover. The election requires a majority of the sentinels to be running: majority = 2 for 2 sentinels, majority = 2 for 3 sentinels, majority = 3 for 4 sentinels, majority = 3 for 5 sentinels.

Suppose the cluster deploys only 2 nodes

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

If the machine hosting M1 and S1 goes down, only one sentinel remains, which cannot form a majority for the election, so failover cannot be performed.
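The majority arithmetic behind this example is just n // 2 + 1, which shows why a 2-sentinel deployment cannot tolerate losing a node:

```python
# Majority (quorum for leader election) among n sentinels.
def majority(n: int) -> int:
    return n // 2 + 1

assert majority(2) == 2   # both sentinels needed -> no fault tolerance
assert majority(3) == 2   # survives the loss of one sentinel
assert majority(5) == 3
# With M1 and S1 down, 1 surviving sentinel < majority(2), so no failover:
assert 1 < majority(2)
```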

1.5. How does Redis implement distributed locks? How does ZooKeeper implement distributed locks? Compare the advantages and disadvantages of the two?

Three implementations of distributed locks:

Implementation of distributed Lock based on Database

Implementation of distributed lock based on cache (Redis, etc.)

Implementation of distributed Lock based on Zookeeper

Redis implementation

When acquiring a lock, use setnx to set the lock and the expire command to add a timeout to it, after which the lock is released automatically (SET key value NX EX does both in a single atomic command). The value of the lock is a randomly generated UUID, which is checked when the lock is released.

When acquiring the lock, you also set a timeout for the acquisition, and if it exceeds this time, the acquisition lock is discarded.

When releasing the lock, the UUID is compared to determine whether the lock belongs to the current client; if so, delete is executed to release it.
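An in-memory simulation of this scheme (a real implementation would use SET key value NX EX against Redis, and a Lua script so that the ownership check and the delete are atomic on the server):

```python
# Simulated Redis lock: setnx-style acquisition with a TTL, a random UUID as
# the lock value, and owner-checked release.
import time
import uuid

store = {}  # key -> (value, expire_at); stands in for the Redis keyspace

def acquire(key, ttl=10.0):
    now = time.monotonic()
    entry = store.get(key)
    if entry is None or entry[1] <= now:       # setnx semantics + expiry
        token = str(uuid.uuid4())              # random UUID as the lock value
        store[key] = (token, now + ttl)
        return token
    return None                                # lock held by someone else

def release(key, token):
    entry = store.get(key)
    if entry and entry[0] == token:            # only the owner may delete
        del store[key]
        return True
    return False

t = acquire("lock:order")
assert t is not None
assert acquire("lock:order") is None           # second client is rejected
assert release("lock:order", "wrong-token") is False
assert release("lock:order", t) is True
```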

ZooKeeper implementation

Create a directory mylock

Thread A wants to acquire the lock by creating a temporary sequential node under the mylock directory.

Get all the child nodes in the mylock directory, and then get the sibling nodes smaller than yourself. If they do not exist, the current thread has the lowest sequence number and gets the lock.

Thread B gets all the child nodes, determines that its own node is not the smallest, and sets a watch on the node immediately smaller than its own.

Thread A finishes processing and deletes its own node; thread B receives the change event, determines whether its node is now the smallest, and if so, acquires the lock.
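A toy model of the sequential-node protocol above; the dict of hypothetical sequence numbers stands in for the children of /mylock:

```python
# The client holding the lowest sequence number owns the lock; every other
# client watches the node with the next-lower sequence number.
def lock_order(nodes):
    """nodes: {client: sequence_number} under /mylock."""
    ranked = sorted(nodes, key=nodes.get)
    owner = ranked[0]                                  # lowest sequence -> lock
    watches = {ranked[i]: ranked[i - 1] for i in range(1, len(ranked))}
    return owner, watches

owner, watches = lock_order({"A": 1, "B": 2, "C": 3})
assert owner == "A"
assert watches == {"B": "A", "C": "B"}   # B watches A, C watches B
# When A deletes its node, B's watch fires and B becomes the owner:
owner2, _ = lock_order({"B": 2, "C": 3})
assert owner2 == "B"
```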

Achieve comparison

ZooKeeper has the characteristics of high availability, reentrant and blocking lock, which can solve the problem of failure deadlock. However, because ZooKeeper needs to create and delete nodes frequently, its performance is not as good as that of Redis.

1.6. How is Redis persisted? What are the advantages and disadvantages of each method, and what is the implementation principle? RDB snapshot

All the data that exists at a certain time is written to the hard disk.

The principle of snapshot

By default, Redis saves the database snapshot in a binary file named dump.rdb. You can set Redis to save the dataset automatically when the condition of "at least M changes to the dataset in N seconds" is met. You can also manually let Redis save the dataset by calling SAVE or BGSAVE. This persistence is called a snapshot.

When Redis needs to save the dump.rdb file, the server does the following:

Redis creates a child process.

The child process writes the dataset to a temporary snapshot file.

When the child process finishes writing to the new snapshot file, Redis replaces the original snapshot file with the new snapshot file and deletes the old snapshot file.

This way of working allows Redis to benefit from the copy-on-write (copy-on-write) mechanism.

Advantages of Snapshot

It saves the dataset at a certain point in time and is very suitable for the backup of the dataset.

It is easy to transfer to another remote data center or Amazon's S3 (possibly encrypted), which is very suitable for disaster recovery.

The only thing the parent process needs to do when saving the RDB file is to fork out a child process, and all the subsequent work is done by the child process, and the parent process does not need to do other IO operations, so snapshot persistence can maximize the performance of redis.

Compared with AOF, the RDB approach is faster when recovering large datasets.

Disadvantages of snapshots

Snapshots are not for you if you need to lose as little data as possible when redis stops working unexpectedly (such as in a power outage).

Snapshots require frequent fork child processes to save datasets to the hard disk. When the dataset is large, the fork process is very time-consuming and may cause Redis to fail to respond to client requests for some milliseconds.

AOF

The AOF persistence method records each write to the server. When the server restarts, these commands are re-executed to restore the original data.

The principle of AOF

Redis creates a child process.

The child process starts writing the contents of the new AOF file to a temporary file.

For all newly executed write commands, the parent process appends these changes to the end of the existing AOF file as it accumulates them into an in-memory cache, so that the existing AOF file is safe even if a downtime occurs in the middle of the rewrite.

When the child process finishes rewriting, it sends a signal to the parent process, which, after receiving the signal, appends all data in the memory cache to the end of the new AOF file.

Done! Redis now atomically replaces the old file with the new one, after which all commands are appended directly to the end of the new AOF file.

Advantages of AOF

Using the default per second fsync policy, Redis still performs well (fsync is processed by background threads, and the main thread will try its best to handle client requests). In case of failure, you can lose up to 1 second of data with AOF.

The AOF file is an append-only log file, so there is no need to seek when writing; even if the file ends with an incomplete write command for some reason (disk space full, downtime while writing, etc.), the redis-check-aof tool can fix these problems.

Redis can automatically rewrite AOF in the background when the AOF file becomes too large: the new AOF file after rewriting contains the minimum set of commands needed to recover the current dataset. The entire rewrite operation is absolutely safe.

The AOF file saves all writes to the database sequentially in the Redis protocol format, so its contents are very easy to read, and parsing the file is also very easy.

Shortcomings of AOF

For the same dataset, the volume of the AOF file is usually larger than that of the RDB file.

Depending on the fsync policy used, AOF may be slower than snapshots. In general, the performance of fsync per second is still very high, and turning off fsync can make AOF as fast as snapshots, even under heavy loads. However, snapshots can provide a more guaranteed maximum latency (latency) when dealing with large write loads.

1.7. What are the expiration policies for Redis?

noeviction - when memory usage reaches the threshold, all commands that would allocate memory report an error.

allkeys-lru - in the primary key space, remove the least recently used keys first.

allkeys-random - in the primary key space, remove a random key.

volatile-lru - among keys with an expiration time set, remove the least recently used keys first.

volatile-random - among keys with an expiration time set, remove a random key.

volatile-ttl - among keys with an expiration time set, remove keys with an earlier expiration time first.
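Two of these policies sketched over a toy keyspace (real Redis samples a few keys rather than scanning them all; the record layout here is hypothetical):

```python
# Eviction-victim selection for allkeys-lru and volatile-ttl over records of
# the form {"last_used": <clock>, "ttl": <expiry or None>}.
def pick_allkeys_lru(keys):
    return min(keys, key=lambda k: keys[k]["last_used"])

def pick_volatile_ttl(keys):
    expiring = {k: v for k, v in keys.items() if v["ttl"] is not None}
    return min(expiring, key=lambda k: expiring[k]["ttl"]) if expiring else None

keys = {
    "a": {"last_used": 5, "ttl": None},
    "b": {"last_used": 1, "ttl": 30},
    "c": {"last_used": 9, "ttl": 10},
}
assert pick_allkeys_lru(keys) == "b"     # least recently used overall
assert pick_volatile_ttl(keys) == "c"    # closest expiry among volatile keys
```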

1.8. What's the difference between Redis and Memcached?

Both are non-relational in-memory key-value databases. There are the following main differences:

Data type

Memcached only supports string types

Redis supports five different data types, making it more flexible to solve problems.

Data persistence

Memcached does not support persistence

Redis supports two persistence strategies: RDB snapshots and AOF logs.

Distributed system

Memcached does not support distributed storage, so distributed storage can only be achieved by using distributed algorithms such as consistent hash on the client. In this way, the node where the data is located needs to be calculated on the client first when storing and querying.

Redis Cluster implements distributed support.

Memory management mechanism

Memcached avoids memory fragmentation by dividing memory into blocks of fixed lengths to store data, but this approach lowers memory utilization. For example, if the block size is 128 bytes and only 100 bytes of data are stored, the remaining 28 bytes are wasted.

In Redis, not all data is stored in memory all the time, and some long-unused value can be swapped to disk. Memcached's data will always be in memory.

1.9. Why is the Redis performance of single thread better than that of multi-thread Memcached?

Reasons why Redis is fast:

The vast majority of requests are pure memory operations (very fast)

A single thread is used, avoiding unnecessary context switches and race conditions.

Non-blocking IO

The internal implementation uses epoll, together with a simple event framework built on top of it. Read, write, close, and connect are all converted into events, and epoll's multiplexing feature ensures no time is wasted on IO.
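The same multiplexing idea, sketched with Python's selectors module (which uses epoll on Linux): one thread watches many descriptors and only reads when the OS reports readiness:

```python
# Single-threaded readiness-based IO: register sockets with a selector and
# act only on the descriptors the OS reports as ready.
import selectors
import socket

sel = selectors.DefaultSelector()
server, client = socket.socketpair()          # stand-in for a real listener
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

client.sendall(b"PING")
events = sel.select(timeout=1)                # block until some fd is readable
received = b""
for key, _mask in events:
    received = key.fileobj.recv(16)           # guaranteed not to block here
assert received == b"PING"
sel.close(); server.close(); client.close()
```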

2. Distributed message queuing (MQ) 2.1. Why use MQ?

Asynchronous processing-compared with the traditional serial and parallel methods, the system throughput is improved.

Application decoupling-systems communicate through messages without worrying about the processing of other systems.

Traffic shaping (peak clipping) - the number of requests can be controlled through the message queue length; bursts of high-concurrency requests can be absorbed over a short period.

Log processing-solves a large number of log transfers.

Message communication-message queues generally have built-in efficient communication mechanisms, so they can also be used for pure message communication. Such as implementing peer-to-peer message queues, or chat rooms, etc.

2.2. How to ensure the high availability of MQ? Data replication

Sort all Broker and Partition to be assigned

Assign the i-th partition to the (i mod n)-th broker.

Assign the j-th replica of the i-th partition to the ((i + j) mod n)-th broker.
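The two assignment formulas can be sketched as follows; the broker names are hypothetical:

```python
# Round-robin partition/replica placement: partition i goes to broker
# (i mod n), and replica j of partition i goes to broker ((i + j) mod n).
def assign(partitions, replication, brokers):
    n = len(brokers)
    layout = {}
    for i in range(partitions):
        layout[i] = [brokers[(i + j) % n] for j in range(replication)]
    return layout  # layout[i][0] is the leader, the rest are replicas

layout = assign(partitions=4, replication=2, brokers=["b0", "b1", "b2"])
assert layout[0] == ["b0", "b1"]   # partition 0: leader b0, replica on b1
assert layout[2] == ["b2", "b0"]   # replica index wraps around the list
assert layout[3] == ["b0", "b1"]   # 3 mod 3 == 0
```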

Leader election

2.3. What are the common problems with MQ? How to solve these problems?

Common problems with MQ are:

The order of messages

The problem of message repetition

The order of messages

Message ordering means that it can be consumed according to the order in which the message is sent.

If the producer produces two messages, M1 and M2, and M1 is sent to S1 while M2 is sent to S2, how do you ensure that M1 is consumed before M2?

Solution:

(1) to ensure that the producer-MQServer-consumer relationship is one-to-one.

Defect:

Parallelism will become the bottleneck of the messaging system (insufficient throughput).

More exception handling: as long as there is a problem on the consumer side, the whole processing pipeline blocks, and more effort must be spent resolving the blockage.

(2) Avoid the problem through reasonable design, or decompose it away.

Many applications do not actually care about ordering.

Queue-level disorder does not necessarily mean business-level messages are out of order.

Therefore, it is more reasonable to guarantee message order at the business level rather than relying only on the messaging system.

The problem of message repetition

The root cause of message duplication is that the network is unreachable.

So the way to solve this problem is to bypass it, and the question becomes: what should the consumer do if it receives two identical messages?

The consumer's message-handling business logic should remain idempotent: as long as it is idempotent, no matter how many duplicate messages arrive, the final result is the same. Ensure that each message has a unique number, and that marking a message as successfully processed happens together with the write to the deduplication log table. Use a log table to record the IDs of messages that have been successfully processed; if a newly arrived message's ID is already in the log table, the message is not processed again.
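A minimal sketch of the dedup-table idea, with a set standing in for a log table that has a unique index on the message ID:

```python
# Idempotent consumer: record each successfully processed message ID;
# a redelivered ID is skipped, so the effect happens exactly once.
processed = set()   # stands in for a log table with a unique index on msg_id
results = []

def handle(msg_id, payload):
    if msg_id in processed:          # already handled -> drop the duplicate
        return False
    results.append(payload)          # the business logic
    processed.add(msg_id)            # record success together with the work
    return True

assert handle(1, "debit $10") is True
assert handle(1, "debit $10") is False   # a network retry delivers it again
assert results == ["debit $10"]          # the debit ran exactly once
```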

2.4. What are the advantages and disadvantages of Kafka, ActiveMQ, RabbitMQ and RocketMQ?

3. Distributed Services (RPC) 3.1. The implementation process of Dubbo?

Node role:

Provider - the service provider that exposes services

Consumer - the service consumer that invokes remote services

Registry - the registry for service registration and discovery

Monitor - the monitoring center that counts service invocations and call time

Container - the container in which the service runs

Invocation relationship:

The service container is responsible for starting, loading, and running the service provider.

Upon startup, the service provider registers the services it provides with the registry.

Service consumers subscribe to the services they need from the registry when they start up.

The registry returns a list of service provider addresses to the consumer, and if there is a change, the registry will push the change data to the consumer based on the persistent connection.

The service consumer, from the provider address list, chooses one provider to call based on the soft load balancing algorithm, and then chooses another one if the call fails.

Service consumers and providers accumulate the number of calls and call time in memory and regularly send statistics to the monitoring center every minute.

3.2. What are the Dubbo load balancing strategies?

Random

Random, set random probability by weight.

Over a small number of calls the distribution can be uneven, but the larger the call volume, the more uniform the distribution; after applying weights by probability it is also fairly uniform, which helps with dynamically adjusting provider weights.

RoundRobin

Round-robin, with the rotation ratio set according to the agreed weights.

There is a problem of slow provider accumulating requests, for example, the second machine is slow, but it doesn't hang up, it gets stuck when the request is transferred to the second machine, and over time, all requests are stuck on the second machine.

LeastActive

Least active calls: the provider with the lowest active count is used, chosen at random among providers with the same active count. The active count is the difference between the counts before and after a call.

Causes slower providers to receive fewer requests, because the slower the provider, the greater the difference in count before and after invocation.

ConsistentHash

Consistent Hash, requests with the same parameters are always sent to the same provider.

When a provider goes down, the requests originally sent to it are, based on its virtual nodes, spread evenly across the other providers, without causing drastic changes.

By default, only the first parameter is hashed; this can be changed in the configuration.

160 virtual nodes are used by default; this can also be changed in the configuration.
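A sketch of the consistent-hash selection described above, with 160 virtual nodes per provider and MD5 as the ring hash (Dubbo's actual implementation differs in detail; the provider names here are hypothetical):

```python
# Consistent hash ring: each provider gets many virtual nodes; a request's
# hash is routed clockwise to the next virtual node on the ring.
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, providers, replicas=160):
        self.ring = sorted(
            (self._hash(f"{p}#{i}"), p) for p in providers for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def select(self, request_key):
        i = bisect.bisect(self.keys, self._hash(request_key)) % len(self.ring)
        return self.ring[i][1]     # next virtual node clockwise

ch = ConsistentHash(["p1", "p2", "p3"])
assert ch.select("order:42") == ch.select("order:42")   # same args, same provider
# Removing a provider only remaps the keys that hashed to its virtual nodes:
ch2 = ConsistentHash(["p1", "p2"])
moved = sum(ch.select(f"k{i}") != ch2.select(f"k{i}")
            for i in range(100) if ch.select(f"k{i}") != "p3")
assert moved == 0   # keys that were not on p3 stay where they were
```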

3.3. Dubbo Cluster Fault tolerance Strategy?

Failover - automatic switching on failure: if a call fails, retry another server. Usually used for read operations, but retries cause longer delays. The number of retries can be set through retries="2" (excluding the first attempt).

Failfast-Fast failure, only one call is made, and an error is reported immediately after the failure. It is usually used for non-idempotent write operations, such as adding records.

Failsafe-fail safe, ignore the exception when it occurs. It is commonly used for operations such as writing audit logs.

Failback-automatic recovery of failure. Failed requests are recorded in the background and resent regularly. It is commonly used for message notification operations.

Forking - calls multiple servers in parallel and returns as soon as one succeeds. Usually used for read operations with high real-time requirements, at the cost of more service resources. The maximum parallelism can be set through forks="2".

Broadcast - calls all providers one by one; if any one of them reports an error, the call reports an error. Typically used to notify all providers to update local resources such as caches or logs.

3.4. Dynamic agent strategy?

As an RPC framework, the first thing Dubbo must accomplish is service invocation across systems and networks. The consumer and the provider follow a unified interface definition; when the consumer invokes the interface, Dubbo converts the call into a unified data structure and transmits it over the network, and the provider finds the interface according to the rules and completes the call through reflection. That is, the consumer obtains a Proxy for the remote service, while the provider needs a Wrapper to support different interface implementations. The call procedure looks something like this:

The consumer's Proxy and the provider's Wrapper enable Dubbo to build a complex and unified system. This kind of dynamic agent and wrapper is also realized by plug-in based on SPI, whose interface is ProxyFactory.

@SPI("javassist")
public interface ProxyFactory {

    @Adaptive({Constants.PROXY_KEY})
    <T> T getProxy(Invoker<T> invoker) throws RpcException;

    @Adaptive({Constants.PROXY_KEY})
    <T> Invoker<T> getInvoker(T proxy, Class<T> type, URL url) throws RpcException;
}

There are two implementations of ProxyFactory: one based on the JDK dynamic proxy, the other based on javassist. @SPI("javassist") is declared on the ProxyFactory interface, so the javassist implementation is the default.

3.5. What serialization protocols does Dubbo support? Hessian? The data structure of Hessian?

Dubbo serialization: Alibaba's own Java serialization implementation, which is not yet mature.

Hessian2 serialization: hessian is an efficient cross-language binary serialization scheme, but this is actually not native hessian2 serialization; it is Alibaba's modified hessian-lite, which Dubbo RPC enables by default.

JSON serialization: there are currently two implementations, one using Alibaba's fastjson library and the other a simple json library implemented inside dubbo. In general, JSON text serialization does not perform as well as binary serialization.

Java serialization: mainly implemented by java serialization that comes with JDK, the performance is not ideal.

Kryo and FST: both generally outperform hessian and dubbo serialization.

The difference between Hessian serialization and Java default serialization?

Hessian is a lightweight remoting-over-HTTP tool that uses a binary RPC protocol, so it is well suited to sending binary data, and it can also penetrate firewalls.

Hessian supports cross-language serialization.

Better performance and ease of use than java serialization

There are many languages supported

3.6. What is Protocol Buffer?

Protocol Buffer is a lightweight, efficient structured-data storage format from Google. Its performance is significantly better than that of JSON and XML: faster and more compact.

Serialization of Protocol Buffer & deserialization is simple & the reasons for its speed are:

The encoding/decoding method is simple (only simple mathematical operations such as bit shifts).

Using Protocol Buffer's own framework code and compiler to complete

The reason why Protocol Buffer's data compression effect is good (that is, the serialized data volume is small) is:

Unique coding methods are adopted, such as Varint, Zigzag and so on.

It adopts the T-L-V (Tag-Length-Value) storage format, which reduces the use of delimiters and stores data compactly.
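Both encoding tricks are easy to demonstrate: ZigZag maps signed integers of small magnitude to small unsigned ones, and varint then encodes small values in fewer bytes:

```python
# Protobuf-style ZigZag and varint encoding (64-bit ZigZag shown).
def zigzag(n: int) -> int:
    return (n << 1) ^ (n >> 63)          # -1 -> 1, 1 -> 2, -2 -> 3

def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)      # MSB set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

assert zigzag(0) == 0 and zigzag(-1) == 1 and zigzag(1) == 2
assert varint(1) == b"\x01"              # small numbers take a single byte
assert varint(300) == b"\xac\x02"        # the classic protobuf example
assert varint(zigzag(-2)) == b"\x03"     # signed values still encode compactly
```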

3.7. Can I continue to communicate when the registry is dead?

Yes. When the application starts, the Dubbo consumer pulls the registered providers' addresses from the registry and caches them locally. Each call is then made using the locally stored addresses.

3.8. What is the principle of ZooKeeper? What's the use of ZooKeeper?

ZooKeeper is a distributed application coordination system, which has been used in many distributed projects to complete unified naming service, state synchronization service, cluster management, distributed application configuration item management and so on.

Each Server stores a piece of data in memory

When Zookeeper starts, a leader is elected from among the instances (Paxos protocol).

Leader handles operations such as data updates (Zab protocol)

An update operation succeeds if and only if a majority of servers successfully modify the data in memory.

3.9. What's the use of Netty? What's the use of NIO/BIO/AIO? What's the difference?

Netty is a "network communication framework".

The process by which Netty performs event handling: a Channel is the channel of a connection and the generator of ChannelEvents, while a ChannelPipeline can be understood as a chain of ChannelHandlers.

There are usually several ways to IO:

Synchronous blocking BIO

Synchronous non-blocking NIO

Asynchronous non-blocking AIO

If you want to process multiple client requests at the same time, or if the client wants to communicate with multiple servers at the same time, you must use multithreading to process them.

NIO is based on the Reactor pattern. When a socket has a stream to read or can be written to, the operating system notifies the application to handle it, and the application reads the stream into a buffer or writes it out. In other words, a thread no longer corresponds to a connection but to a valid request; when a connection has no data, no worker thread is occupied handling it.

AIO differs from NIO in that, when reading and writing, you only need to call the API's read or write method directly. Both methods are asynchronous: for reads, when there is a stream to read, the operating system passes the readable stream into the read method's buffer and notifies the application; for writes, the operating system actively notifies the application once it has finished writing the stream passed to write. That is, the read/write methods are asynchronous and a callback is invoked upon completion.

3.10. Why split the system? Is it okay to split without Dubbo?

From the point of view of resources, system split is divided into application split and database split.

In order of adoption, splitting can proceed through: horizontal scaling, vertical split, business split, and horizontal (data) split.

Whether to use the service or not depends on the actual business scenario.

When there are more and more vertical applications, the interaction between applications is inevitable, the core business will be extracted as an independent service, and gradually form a stable service center, so that the front-end applications can respond to the changing market demand more quickly. At this point, the distributed service framework (RPC) for improving business reuse and integration is the key.

When there are more and more services, the evaluation of capacity and the waste of small service resources gradually appear, so it is necessary to add a scheduling center to manage cluster capacity in real time based on access pressure to improve cluster utilization. At this point, the resource scheduling and governance center (SOA) used to improve machine utilization is the key.

3.11. What's the difference between Dubbo and Thrift?

Thrift is a cross-language RPC framework.

Dubbo supports service governance, while Thrift does not.

This concludes the content on "what are the interview questions of Redis distributed technology". Thank you for reading.
