What are the high-frequency interview questions for Redis?

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >



This article explains the most common high-frequency Redis interview questions in detail. The content is practical, so it is shared here as a reference; I hope you find it useful.

What is Redis?

Redis (Remote Dictionary Server) is an open source (BSD licensed) high-performance non-relational (NoSQL) key-value database written in C language.

Redis stores mappings between keys and five different types of values. Keys can only be strings; values support five data types: strings, lists, sets, hashes, and sorted sets.

Unlike a traditional database, Redis keeps its data in memory, so reads and writes are very fast. Redis is therefore widely used as a cache: it can handle more than 100,000 read/write operations per second, making it one of the fastest key-value stores known. Redis is also often used for distributed locks. In addition, Redis supports transactions, persistence, Lua scripting, LRU eviction, and several clustering schemes.

What are the advantages and disadvantages of Redis?

Advantages

Redis has excellent read and write performance: roughly 110,000 reads per second and 81,000 writes per second.

Data persistence is supported, and AOF and RDB persistence methods are supported.

Transactions are supported: individual Redis operations are atomic, and Redis also supports executing several operations merged together as a batch.

Rich data structures: in addition to string values, Redis supports hash, set, zset, list, and so on.

Master-slave replication is supported: the master automatically synchronizes data to its slaves, which allows read-write separation.

Disadvantages

Database capacity is limited by physical memory, so Redis cannot be used for high-performance reads and writes of truly massive data; the scenarios suitable for Redis are mainly high-performance operations on smaller datasets.

Redis does not have automatic fault tolerance and recovery: when the master or a slave goes down, some front-end read and write requests fail, and you must wait for the machine to restart, or manually switch the front end to another IP, to recover.

If the master goes down before some data has been synchronized to the slaves, switching the IP introduces data inconsistency, which reduces the availability of the system.

Online scaling is difficult for Redis: once cluster capacity reaches its upper limit, expansion becomes very complicated. To avoid this, operators must ensure there is enough headroom when the system goes online, which wastes a great deal of resources.

Why use a cache? Why use Redis?

This problem is mainly viewed from two aspects: "high performance" and "high concurrency".

High performance:

When a user accesses some data for the first time, it is read from the database on disk, which is slow. If that data is stored in a cache, the next access can be served directly from the cache. Cache operations work directly on memory, so they are very fast. When the corresponding data in the database changes, the cached copy can be updated in step.

High concurrency:

Direct operation of the cache can withstand far more requests than direct access to the database, so we can consider transferring some of the data in the database to the cache, so that part of the user's requests will go directly to the cache without going through the database.
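The read path described above — check the cache, fall back to the database on a miss, then populate the cache — is often called cache-aside. A minimal Python sketch, using plain dicts to stand in for Redis and the database (all names here are illustrative, not part of any real API):

```python
# Illustrative stand-ins: `cache` plays the role of Redis, `database` the
# role of the backing store on disk. All names are hypothetical.
database = {"user:1": {"name": "alice"}}
cache = {}

def get_user(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is not None:
        return value, "cache"              # hit: no database round-trip
    value = database.get(key)              # miss: slow path
    if value is not None:
        cache[key] = value                 # populate the cache for next time
    return value, "database"

def update_user(key, value):
    """On writes, update the database and refresh the cached copy in step."""
    database[key] = value
    cache[key] = value
```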

Why use Redis instead of map/guava for caching?

Caches fall into local caches and distributed caches. Taking Java as an example, a local cache is implemented with the built-in Map or with Guava; its main features are being lightweight and fast, and its life cycle ends when the JVM is destroyed. With multiple application instances, each instance must keep its own copy of the cache, and the copies are not consistent with each other.

Using a distributed cache such as redis or memcached, in the case of multiple instances, each instance shares a share of the cached data, and the cache is consistent. The disadvantage is the need to maintain the high availability of redis or memcached services, and the overall program architecture is more complex.

Why is Redis so fast?

1. Redis is entirely memory-based, and most requests are pure memory operations, which are very fast. The data is stored in memory, similar to a HashMap, whose advantage is that lookup and update have O(1) time complexity.

2. The data structure is simple and the data operation is simple. The data structure in Redis is specially designed.

3. Single thread avoids unnecessary context switching and competition conditions, and there is no CPU consumption caused by multi-process or multi-thread switching, there is no need to consider all kinds of locks, there is no lock release operation, and there is no performance consumption caused by possible deadlocks.

4. It uses an I/O multiplexing model with non-blocking I/O.

5. It uses its own underlying models: the underlying implementation and the protocol for communicating with clients are purpose-built. Redis builds its own VM mechanism, because ordinary system calls would waste a certain amount of time moving and requesting data.

What are the data types of Redis

Redis has five main data types: String, List, Set, Zset, and Hash, which cover most requirements.

STRING
Stored value: a string, integer, or floating-point number
Operations: act on the whole string or a part of it; increment or decrement integers and floats
Application scenario: simple key-value caching

LIST
Stored value: a list
Operations: push or pop elements from both ends; trim to keep only a range of elements; read single or multiple elements
Application scenario: list-shaped data such as fan lists and article comment lists

SET
Stored value: an unordered set
Operations: add, get, remove individual elements; check whether an element exists; compute intersections, unions, and differences; fetch random elements
Application scenario: set operations, such as intersecting two users' fan lists to find common fans

HASH
Stored value: a hash of key-value pairs
Operations: add, get, remove single key-value pairs; get all pairs; check whether a key exists
Application scenario: structured data, such as an object's fields

ZSET
Stored value: a sorted set
Operations: add, get, delete elements; fetch elements by score range or by member; compute a member's rank
Application scenario: ordered data, such as fetching the top users of an application

Summary one

1. Counter

A counter can be implemented with String's increment and decrement commands. As an in-memory database, Redis has very high read and write performance, so it is well suited to storing counts that are read and written frequently.
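For example, the counter pattern boils down to a couple of commands (the key name is illustrative):

```
INCR article:42:views      # creates the key at 0 if absent, then increments
DECR article:42:views
INCRBY article:42:views 10
```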

2. Caching

Put the hot spot data in memory, set the maximum memory usage and elimination strategy to ensure the cache hit rate.

3. Session caching

You can use Redis to uniformly store session information for multiple application servers. When the application server no longer stores the user's session information and no longer has state, a user can request any application server, which makes it easier to achieve high availability and scalability.

4. Full-page cache (FPC)

In addition to the basic session token, Redis provides a very simple FPC platform. In the case of Magento, Magento provides a plug-in to use Redis as the full-page cache backend. In addition, for WordPress users, Pantheon has a very good plug-in wp-redis, which can help you load the pages you have visited as quickly as possible.

5. Look up the table

For example, DNS records are suitable for storage using Redis. Lookup tables, like caches, take advantage of Redis's fast lookup features. However, the contents of the lookup table cannot be invalidated, and the contents of the cache can be invalidated because the cache is not a reliable source of data.

6. Message queuing (publish / subscribe function)

List is a doubly linked list that can serve as a simple message queue: write messages with LPUSH and read them with RPOP. However, dedicated message middleware such as Kafka or RabbitMQ is usually the better choice.

7. Implementation of distributed lock

In a distributed scenario, the locks of a single-machine environment cannot synchronize processes on multiple nodes. You can implement a distributed lock with Redis's built-in SETNX command, or use the official RedLock distributed locking algorithm.
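A minimal sketch of the lock logic, simulated over a plain dict instead of a live Redis server (names and signatures are illustrative; against real Redis, the acquire step maps to SET key token NX EX ttl, and the release should be done atomically, for example in a Lua script):

```python
import time
import uuid

# A dict stands in for Redis: key -> (holder token, absolute expiry time).
_store = {}

def acquire_lock(key, ttl, now=None, token=None):
    """SETNX-style acquire: succeed only if the lock is free or has expired.
    Returns the holder token on success, None on failure."""
    now = time.time() if now is None else now
    token = token or uuid.uuid4().hex
    entry = _store.get(key)
    if entry is None or entry[1] <= now:   # free, or previous holder timed out
        _store[key] = (token, now + ttl)
        return token
    return None

def release_lock(key, token):
    """Check-and-delete: only the holder (matching token) may release."""
    entry = _store.get(key)
    if entry is not None and entry[0] == token:
        del _store[key]
        return True
    return False
```

The token check on release matters: without it, a client whose lock already expired could delete a lock now held by someone else.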

8. Other

Set supports intersection, union and similar operations, which enables features such as finding common friends. ZSet supports ordered operations, which enables features such as leaderboards.

Summary two

One of the great advantages of Redis over other caches is that it supports multiple data types.

The data types and their descriptions:

string: the simplest K-V storage
hash: the value is stored as field-value pairs; suitable for scenarios such as ID-Detail
list: a simple list, ordered, supporting insertion at the head or the tail
set: an unordered collection with fast lookup, suited to intersection, union, and difference processing
sorted set: an ordered set

In fact, through the characteristics of the above data types, you can basically think of the appropriate application scenarios.

String is suitable for the simplest K-V storage, similar to memcached's storage structure; SMS verification codes, configuration information and the like are stored with this type.

Hash: generally the key is an ID or a unique marker, and the value is the corresponding detail, such as product details, personal profile details, or news details.

Because list is ordered, it is suitable for storing relatively fixed ordered data, such as tables of provinces, cities and districts, or dictionaries. Since a list is ordered by insertion time, it also suits data sorted by write time, such as the latest news or a message queue.

Set can be understood simply as an ID-List model, such as the list of a person's friends on Weibo. The best thing about set is that it provides intersection, union and difference operations on two sets, for example finding two people's common friends.

Sorted Set is an enhanced version of set, with an added score parameter; elements are automatically kept sorted by score. It suits data that is ranked by something other than insertion time, such as a top-10 list.

As mentioned above, although Redis does not have as complex data structures as relational databases, it can also be suitable for many scenarios, more than normal cached data structures. Understanding the appropriate business scenarios for each data structure can not only improve development efficiency, but also effectively take advantage of the performance of Redis.

What is Redis persistence?

Persistence is to write the data in memory to disk to prevent the loss of memory data due to service downtime.

What is the persistence mechanism of Redis? Their respective advantages and disadvantages?

Redis provides two persistence mechanisms: RDB (default) and AOF

RDB (Redis DataBase): snapshotting.

RDB is the default persistence method of Redis. At configured intervals, it saves the in-memory data to disk as a snapshot; the resulting data file is dump.rdb. The snapshot period is defined by the save parameter in the configuration file.
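For illustration, these are the classic snapshot triggers from a default redis.conf (the values shown are the long-standing defaults; tune them to your workload):

```conf
# redis.conf -- snapshot (RDB) triggers
save 900 1        # snapshot if at least 1 key changed within 900 seconds
save 300 10       # snapshot if at least 10 keys changed within 300 seconds
save 60 10000     # snapshot if at least 10000 keys changed within 60 seconds
dbfilename dump.rdb
```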

Advantages:

1. There is only one file dump.rdb, which is convenient for persistence.

2. Good disaster tolerance, a file can be saved to a secure disk.

3. Maximized performance: a forked child process performs the write while the main process continues to handle commands, so I/O throughput is maximized. Because a separate child process does the persistence, the main process performs no disk I/O, which preserves Redis's high performance.

4. The startup efficiency is higher than that of AOF when the dataset is large.

Disadvantages:

1. Data safety is low. RDB persists at intervals; if Redis fails between snapshots, the data written since the last snapshot is lost. This method therefore suits cases where the data requirements are not strict.

AOF (Append Only File) persistence: every write command is logged, in the format of the Redis command request protocol, to an aof file.

AOF persistence records each write command executed by Redis in a separate log file; when Redis restarts, the commands in the log are replayed to restore the data.

When both methods are enabled at the same time, Redis prefers the AOF file for data recovery.

Advantages:

1. Data safety: AOF persistence is configured through the appendfsync option; with appendfsync always, every command operation is recorded in the aof file.

2. Files are written in append mode, so even if the server goes down midway, the redis-check-aof tool can repair the file and resolve data consistency problems.

3. The AOF mechanism supports rewrite: when the file grows too large, commands are merged and rewritten. Before a rewrite happens, you can still remove certain commands from the file (such as a mistakenly issued FLUSHALL).
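As a sketch of the relevant configuration, the fsync policy is chosen in redis.conf (the three standard values are shown; everysec is the usual compromise):

```conf
# redis.conf -- AOF settings
appendonly yes
appendfsync always      # fsync after every write command: safest, slowest
# appendfsync everysec  # fsync once per second: the common compromise
# appendfsync no        # let the OS decide when to flush: fastest, least safe
```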

Disadvantages:

1. AOF files are larger than RDB files, and the recovery speed is slow.

2. When the dataset is large, startup is slower than with RDB.

How do RDB and AOF compare?

AOF files are updated more frequently than RDB, and AOF is preferred to restore data.

AOF is safer than RDB but produces larger files.

RDB has better performance than AOF.

If both are enabled, the AOF file is loaded first on startup.

How to choose the appropriate persistence method?

In general, if you want to achieve data security comparable to PostgreSQL, you should use both persistence features. In this case, when Redis restarts, it is preferred to load the AOF file to recover the original data, because in general, the dataset saved by the AOF file is more complete than the dataset saved by the RDB file.

If you care about your data but can still tolerate data loss within a few minutes, you can use RDB persistence alone.

Many users use only AOF persistence, but this is not recommended: regular RDB snapshots are very convenient for database backups, RDB restores datasets faster than AOF, and using RDB also sidesteps possible bugs in the AOF machinery.

If you only want your data to exist while the server is running, you don't have to use any persistence.

How to expand the persistent data and cache of Redis?

If Redis is used as a cache, use consistent hash to achieve dynamic expansion and reduction.

If Redis is used as a persistent store, a fixed keys-to-nodes mapping relationship must be used, and the number of nodes cannot be changed once determined. Otherwise (that is, where Redis nodes need to change dynamically), you must use a system that can rebalance data at run time, which only Redis clusters can currently do.

Delete policy for expired keys of Redis

As we all know, Redis is a key-value database, and we can set the expiration time of the key cached in Redis. The expiration policy of Redis refers to what Redis does when the cached key in Redis expires.

There are usually three expiration policies:

Timed expiration: a timer is created for every key that has an expiration time, and the key is cleared the moment it expires. This strategy cleans expired data immediately and is memory-friendly, but it consumes a lot of CPU handling expirations, which hurts the cache's response time and throughput.

Lazy expiration: a key is checked, and cleared if expired, only when it is accessed. This strategy saves the most CPU, but it is very unfriendly to memory: in the extreme case, a large number of expired keys are never accessed again, so they are never cleared and occupy a lot of memory.

Periodic expiration: at regular intervals, a limited number of keys in the expires dictionaries of a certain number of databases are scanned, and any expired keys found are cleared. This strategy is a compromise between the first two: by tuning the scan interval and the time limit of each scan, an optimal balance between CPU and memory can be reached under different conditions.

(The expires dictionary holds the expiration times of all keys that have a TTL set: the key is a pointer to a key in the key space, and the value is that key's expiration time as a millisecond-precision UNIX timestamp. The key space refers to all keys saved in the Redis database.)

Both lazy expiration and periodic expiration are used in Redis.
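The combination above can be sketched as follows — an illustrative in-memory model with an explicit `now` clock for determinism, not Redis's actual implementation:

```python
import random

class ExpiringStore:
    """Sketch of Redis-style lazy + periodic expiration over plain dicts."""

    def __init__(self):
        self.data = {}
        self.expires = {}   # key -> absolute expiry time (the expires dict)

    def set(self, key, value, ttl=None, now=0):
        self.data[key] = value
        if ttl is not None:
            self.expires[key] = now + ttl

    def get(self, key, now=0):
        # Lazy expiration: the key is checked only when it is accessed.
        exp = self.expires.get(key)
        if exp is not None and exp <= now:
            self.data.pop(key, None)
            self.expires.pop(key, None)
            return None
        return self.data.get(key)

    def sweep(self, now=0, sample=20):
        # Periodic expiration: scan a bounded random sample of keys that
        # have a TTL, and delete the expired ones.
        keys = random.sample(list(self.expires), min(sample, len(self.expires)))
        for k in keys:
            if self.expires[k] <= now:
                self.data.pop(k, None)
                self.expires.pop(k, None)
```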

How to set the expiration time and permanent validity of Redis key?

EXPIRE and PERSIST commands.
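For example (the key name is illustrative):

```
SET session:abc "data"
EXPIRE session:abc 60    # delete the key after 60 seconds
TTL session:abc          # remaining lifetime in seconds
PERSIST session:abc      # remove the TTL; the key becomes permanent
TTL session:abc          # now returns -1: no expiration set
```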

We know that a key's expiration time is set with EXPIRE; so how do we handle data once it has expired?

In addition to the cache invalidation policy provided by the cache server (Redis has 6 policies to choose from by default), we can also customize cache elimination according to specific business needs. There are two common strategies:

1. Clean the expired cache regularly.

2. When a user requests, determine whether the cache used in the request has expired. If it expires, go to the underlying system to get new data and update the cache.

Both have advantages and disadvantages: the first makes maintaining a large number of cached keys troublesome, while the second checks for cache expiry on every user request, making the logic relatively complex. Weigh the two schemes against your own application scenario.

There are 20 million rows of data in MySQL but only 200,000 keys in Redis. How do you ensure the data in Redis is all hot data?

When the size of the Redis in-memory dataset rises to a certain level, a data eviction strategy is applied (for example allkeys-lru, which keeps the most recently used, i.e. hottest, keys in memory).

What are Redis's memory eviction policies?

Redis's memory eviction policies determine how Redis handles new writes that need extra space when the memory available for caching is exhausted.

1. Eviction from the global key space:

noeviction: when there is not enough memory for newly written data, new write operations report an error.

allkeys-lru: when there is not enough memory for newly written data, remove the least recently used key from the whole key space. (This is the most commonly used policy.)

allkeys-random: when there is not enough memory for newly written data, remove a random key from the whole key space.

2. Eviction from the keys that have an expiration time set:

volatile-lru: when there is not enough memory for newly written data, remove the least recently used key among the keys with a TTL set.

volatile-random: when there is not enough memory for newly written data, remove a random key among the keys with a TTL set.

volatile-ttl: when there is not enough memory for newly written data, remove the keys with the earliest expiration time first, among the keys with a TTL set.
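To illustrate the LRU idea behind the allkeys-lru and volatile-lru policies, here is a small exact-LRU cache whose capacity cap stands in for maxmemory (real Redis uses an approximate, sampling-based LRU, so treat this only as a sketch of the policy):

```python
from collections import OrderedDict

class LRUCache:
    """Exact LRU eviction: on overflow, drop the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```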

Summary

The choice of memory eviction policy does not affect the handling of expired keys: the eviction policy deals with data that needs extra space when memory is insufficient, while the expiration policy deals with cached data whose TTL has passed.

What are the main physical resources consumed by Redis?

Memory.

What happens when Redis runs out of memory?

If the limit is reached, Redis write commands return an error message (read commands still return normally). Alternatively, you can configure a memory eviction policy so that old content is flushed out when Redis reaches its memory limit.

How does Redis optimize memory?

Make good use of collection types such as hash, list, sorted set and set, because many small key-value pairs can usually be stored together in a more compact way. Use hashes wherever possible: a small hash uses very little memory, so abstract your data model into hashes when you can. For example, if your web system has a user object, do not create separate keys for the user's name, surname, email and password; store all of the user's information in a single hash.

Redis thread model

Redis developed a network event handler based on the Reactor pattern, called the file event handler. It consists of four parts: multiple sockets, the I/O multiplexer, the file event dispatcher, and event handlers. Because the file event dispatcher's queue is consumed on a single thread, Redis is called a single-threaded model.

1. The file event handler uses an I/O multiplexing program to listen on multiple sockets at the same time, and associates different event handlers with each socket according to the task the socket is currently performing.

2. When the monitored socket is ready to perform operations such as connection reply (accept), read (read), write (write), close, and so on, the file event corresponding to the operation will be generated, and the file event handler will call the event handler associated before the socket to handle these events.

Although the file event processor runs single-threaded, by using the I/O multiplexer to monitor multiple sockets it implements a high-performance network communication model, while still interfacing cleanly with the other modules of the Redis server that also run single-threaded, which keeps the single-threaded design inside Redis simple.

What is a transaction?

A transaction is a separate isolation operation: all commands in the transaction are serialized and executed sequentially. In the course of execution, the transaction will not be interrupted by command requests sent by other clients.

A transaction is an atomic operation: either all or none of the commands in the transaction are executed.

The concept of Redis transaction

The essence of a Redis transaction is a collection of commands built around primitives such as MULTI, EXEC and WATCH. A transaction supports executing multiple commands at once, and all commands in a transaction are serialized. During execution, the queued commands are executed serially in order, and command requests submitted by other clients are never inserted into the executing sequence.

To sum up: a Redis transaction is a one-time, sequential, exclusive execution of a series of commands in a queue.

The three phases of a Redis transaction

1. Start the transaction MULTI

2. Commands are queued

3. Transaction execution EXEC

During a transaction, if the server receives any request other than EXEC, DISCARD, WATCH or MULTI, it does not execute it immediately but places it in the transaction queue.

Redis transaction related commands

Redis transaction function is realized by four primitives: MULTI, EXEC, DISCARD and WATCH.

Redis serializes all commands in a transaction and then executes them sequentially.

1. Redis does not support rollback. "Redis does not roll back when a transaction fails; it continues to execute the remaining commands", which keeps the internals of Redis simple and fast.

2. If a command has an error when it is queued (for example a malformed command), none of the commands in the transaction are executed.

3. If a command fails at runtime, the correct commands are still executed.

The WATCH command is an optimistic lock that gives Redis transactions check-and-set (CAS) behavior. It can monitor one or more keys; once any of them is modified (or deleted) before EXEC, the transaction is aborted. Monitoring continues until the EXEC command.

The MULTI command opens a transaction and always returns OK. After MULTI, the client can continue sending any number of commands to the server; they are not executed immediately but placed in a queue. When the EXEC command is called, all queued commands are executed.

EXEC: executes all commands in the transaction block and returns their return values in execution order. When the transaction is aborted, it returns the null value nil.

By calling DISCARD, the client can empty the transaction queue and abandon the execution of the transaction, and the client will exit from the transaction state.

The UNWATCH command cancels the monitoring of all keys established by WATCH.
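The queueing behavior described above, including the no-rollback rule for runtime errors, can be sketched with a toy model (illustrative Python, not the real wire protocol):

```python
class TransactionClient:
    """Toy model of MULTI / queue / EXEC. A command that fails at runtime
    is recorded as an error; the rest keep executing; nothing rolls back."""

    def __init__(self, store):
        self.store = store
        self.queue = None          # None means "not inside a transaction"

    def multi(self):
        self.queue = []
        return "OK"

    def incr(self, key):
        def op():
            # Raises if the current value is not an integer (a runtime error).
            self.store[key] = int(self.store.get(key, 0)) + 1
            return self.store[key]
        if self.queue is not None:
            self.queue.append(op)  # inside MULTI: queue, don't run yet
            return "QUEUED"
        return op()

    def execute(self):             # EXEC: run every queued command in order
        results = []
        for op in self.queue:
            try:
                results.append(op())
            except Exception as exc:   # runtime error: record it, keep going
                results.append(exc)
        self.queue = None
        return results
```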

Overview of transaction Management (ACID)

Atomicity (Atomicity)

Atomicity means that a transaction is an indivisible unit of work, and either all or none of the operations in the transaction occur.

Consistency (Consistency)

The integrity of data must be consistent before and after the transaction.

Isolation (Isolation)

When multiple transactions are executed concurrently, the execution of one transaction should not affect the execution of other transactions.

Persistence (Durability)

Durability means that once a transaction is committed, its changes to the data in the database are permanent; even if the database subsequently fails, they should not be lost.

Redis transactions always have consistency and isolation in the ACID sense; the other properties are not guaranteed. Transactions also have durability when the server runs in AOF persistence mode with the appendfsync option set to always.

Does Redis transaction support isolation?

Redis is a single-process program, and it ensures that when a transaction is executed, the transaction will not be interrupted, and the transaction can run until all commands in the transaction queue have been executed. Therefore, Redis transactions are always isolated.

Does Redis transaction guarantee atomicity? does it support rollback?

In Redis, single commands are executed atomically, but transactions are not guaranteed to be atomic and there is no rollback: if any command in the transaction fails at runtime, the remaining commands are still executed.

Other implementations of Redis transaction

Based on Lua scripts: Redis guarantees that the commands in a script run as a unit and in order, but it provides no rollback for runtime errors; if some command in the script fails during execution, the remaining commands still run.
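For example, the classic compare-and-delete release of a distributed lock is usually shipped as a Lua script, which Redis runs as a unit (the key and token here are illustrative):

```
EVAL "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end" 1 lock:order token123
```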

Based on an intermediate marker variable: an extra variable marks whether the transaction has completed; when reading the data, the marker is read first to decide whether the transaction has finished. This requires additional code and is rather tedious.

Sentinel mode

The introduction of the Sentinel

Sentinel (Chinese: 哨兵) is a very important component of the Redis cluster architecture. It has the following functions:

Cluster monitoring: responsible for monitoring whether redis master and slave processes are working properly.

Message notification: if a Redis instance fails, the Sentinel is responsible for sending an alarm notification to the administrator.

Failover: if the master node goes down, a slave is automatically promoted to master.

Configuration center: after a failover, clients are notified of the new master address.

Sentinels are used to achieve the high availability of redis clusters and are themselves distributed, running as a Sentinel cluster and working together.

1. During failover, deciding that a master node is down requires the agreement of a majority of Sentinels, which involves distributed election.

2. Even if some Sentinel nodes go down, the Sentinel cluster still works properly, because a failover system that is itself a single point of failure would defeat its purpose as part of the high-availability mechanism.

The core knowledge of Sentinel

1. A Sentinel deployment needs at least three instances to guarantee its own robustness.

2. The deployment architecture of Sentinel + redis master-slave does not guarantee zero data loss, but can only ensure the high availability of redis clusters.

3. For the complex deployment architecture of Sentinel + redis master-slave, try to conduct adequate tests and drills in both the test environment and the production environment.

Official Redis Cluster solution (server-side routing query)

Can you tell me how the redis cluster mode works? How is the key of redis addressed in cluster mode? What are the algorithms for distributed addressing? Do you know the consistent hash algorithm?

Brief introduction

Redis Cluster is a server-side sharding technology, officially available since version 3.0. Instead of consistent hashing, Redis Cluster uses the concept of slots: the key space is divided into 16384 slots. A request can be sent to any node, and the node that receives it routes the query to the node that actually holds the key.

Scheme description

1. Data is sharded by hash: each node stores the data whose hash slot falls within its assigned interval. There are 16384 slots by default.

2. Each data fragment will be stored on multiple master-slave nodes.

3. Data is written to the master node first and then synchronized to the slave nodes (blocking synchronization can be configured).

4. The data on the multiple nodes of one shard is not guaranteed to be strongly consistent.

5. When the key a client operates on is not assigned to the contacted node, Redis returns a redirection instruction pointing to the correct node.

6. When expanding capacity, it is necessary to migrate part of the data from the old node to the new node.

Under the Redis Cluster architecture, each node exposes two port numbers: for example 6379 for clients, plus the same number offset by 10000, i.e. 16379.

Port 16379 is used for communication between nodes, that is, the cluster bus, which handles failure detection, configuration updates and failover authorization. The cluster bus uses a different binary protocol, the gossip protocol, for efficient data exchange between nodes, taking up less network bandwidth and processing time.

Internal communication mechanism between nodes

Basic communication principle

There are two ways to maintain cluster metadata: centralized and Gossip protocol. Redis cluster nodes communicate with each other using gossip protocol.

Distributed addressing algorithm

Hash algorithm (massive cache reconstruction)

Consistent hash algorithm (automatic cache migration) + virtual node (automatic load balancing)

Hash slot algorithm of redis cluster
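The hash slot computation of the last algorithm is simple: CRC16 of the key modulo 16384, with an optional {...} hash tag so related keys land in the same slot. A sketch (the CRC16 variant below is the XModem one used by Redis Cluster; treat the code as illustrative):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster slots. If the key contains a
    non-empty {...} hash tag, only the tag is hashed, which lets related
    keys be forced onto the same node."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # ignore an empty tag "{}"
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```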

Advantages

1. Non-central architecture, supporting dynamic expansion and being transparent to business

2. Sentinel-style monitoring and automatic failover are built in.

3. The client does not need to connect all the nodes in the cluster, but can connect to any available node in the cluster.

4. High performance: the client connects directly to the redis service, eliminating the overhead of a proxy layer.

Shortcoming

1. Operation and maintenance are complex, and data migration requires human intervention.

2. Only database 0 can be used.

3. Batch operations (pipeline operations) are not supported.

4. Distributed logic and storage module coupling, etc.

Client-based allocation

Brief introduction

Redis Sharding is a multi-Redis-instance clustering method that was widely used in the industry before Redis Cluster came out. The main idea is to hash the key of the Redis data: through the hash function, a specific key is mapped to a specific Redis node. The Java Redis client Jedis supports Redis Sharding through ShardedJedis and, combined with a connection pool, ShardedJedisPool.
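The client-side sharding idea can be sketched as follows. Plain dicts stand in for Redis instances, and a simple CRC32-mod-N hash stands in for the hash ring (ShardedJedis actually defaults to consistent hashing); the point is that every client computes the same key-to-node mapping with no server-side cooperation.

```python
import zlib

class ShardedClient:
    """Minimal client-side sharding sketch (illustrative)."""

    def __init__(self, instances):
        self.instances = instances  # one stand-in "connection" per Redis node

    def _pick(self, key):
        # deterministic mapping: same key always lands on the same instance
        return self.instances[zlib.crc32(key.encode()) % len(self.instances)]

    def set(self, key, value):
        self._pick(key)[key] = value

    def get(self, key):
        return self._pick(key).get(key)
```

Note the weakness the article describes: adding or removing an entry in `instances` changes almost every key's mapping, which is why plain modulo sharding does not support dynamic node changes well.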

Advantages

The advantage is that it is very simple. The Redis instances on the server side are independent and unrelated to each other. Each Redis instance runs like a single server, which is very easy to expand linearly, and the system is very flexible.

Shortcoming

1. Because the sharding processing is placed on the client, it will bring challenges to OPS when the scale is further expanded.

2. Client sharding does not support dynamically adding and deleting nodes. When the topology of the server-side Redis instance group changes, every client needs to be updated and adjusted. Connections cannot be shared, and as the application scale increases, the resulting waste of resources constrains optimization.

Based on proxy server slicing

Brief introduction

The client sends a request to a proxy component, which parses the client's data, forwards the request to the correct node, and finally returns the result to the client.

Features

1. Transparent access, business programs do not need to care about back-end Redis instances, and low switching cost

2. The logic of Proxy and the logic of storage are isolated.

3. The proxy layer has one more forwarding, resulting in a loss of performance.

Industry open source solution

1. Twitter open source: Twemproxy

2. Wandoujia (Pea Pod) open source: Codis

Redis master-slave architecture

A stand-alone redis can carry on the order of tens of thousands of QPS. For caching, it is generally used to support high read concurrency. Therefore, the architecture is made master-slave (master-slave): one master and multiple slaves, where the master is responsible for writes and replicates data to the other slave nodes, and the slave nodes are responsible for reads. All read requests go to the slave nodes. This also makes it easy to scale horizontally to support high read concurrency.

Redis replication-> Master-Slave Architecture-> read-write Separation-> horizontal expansion supports High read concurrency

The core mechanism of redis replication

1. Redis replicates data to slave nodes asynchronously, but starting with redis 2.8, the slave node periodically acknowledges to the master the amount of data it has replicated.

2. One master node can be configured with multiple slave node.

3. Slave node can also connect to other slave node

4. When a slave node performs replication, it does not block the master node from working normally.

5. When a slave node performs replication, it does not block its own query operations either; it serves requests with the old dataset. But when replication completes, it needs to delete the old dataset and load the new one, and external service is briefly suspended at that moment.

6. Slave node is mainly used for horizontal expansion and separation of read and write. The expanded slave node can improve the read throughput.

Note that if the master-slave architecture is adopted, it is recommended to enable persistence on the master node. It is not recommended to rely on a slave node as the hot backup of the master's data, because in that case, if you turn off the master's persistence, the data may be empty when the master goes down and restarts, and then the slave nodes' data may be wiped as soon as they replicate from it.

In addition, various backup schemes for the master are also needed. In case all local files are lost, select an rdb file from the backups to restore the master, so as to ensure there is data at startup. Even with the high-availability mechanism explained later, where a slave node can automatically take over the master node, the master may restart automatically before sentinel detects the failure, which could still cause the data of all the slave nodes above to be emptied.

The core principle of redis master-slave replication

When you start a slave node, it sends a PSYNC command to master node.

If this is the first time the slave node connects to the master node, a full resynchronization is triggered. At this point, the master starts a background thread and begins to generate an RDB snapshot file.

At the same time, all new write commands received from clients are cached in memory. After the RDB file is generated, the master sends the RDB to the slave; the slave first writes it to local disk and then loads it from local disk into memory.

The master then sends the write commands cached in memory to the slave, and the slave synchronizes that data.

If slave node has a network failure with master node and is disconnected, it will automatically reconnect. After the connection, master node will only copy the missing data to slave.

Process principle

1. When a master-slave relationship is established between the slave database and the master database, a SYNC command is sent to the master database.

2. After receiving the SYNC command, the main library will start to save the snapshot in the background (RDB persistence process) and cache the write commands received during the period.

3. When the snapshot is completed, the master Redis sends the snapshot file and all cached write commands to the slave Redis

4. After receiving them, the slave Redis loads the snapshot file and executes the cached write commands.

5. After that, the master Redis will send the command to the slave Redis whenever it receives a write command, thus ensuring the consistency of the data

Shortcoming

All slave node data replication and synchronization is handled by the master node, which puts too much pressure on the master node. A cascading master-slave structure (slaves replicating from other slaves) can be used to solve the problem.

What is the master-slave replication model of Redis clusters?

In order to keep the cluster available when some nodes fail or most nodes cannot communicate, the cluster uses the master-slave replication model, and each node has N-1 replicas.

How is redis deployed in a production environment?

Redis cluster, 10 machines: 5 machines deploy redis master instances, and the other 5 machines deploy redis slave instances. One slave instance is attached to each master instance, and the 5 master nodes provide read and write services. The peak qps of each node may reach 50,000 per second, so the 5 machines can serve up to 250,000 read/write requests per second in total.

What is the configuration of the machine? 32G memory + 8-core CPU + 1T disk, but the memory allocated to the redis process is 10g. In general online production environments, redis memory should not exceed 10g, which may be problematic.

Five machines provide external reading and writing, with a total of 50g of memory.

Because each master instance is hung with a slave instance, it is highly available. If any master instance goes down, it will automatically fail over, and the redis slave instance will automatically become the master instance and continue to provide read and write services.

What kind of data are you writing into memory? What is the size of each piece of data? Commodity data; each piece of data is 10kb. 100 pieces of data is 1mb, and 100,000 pieces of data is 1g. The resident memory is 2 million pieces of commodity data, occupying 20g of memory, less than 50% of the total. The current peak is about 3500 requests per second.

In fact, at large companies there will be an infrastructure team responsible for the operation and maintenance of the cache cluster.

Tell me about the concept of Redis hash slot?

Redis cluster does not use consistent hashing but introduces the concept of hash slots. A Redis cluster has 16384 hash slots. Each key is run through a CRC16 checksum and then taken modulo 16384 to decide which slot to place it in, and each node of the cluster is responsible for part of the hash slots.
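The slot computation described above can be sketched as follows: CRC16 (the XMODEM variant, which is the one Redis Cluster uses) modulo 16384, plus hash-tag handling so that related keys written as `{tag}...` land in the same slot.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Slot = CRC16(key) mod 16384. If the key contains a non-empty {...}
    section (a hash tag), only that part is hashed, which is how multi-key
    operations can be forced onto one node."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

For example, `{user:1}.following` and `{user:1}.followers` hash only the `user:1` part and therefore always land in the same slot.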

Will any write operations be lost in the Redis cluster? Why?

Redis does not guarantee strong consistency of data, which means that in practice, clusters may lose writes under certain conditions.

How is replication between Redis clusters?

Asynchronous replication

What is the maximum number of nodes in a Redis cluster?

16384

How does a Redis cluster select a database?

Currently, Redis cluster cannot make database selection. Default is 0 database.

Redis is single-threaded, how to improve the utilization of multicore CPU?

You can deploy multiple Redis instances on the same server and use them as different servers. At some point one server is not enough anyway, so if you want to use multiple CPUs, you can consider sharding.

Why do you want to do Redis partition?

Partitioning will allow Redis to manage more memory, and Redis will be able to use the memory of all machines. If there is no partition, you can only use the memory of one machine at most. Partition increases the computing power of Redis by simply increasing the number of computers, and the network bandwidth of Redis increases exponentially with the increase of computers and network cards.

Do you know what Redis partitioning schemes are available?

1. Client partition means that the client has decided which redis node the data will be stored in or which redis node to read from. Most clients have implemented client partitioning.

2. Proxy partitioning means the client sends requests to a proxy, and the proxy decides which node to write data to or read data from. The proxy chooses which Redis instances to request according to the partition rules, and returns the Redis response to the client. One proxy implementation for redis and memcached is Twemproxy.

3. Query routing (Query routing) means that the client randomly requests any redis instance, and Redis forwards the request to the correct Redis node. Redis Cluster implements a hybrid form of query routing: instead of forwarding the request directly from one redis node to another, the client is redirected to the correct redis node.

What are the disadvantages of Redis partitioning?

1. Operations involving multiple keys are usually not supported. For example, you cannot intersect two sets because they may be stored in different Redis instances (in fact there is a way, but you cannot use the intersection instruction directly).

2. If you operate on multiple keys at the same time, you cannot use Redis transactions.

3. The partitioning granularity is the key, so you cannot shard a dataset with a single huge key, such as a very large sorted set.

4. When using partitions, data processing can be very complex, for example, in order to back up, you have to collect RDB / AOF files from different Redis instances and hosts at the same time.

5. Dynamic capacity expansion or reduction during partition may be very complex. Redis cluster can rebalance data transparently to users by adding or deleting Redis nodes at run time, but some other client or agent partitioning methods do not support this feature. However, there is a pre-slicing technology that can also solve this problem.

Implementation of distributed Lock with Redis

Redis is a single-process and single-thread mode, which uses queue mode to turn concurrent access into serial access, and there is no competition between multiple clients to connect to Redis. Distributed locks can be realized by using SETNX commands in Redis.

If and only if key does not exist, set the value of key to value. If a given key already exists, the SETNX does not take any action

SETNX is an abbreviation for "SET if Not eXists" (or SET if it does not exist).

Return value: 1 is returned if the setting succeeds; 0 is returned if it fails.

The process and items of using SETNX to complete the synchronization lock are as follows:

Use the SETNX command to acquire the lock. If 0 is returned (key already exists, the lock already exists), the acquisition fails, otherwise it succeeds.

In order to prevent the exception of the program after acquiring the lock, causing other threads / processes to call the SETNX command always return 0 and enter the deadlock state, it is necessary to set a "reasonable" expiration time for the key.

Release the lock and use the DEL command to delete the lock data
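The acquire / expire / release flow above can be sketched like this. `MiniRedis` is a hypothetical in-memory stand-in for a Redis server, modeling just the SET-if-absent-with-expiry semantics (`SET key value NX EX` in modern Redis) and a compare-and-delete release; real Redis needs a Lua script for the release, because GET followed by DEL is not atomic there. The random token ensures a client only deletes its own lock.

```python
import time
import uuid

class MiniRedis:
    """In-memory stand-in for a Redis server (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expire_at)

    def set_nx_ex(self, key, value, ttl_seconds):
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False          # key exists and has not expired: lock held
        self._data[key] = (value, now + ttl_seconds)
        return True

    def get(self, key):
        current = self._data.get(key)
        if current is None or current[1] <= time.monotonic():
            return None
        return current[0]

    def compare_and_delete(self, key, value):
        # real Redis needs a Lua script here to make this atomic
        if self.get(key) == value:
            del self._data[key]
            return True
        return False

def acquire_lock(r, name, ttl_seconds=10):
    """Returns a random token on success, None if the lock is already held."""
    token = uuid.uuid4().hex
    return token if r.set_nx_ex(name, token, ttl_seconds) else None

def release_lock(r, name, token):
    """Release only if we still own the lock (our token matches)."""
    return r.compare_and_delete(name, token)
```

The TTL is the "reasonable expiration time" from step 2: if the holder crashes, the lock frees itself instead of deadlocking everyone else.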

How to solve the problem of concurrent competitive Key in Redis

The so-called concurrent competitive Key problem of Redis means that multiple systems operate on a key at the same time, but the order of execution is different from that we expected, which leads to different results!

Recommend a solution: distributed locking (both zookeeper and redis can implement distributed locking). (if there is no concurrency competition for Key for Redis, do not use distributed locks, which can affect performance.)

Distributed locking based on zookeeper ephemeral sequential nodes. The general idea: when each client locks a method, it creates a unique ephemeral sequential node under the directory of the node corresponding to that method on zookeeper. Determining whether the lock is acquired is simple: just check whether your node has the lowest sequence number among the ordered nodes. When releasing the lock, you only need to delete this ephemeral node. This also avoids deadlocks caused by locks that can never be released due to service downtime. After the business process completes, delete the corresponding child node to release the lock.

In practice, of course, reliability comes first, so Zookeeper is the first choice.

Is it better to do distributed Redis in the early stage or on a later scale? Why?

Since Redis is so lightweight (only 1 MB of memory is used for a single instance), the best way to prevent future expansion is to start more instances in the first place. Even if you have only one server, you can start with Redis running in a distributed manner, using partitions, and launching multiple instances on the same server.

Setting up a few more Redis instances in the first place, such as 32 or 64, may be cumbersome for most users, but it's worth the sacrifice in the long run.

That way, as your data grows and you need more Redis servers, all you need to do is simply move Redis instances from one server to another (without having to think about repartitioning). Once you add another server, you move half of your Redis instances from the first machine to the second.

What is RedLock?

The official Redis site has proposed an authoritative Redis-based way to implement distributed locks called RedLock, which is more secure than the previous single-node method. It guarantees the following features:

1. Safety: mutually exclusive access, that is, at any moment only one client can hold the lock

2. Avoid deadlocks: eventually a client can always get the lock, and there will be no deadlock, even if the client that originally locked a resource crashes or a network partition occurs

3. Fault tolerance: as long as most Redis nodes survive, they can provide services normally.

Cache exception

Cache avalanche

A cache avalanche means a large number of cache entries fail at the same time, so subsequent requests all fall on the database, causing the database to bear a huge number of requests in a short period and collapse.

Solution

1. The expiration time of cached data is set randomly to prevent a large number of data expiration at the same time.

2. When the general concurrency is not very large, the most frequently used solution is locking queuing.

3. Add a corresponding cache mark to each cached data to record whether the cache is invalid or not, and update the data cache if the cache tag fails.
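Solution 1 above (randomized expiration) can be sketched in a few lines: stretch each TTL by a random factor so that entries cached in the same batch do not all expire together. The base TTL and jitter ratio are illustrative values to tune per workload.

```python
import random

def jittered_ttl(base_seconds=3600, jitter_ratio=0.2):
    """Return the base TTL stretched by a random factor in [1, 1 + ratio],
    spreading expirations to avoid a same-moment mass failure."""
    return int(base_seconds * (1 + random.uniform(0, jitter_ratio)))
```

The value returned would then be passed as the expiry when writing each cache entry (for example, the EX argument of a SET command).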

Cache penetration

Cache penetration refers to the data that is not available in the cache and database, causing all requests to fall on the database, causing the database to bear a large number of requests in a short period of time and collapse.

Solution

1. Add validation at the interface layer, such as user authentication checks and basic id validation, intercepting obviously invalid ids directly.

2. Use a Bloom filter: the data is mapped into a bit array by k (k > 1) independent hash functions, which ensures that membership checks on elements complete within a given space budget and misjudgment (false-positive) rate.

Its advantage is that the space efficiency and query time are far higher than the general algorithm, and the disadvantage is that it has a certain error recognition rate and deletion difficulties.

The core idea of Bloom-Filter algorithm is to use several different Hash functions to resolve "conflicts".

There is a conflict (collision) problem with Hash: two different URLs may get the same value from the same Hash function. To reduce conflicts, we can introduce a few more Hash functions. If we conclude from any one of the Hash values that an element is not in the collection, then the element is definitely not in the collection. Only when all the Hash functions tell us that the element is in the collection can we be confident that the element exists in the collection. This is the basic idea of Bloom-Filter.

Bloom-Filter is generally used to determine the existence of an element in a collection of large amounts of data.
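A minimal Bloom filter following the idea above: a bit array plus k independent hash functions, derived here by salting SHA-256 with the function index. The sizes are illustrative; in practice m and k are chosen from the expected element count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Sketch of a Bloom filter: k hash positions per item over m bits."""

    def __init__(self, size_bits=8192, k=3):
        self.m = size_bits
        self.k = k
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For cache penetration, a request whose key fails `might_contain` can be rejected before it ever reaches the database; note that items cannot be deleted from a plain Bloom filter, matching the "deletion difficulties" drawback above.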

Cache breakdown

Cache breakdown means there is no data in the cache but there is in the database (usually because the cached entry expired). Because so many concurrent users read the cache at the same moment and miss, they all go to the database to fetch the data, which instantly increases the pressure on the database. Unlike a cache avalanche, cache breakdown is many concurrent queries for the same piece of data; a cache avalanche is many different pieces of data expiring, so that a lot of data cannot be found and the queries hit the database.

Solution

1. Set hotspot data to never expire

2. Add mutex lock

Cache warm-up

Cache preheating means that after the system is online, the relevant cache data is loaded directly into the cache system. In this way, you can avoid the problem of querying the database and then caching the data when the user requests it. Users directly query pre-warmed cache data!

Solution

1. Write a cache refresh page directly, and do it manually when you launch.

2. The amount of data is small and can be loaded automatically when the project starts.

3. Refresh the cache regularly

Cache degradation

When there is a sharp increase in traffic, when there is a problem with the service (such as slow response time or non-response), or when non-core services affect the performance of the core process, it is still necessary to ensure that the service is still available, even if it is damaging. The system can be degraded automatically according to some key data, or the switch can be configured to achieve manual degradation.

The ultimate goal of cache degradation is to ensure that core services remain available, even if lossily. And some services cannot be downgraded (such as adding to the shopping cart, checkout).

Before downgrading, the system should be combed through to see whether it can sacrifice the less important to protect the core, sorting out which parts must be protected at all costs and which can be degraded. For example, you can refer to log levels to set up a plan:

1. General: for example, some services can be degraded automatically if they time out occasionally because of network jitter or when the service is online.

2. Warning: if the success rate of some services fluctuates over a period of time (for example, between 95% and 100%), you can downgrade automatically or manually, and send an alarm.

3. Error: for example, if the availability rate is less than 90%, or the database connection pool is knocked out, or the number of visits suddenly soars to the maximum threshold that the system can withstand, you can automatically or manually downgrade according to the situation.

4. Serious error: for example, if the data is wrong due to special reasons, an emergency manual downgrade is required.

The purpose of service degradation is to prevent Redis service failures, resulting in avalanche problems in the database. Therefore, a service degradation strategy can be adopted for unimportant cached data. For example, a more common practice is to return the default value directly to the user instead of querying the database when there is a problem with Redis.

Hot and cold data

Caching is valuable for hot data.

For cold data, most of the data may be squeezed out of memory before it is accessed again; it not only takes up memory but is of little value. For frequently modified data, consider caching depending on the situation.

For hot data, take the birthday greeting module of one of our IM products: the birthday list of the day is cached and may later be read hundreds of thousands of times. For another example, in a navigation product we cache the navigation information, which may later be read millions of times.

Caching is meaningful only if the data is read at least twice before the data is updated. This is the most basic strategy, and if the cache fails before it works, it won't be of much value.

Are there scenarios where the modification frequency is high but caching still has to be considered? Yes! For example, a read interface that puts a lot of pressure on the database but whose data is also hot. At this point we need to consider caching to reduce the pressure on the database. For example, the like, favorite, and share counts of one of our assistant products are very typical hot data, but they change constantly; in this case the data needs to be synchronized to the Redis cache to reduce the pressure on the database.

Cache hotspot key

When a Key in the cache (such as a promotional product) expires at a certain point in time, there are a large number of concurrent requests for this Key. These requests find that the cache expiration will generally load data from the back-end DB and set it back to the cache. At this time, large concurrent requests may crush the back-end DB instantly.

Solution

Lock the cache query: if the KEY is absent, acquire the lock, query the DB and put the result into the cache, then release the lock. Other processes that find the lock held wait, and after it is released they either return the cached data or go on to query the DB.

Common tools

What are the Java clients supported by Redis? Which one is officially recommended?

Redisson, Jedis, lettuce, etc., Redisson is officially recommended.

What is the relationship between Redis and Redisson?

Redisson is an advanced distributed coordination Redis client, which can help users easily implement some Java objects (Bloom filter, BitSet, Set, SetMultimap, ScoredSortedSet, SortedSet, Map, ConcurrentMap, List, ListMultimap, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, ReadWriteLock, AtomicLong, CountDownLatch, Publish / Subscribe, HyperLogLog) in a distributed environment.

What are the advantages and disadvantages of Jedis versus Redisson?

Jedis is the client of Redis's Java implementation, and its API provides more comprehensive support for Redis commands; Redisson implements a distributed and scalable Java data structure, compared with Jedis, the function is relatively simple, does not support string operation, does not support sorting, transactions, pipes, partitions and other Redis features. The purpose of Redisson is to promote the separation of users' attention from Redis, so that users can focus more on dealing with business logic.

Other questions

The difference between Redis and Memcached

Both are non-relational in-memory key-value databases, now companies generally use Redis to implement caching, and Redis itself is becoming more and more powerful! There are the following main differences between Redis and Memcached:

Contrast parameters

Type
Redis: in-memory, non-relational database
Memcached: in-memory, key-value pairs, used as a cache

Data storage types
Redis: 1. String 2. List 3. Set 4. Hash 5. Sorted Set [commonly known as ZSet]
Memcached: 1. Text type 2. Binary type

Query [operation] types
Redis: 1. Batch operations 2. Transaction support 3. Different CRUD for each type
Memcached: 1. Common CRUD 2. A small number of other commands

Additional functions
Redis: 1. Publish / subscribe model 2. Master-slave partitioning 3. Serialization support 4. Script support [Lua scripts]
Memcached: 1. Multithreaded service support

Network IO model
Redis: single-threaded multiplexed IO model
Memcached: multithreaded, non-blocking IO model

Event library
Redis: AeEvent
Memcached: LibEvent

Persistence support
Redis: 1. RDB 2. AOF
Memcached: not supported

Cluster mode
Redis: native cluster mode support, can achieve master-slave replication and read-write separation
Memcached: no native cluster mode; relies on the client to shard data writes across the cluster

Memory management mechanism
Redis: not all data is kept in memory all the time; some values that have not been used can be swapped to disk
Memcached: data is always in memory. Memcached divides memory into blocks of specific lengths to store data, which completely solves memory fragmentation but results in low memory utilization. For example, if the block size is 128 bytes and only 100 bytes of data are stored, the remaining 28 bytes are wasted.

Applicable scenarios
Redis: complex data structures, persistence, high availability requirements, large value content
Memcached: pure key-value, businesses with large data volume and high concurrency

(1) all values of memcached are simple strings, and redis, as its replacement, supports richer data types.

(2) redis is much faster than memcached.

(3) redis can persist its data

How to ensure the data consistency between cache and database when double writing?

As long as you use cache, it may involve dual storage and double write of cache and database. As long as you are double write, there must be a problem of data consistency, so how do you solve the problem of consistency?

Generally speaking, if your system does not strictly require the cache and the database to be consistent, the cache may occasionally be slightly inconsistent with the database. In that case it is best not to use the scheme of serializing read and write requests into a memory queue to guarantee no inconsistency.

After serialization, the throughput of the system drops dramatically, requiring several times more machines than normal to support the same online request volume.

Another way is that there may be temporary inconsistencies, but the chances are very small, that is, to update the database before deleting the cache.

Problem scenario

Scenario 1: write the cache first, then write the database. If the cache write succeeds but the database write fails or the response is delayed, then the next read of the cache reads dirty data.
Solution: write the database first and invalidate the old cache; when reading data, if the cache is missing, read the database and then populate the cache.

Scenario 2: write the database first, then write the cache. If the database write succeeds but the cache write fails, then the next read cannot get the data from the cache.
Solution: if reading the cache fails, read the database first and then populate the cache.

Scenario 3: the cache needs to be refreshed asynchronously: the database operation and the cache write are not in the same step, for example in distributed scenarios where it is impossible to write the cache at the same time and an asynchronous refresh (a remedial measure) is needed.
Solution: determine which data is suitable for such scenarios, and set a reasonable data-inconsistency window and refresh interval according to empirical values.
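The "write the database first, then invalidate the cache" pattern can be sketched with plain dicts standing in for the database and for Redis:

```python
db = {}      # stand-in for the database
cache = {}   # stand-in for Redis

def write(key, value):
    db[key] = value          # 1. update the database first
    cache.pop(key, None)     # 2. then invalidate the stale cache entry

def read(key):
    if key in cache:
        return cache[key]    # cache hit
    value = db.get(key)      # cache miss: fall back to the database
    if value is not None:
        cache[key] = value   # repopulate the cache for the next reader
    return value
```

Invalidating rather than updating the cache on writes avoids ever leaving a dirty value behind when the cache write fails: the worst case after a write is a cache miss, which the next read repairs from the database.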

Redis common performance problems and solutions?

1. Master is best not to do any persistence work, including memory snapshots and AOF log files, especially do not enable memory snapshots for persistence.

2. If the data is critical, a Slave enables AOF to back up data. The policy is to synchronize data once per second.

3. For the speed of master-slave replication and the stability of connection, Slave and Master should be in the same local area network.

4. Try to avoid adding slave libraries to the stressed master libraries.

5. When Master calls BGREWRITEAOF to rewrite the AOF file, AOF will take up a lot of CPU and memory resources when rewriting, resulting in excessive service load and temporary service suspension.

6. For the stability of the Master, master-slave replication should not use a graph structure; a one-way linked-list structure is more stable, that is, the master-slave relationship is: Master <- Slave1 <- Slave2 <- Slave3...
