How to implement redis distributed caching

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Summary: First, what is Redis? Redis is a high-performance, in-memory key-value storage system that can persist data to disk (including through an append-only log) and provides client APIs for many languages. Second, the background: applications needed more and more kinds of data structures, which memcache does not provide, and this hurt development efficiency. Performance requirements also had to be met as the volume of read operations grew. The path taken was: database read-write separation (M/S) -> database with multiple Slaves -> adding a Cache (memcache) -> moving to Redis. The write problem was solved by splitting horizontally: the table is partitioned so that some users are stored in one table and other users in another.

First: what is Redis?

Redis is a high-performance, in-memory key-value storage system that can persist data to disk (including through an append-only log) and provides client APIs for many languages.

Second: background

Applications needed more and more kinds of data structures, which memcache does not provide, and this hurt development efficiency.

Performance requirements also had to be met as the volume of read operations grew. The path taken was:

Database read-write separation (M/S) -> database with multiple Slaves -> adding a Cache (memcache) -> moving to Redis

Solving the write problem:

Split horizontally: partition the table so that some users are stored in one table and other users in another.
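The horizontal split described above can be sketched as a small routing function. This is only an illustration under stated assumptions: the class name UserTableRouter, the "user_N" table naming, and routing by user ID modulo the table count are all hypothetical, since the article does not say how users are assigned to tables.

```java
// Hypothetical sketch: route each user to one of N tables by user ID.
final class UserTableRouter {
    private final String baseName;  // assumed table name prefix, e.g. "user"
    private final int tableCount;   // number of tables the data is split into

    UserTableRouter(String baseName, int tableCount) {
        if (tableCount <= 0) {
            throw new IllegalArgumentException("tableCount must be positive");
        }
        this.baseName = baseName;
        this.tableCount = tableCount;
    }

    /** Returns the name of the table that holds the given user's rows. */
    String tableFor(long userId) {
        // floorMod keeps the index non-negative even for negative IDs.
        int index = (int) Math.floorMod(userId, (long) tableCount);
        return baseName + "_" + index;
    }
}
```

With four tables, user 10 would be routed to user_2. A modulo scheme is simple, but changing the table count moves most users, which is one reason other schemes (ranges, lookup tables) are sometimes preferred.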

Reliability requirements

Cache "avalanche" failures are a persistent headache.

The cache must be able to recover quickly after a failure.

Development cost requirements

The cost of keeping the Cache and the DB consistent keeps rising (clean the DB first and then clean the cache? That does not work; it is too slow!)

Developers need to keep up with the constant influx of product requirements.

The most expensive hardware is at the database tier, where machines basically cost several times as much as front-end machines; the workload there is mainly IO-intensive and hard on hardware.

Maintenance complexity

The cost of maintaining consistency keeps rising.

When BerkeleyDB uses a B-tree, it keeps writing new data and never reorganizes its files internally, so the files grow larger and larger. Once they are large they must be archived, and archiving has to be done regularly.

This requires a certain amount of downtime.

Based on the above considerations, Redis was selected.

Third: the application of Redis at Sina Weibo

Introduction to Redis

1. Five data structures are supported

Redis supports strings, hashes, lists, sets, and sorted sets.

Strings are a good fit for storing counters. Sets work well for building indexes.

2. K-V storage vs. K-V cache

At Sina Weibo, about 98% of Redis usage is currently as persistent storage and 2% as cache, running on more than 600 servers.

In Redis there is not much performance difference between persistent and non-persistent use:

If non-persistent use reaches 80,000-90,000 tps, persistent use reaches about 70,000-80,000 tps.

When using persistence, you need to consider the ratio between the amount of data to persist and the write performance, that is, the ratio of the memory used by Redis to the write rate of the hard disk.
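One way to read the ratio mentioned above is as a rough estimate of how long it takes to write the whole in-memory data set to disk. The sketch below only does that division; the class name and the example figures (8 GiB of memory, 100 MiB/s of disk throughput) are illustrative assumptions, not measurements from the article.

```java
// Hypothetical helper: memory used by Redis divided by disk write rate.
final class PersistenceEstimate {
    /** Seconds needed to write usedMemoryBytes at diskWriteBytesPerSec. */
    static double secondsToPersist(long usedMemoryBytes, long diskWriteBytesPerSec) {
        if (diskWriteBytesPerSec <= 0) {
            throw new IllegalArgumentException("disk write rate must be positive");
        }
        return (double) usedMemoryBytes / diskWriteBytesPerSec;
    }

    public static void main(String[] args) {
        long usedMemory = 8L * 1024 * 1024 * 1024; // assumed: 8 GiB in memory
        long diskRate = 100L * 1024 * 1024;        // assumed: 100 MiB/s sequential writes
        System.out.println(secondsToPersist(usedMemory, diskRate) + " seconds");
    }
}
```

The larger this number, the longer a full dump occupies the disk, so it is worth checking before deciding how much memory a persistent instance should hold.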

3. Active community

Redis currently has more than 30,000 lines of code. The code is concise, with many ingenious implementations, and the author is a perfectionist about technical cleanliness.

Community activity around Redis is very high, which is an important indicator of the quality of open source software. In its early stages, open source software generally has no commercial technical support, so without an active community there is nowhere to turn for help when problems occur.

Basic principles of Redis

Redis persistence (AOF, append-only file):

Writes are recorded in a log (the AOF), and once the log has grown to a certain extent it is merged with the in-memory data. Because the log is only ever appended to, the disk is written sequentially and the impact on performance is very small.
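The append-then-merge idea can be illustrated with a toy store. This is a conceptual sketch, not how Redis implements the AOF: the class AppendOnlyStore is hypothetical, and an in-memory list stands in for the on-disk file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of an append-only log; the list stands in for the file on disk.
final class AppendOnlyStore {
    private final Map<String, String> memory = new LinkedHashMap<>();
    private final List<String[]> log = new ArrayList<>();

    void set(String key, String value) {
        log.add(new String[] {key, value}); // always appended, never edited in place
        memory.put(key, value);
    }

    String get(String key) {
        return memory.get(key);
    }

    int logSize() {
        return log.size();
    }

    List<String[]> logCopy() {
        return new ArrayList<>(log);
    }

    /** Merges the log with memory: rewrites it with one entry per live key. */
    void compact() {
        log.clear();
        for (Map.Entry<String, String> e : memory.entrySet()) {
            log.add(new String[] {e.getKey(), e.getValue()});
        }
    }

    /** Rebuilds a store by replaying log entries in order, as after a restart. */
    static AppendOnlyStore replay(List<String[]> entries) {
        AppendOnlyStore store = new AppendOnlyStore();
        for (String[] e : entries) {
            store.set(e[0], e[1]);
        }
        return store;
    }
}
```

Replaying either the full log or the compacted log yields the same final state; compaction only drops entries that later writes have superseded.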

1. Single instance, single process

Redis uses a single process, so one instance uses only one CPU.

To make full use of the CPUs, you can run as many Redis instances as there are CPU cores, each on its own port (for example, an 8-core CPU with 8 instances on 8 ports), to improve concurrency.

In a single-machine test with 200 bytes per record, the result was 80,000-90,000 tps.

2. Replication

Process: data is written to the master -> the master saves it into an RDB file for the slave -> the slave loads the RDB into memory.

Save point: if the network connection drops, the transfer resumes from the save point once the connection is re-established.

In master-slave replication, the first synchronization is a full transfer; subsequent synchronizations are incremental.
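The full-then-incremental behavior can be sketched with two small classes. This is a simplified model under stated assumptions: ReplMaster and ReplReplica are hypothetical names, the master's write history is kept as an unbounded in-memory list, and the replica's offset plays the role of the save point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master: current data plus an ordered history of writes.
final class ReplMaster {
    final Map<String, String> data = new HashMap<>();
    final List<String[]> backlog = new ArrayList<>();

    void write(String key, String value) {
        data.put(key, value);
        backlog.add(new String[] {key, value});
    }
}

// Hypothetical replica: remembers how far into the backlog it has applied.
final class ReplReplica {
    final Map<String, String> data = new HashMap<>();
    private int offset = -1; // -1 means "never synchronized"

    /** Returns "full" or "incremental" depending on which path was taken. */
    String syncFrom(ReplMaster master) {
        if (offset < 0) {
            data.clear();
            data.putAll(master.data); // first sync: copy a full snapshot
            offset = master.backlog.size();
            return "full";
        }
        // Later syncs: apply only the writes made after the save point.
        for (int i = offset; i < master.backlog.size(); i++) {
            String[] write = master.backlog.get(i);
            data.put(write[0], write[1]);
        }
        offset = master.backlog.size();
        return "incremental";
    }
}
```

After a dropped connection, calling syncFrom again continues from the stored offset instead of copying everything a second time.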

3. Data consistency

After long-term operation, data on multiple nodes may become inconsistent.

Two tools were developed:

1. For large volumes of data, a periodic full check.

2. A real-time consistency check of incremental data.
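The article does not show the two tools, so the following is only a guess at their core comparison step. ConsistencyChecker and both method names are hypothetical, and plain maps stand in for the data read from two nodes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical comparison helpers for a master copy and a replica copy.
final class ConsistencyChecker {
    /** Full check: every key on either side is compared. */
    static Set<String> fullCheck(Map<String, String> master, Map<String, String> replica) {
        Set<String> allKeys = new TreeSet<>(master.keySet());
        allKeys.addAll(replica.keySet());
        return incrementalCheck(master, replica, allKeys);
    }

    /** Incremental check: only the recently written keys are compared. */
    static Set<String> incrementalCheck(Map<String, String> master,
                                        Map<String, String> replica,
                                        Iterable<String> recentKeys) {
        Set<String> mismatched = new TreeSet<>();
        for (String key : recentKeys) {
            // A key missing on one side counts as a mismatch.
            if (!Objects.equals(master.get(key), replica.get(key))) {
                mismatched.add(key);
            }
        }
        return mismatched;
    }
}
```

The full check is expensive and suits a periodic schedule, while the incremental check can run on each batch of recent writes.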

Inconsistency caused by the master failing to synchronize to the slave in time is called the delay problem.

For scenarios where the consistency requirements are not so strict, we only need to guarantee eventual consistency.

For the delay problem, we need to analyze the characteristics of the business scenario and add strategies at the application level to solve it.

For example:

1. Queries for newly registered users must go to the master first.

2. After a successful registration, wait 3 seconds before redirecting; the backend synchronizes the data during this time.
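The first strategy above could look roughly like the sketch below at the application level. It assumes the application knows when a user registered and treats a fixed window as the upper bound on replication delay; ReadRouter, Target, and the window value are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical read router: recent registrations are read from the master.
final class ReadRouter {
    enum Target { MASTER, SLAVE }

    private final Duration syncWindow; // assumed upper bound on replication delay

    ReadRouter(Duration syncWindow) {
        this.syncWindow = syncWindow;
    }

    /** Chooses where to read a user's data, given when the user registered. */
    Target targetFor(Instant registeredAt, Instant now) {
        Duration age = Duration.between(registeredAt, now);
        // Younger than the window: the slave may not have the row yet.
        return age.compareTo(syncWindow) < 0 ? Target.MASTER : Target.SLAVE;
    }
}
```

With a 3-second window, a user who registered one second ago is read from the master, while older accounts are served by a slave.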

Fourth: architecture design of the distributed cache

1. Architecture design

Because Redis is a single point, using it in a project requires implementing distribution yourself. The basic architecture diagram is as follows:

2. Distributed implementation

Consistent hashing of keys is used to distribute keys across the corresponding Redis nodes.

Implementation of consistent hashing:

Hash value calculation: both MD5 and MurmurHash are supported; the default is MurmurHash, which is efficient to compute.

Consistency: a ring structure is simulated with Java's TreeMap to achieve an even distribution.
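A minimal sketch of a TreeMap-based hash ring is shown below. It is not the article's implementation: the class ConsistentHashRing is hypothetical, it uses MD5 from the Java standard library rather than MurmurHash (which would need a third-party library), and the use of virtual nodes to even out the distribution is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hash ring: positions on the ring are keys of a sorted map.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // ring positions per physical node

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Returns the node that owns the key, or null when the ring is empty. */
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        // First position at or after the key's hash; wrap to the start if none.
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of the MD5 digest, read as a long. */
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }
}
```

The useful property is that removing a node only reassigns the keys that node owned; keys on the other nodes keep their placement.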

3. Choice of client

Jedis was modified, mainly in its partition module. It now supports partitioning by BufferKey and initializes a different ShardInfo for each Redis node's information. The underlying implementation of JedisPool was also modified so that the connection pool can be constructed from key and value and creates different Jedis connection clients according to the different ShardInfos. This achieves partitioning and is exposed for the application layer to call.
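The idea of a pool that hands out a connection for the shard owning a key can be sketched without Jedis. Everything below is hypothetical: Shard and Conn are stand-ins rather than the Jedis ShardInfo and client classes, and a simple hash modulo replaces whatever key-to-shard mapping the modified client really uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sharded pool: one queue of idle connections per shard.
final class ShardedPoolSketch {
    /** Stand-in node description; not the Jedis ShardInfo class. */
    static final class Shard {
        final String host;
        final int port;
        Shard(String host, int port) { this.host = host; this.port = port; }
    }

    /** Stand-in connection handle bound to one shard. */
    static final class Conn {
        final Shard shard;
        Conn(Shard shard) { this.shard = shard; }
    }

    private final List<Shard> shards;
    private final List<Deque<Conn>> idle = new ArrayList<>();

    ShardedPoolSketch(List<Shard> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards);
        for (int i = 0; i < shards.size(); i++) {
            idle.add(new ArrayDeque<>());
        }
    }

    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size()); // placeholder mapping
    }

    /** Borrows a connection to the shard owning the key, creating one if none is idle. */
    synchronized Conn borrow(String key) {
        int i = indexFor(key);
        Conn conn = idle.get(i).pollFirst();
        return conn != null ? conn : new Conn(shards.get(i));
    }

    /** Returns a connection to its shard's idle queue for reuse. */
    synchronized void giveBack(Conn conn) {
        idle.get(shards.indexOf(conn.shard)).addFirst(conn);
    }
}
```

Callers only pass a key; which node the connection points at stays hidden behind the pool, which is the effect the partition module is described as providing.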

4. Description of the modules

Dirty data processing module: handles cache operations that failed.

Shielding and monitoring module: monitors exceptions in Jedis operations; when a node becomes abnormal, it can trigger removal of that Redis node and similar operations.
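The article gives no detail on how the dirty data processing module works, so the sketch below shows just one plausible shape: failed keys are queued and retried later. DirtyKeyQueue and its repair callback are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical queue of keys whose cache operation failed.
final class DirtyKeyQueue {
    private final Deque<String> dirty = new ArrayDeque<>();

    /** Records a failed key; a key already pending is not queued twice. */
    synchronized void markDirty(String key) {
        if (!dirty.contains(key)) {
            dirty.addLast(key);
        }
    }

    synchronized int size() {
        return dirty.size();
    }

    /**
     * Retries every pending key once. The repair callback returns true when
     * the key has been fixed, for example by deleting it from the cache.
     * Returns the number of keys repaired in this round.
     */
    synchronized int retry(Predicate<String> repair) {
        int fixed = 0;
        int pending = dirty.size();
        for (int i = 0; i < pending; i++) {
            String key = dirty.pollFirst();
            if (repair.test(key)) {
                fixed++;
            } else {
                dirty.addLast(key); // still failing: keep it for the next round
            }
        }
        return fixed;
    }
}
```

A background task could call retry on a timer, so that a temporary node failure does not leave stale entries in the cache indefinitely.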

The distributed module as a whole removes abnormal Redis nodes through HornetQ. New nodes can be added through the reload method. (Adding new nodes could also easily be implemented through this module.)
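The removal and reload behavior can be modeled in-process as follows. This sketch leaves out HornetQ entirely; NodeMonitor is a hypothetical class, and removing a node after a fixed number of consecutive failures is an assumed policy that the article does not specify.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical monitor: tracks failures per node and drops unhealthy nodes.
final class NodeMonitor {
    private final int threshold; // consecutive failures tolerated (assumed policy)
    private final Set<String> active = new LinkedHashSet<>();
    private final Map<String, Integer> failures = new HashMap<>();

    NodeMonitor(int threshold) {
        this.threshold = threshold;
    }

    /** Adds (or re-adds) a node and clears its failure history. */
    synchronized void reload(String node) {
        active.add(node);
        failures.remove(node);
    }

    /** A successful operation resets the node's failure count. */
    synchronized void recordSuccess(String node) {
        failures.remove(node);
    }

    /** Returns true when this failure caused the node to be removed. */
    synchronized boolean recordFailure(String node) {
        if (!active.contains(node)) {
            return false;
        }
        int count = failures.merge(node, 1, Integer::sum);
        if (count < threshold) {
            return false;
        }
        active.remove(node);
        failures.remove(node);
        return true;
    }

    synchronized Set<String> activeNodes() {
        return new LinkedHashSet<>(active);
    }
}
```

In a multi-process deployment the removal decision would have to be broadcast to every client, which is presumably the role the message queue plays in the article's design.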

The distributed architecture described above meets the needs of the project. In addition, for particularly important cached data, dedicated Redis nodes can be set aside and given a specific priority. The cache interface can likewise be designed as a basic interface plus some special logic interfaces, as requirements dictate. CAS-related operations, as well as some business operations, can be implemented through the Redis WATCH mechanism.
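The check-and-set behavior that WATCH provides can be illustrated with an in-memory analogy. This is not Redis client code: WatchedStore is a hypothetical class in which a per-key version number stands in for the WATCH bookkeeping, and setIfUnchanged plays the role of an EXEC that aborts if the watched key was modified.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory analogy of WATCH-style optimistic locking.
final class WatchedStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized String get(String key) {
        return values.get(key);
    }

    /** Like WATCH: remembers the key's current version for a later check. */
    synchronized long watch(String key) {
        return versions.getOrDefault(key, 0L);
    }

    /** Every write bumps the key's version. */
    synchronized void set(String key, String value) {
        values.put(key, value);
        versions.merge(key, 1L, Long::sum);
    }

    /** Applies the write only if nobody changed the key since it was watched. */
    synchronized boolean setIfUnchanged(String key, long watchedVersion, String value) {
        if (watch(key) != watchedVersion) {
            return false; // another writer got there first; the caller should retry
        }
        set(key, value);
        return true;
    }
}
```

A caller typically loops: watch, read, compute the new value, attempt the conditional write, and start over if it is rejected.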

