How to use ES to do Redis Monitoring

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

Preface

Figure: Redis popularity ranking

Redis is very popular and easy to use, and it holds an important position both in business application systems and in big data. But Redis is also fragile: used carelessly, it causes many problems. Before 2012 I mainly used memcached, and then moved to the Redis camp. I have worked with single-instance, master-slave, Sentinel, proxy, and cluster modes. At the company level Redis is rarely used well, and understanding of it tends to be one-sided, which leads to many problems in real projects.

To use Redis well, you need to master it at three levels:

Development level

Architecture level

Operation and maintenance level

Of these, architecture and operations are very important. Most small and medium-sized enterprises only cover common functionality at the development level; once the data volume grows a little or the business complexity rises, all kinds of architecture and operations problems appear. The purpose of this article is to explore a Redis monitoring system. There are of course many mature products in the industry, but I find them quite conventional: they do only coarse-grained monitoring and are not refined to fit the characteristics of specific business requirements, so they cannot feed back into architecture and development optimization.

The content of this article will focus on the following issues:

What does a Redis monitoring system cover?

What did we do to build a Redis monitoring system?

How fine-grained should a Redis monitoring system be?

Why use ELK to build the monitoring system?

Background and requirements

Project description

The company operates in the connected-car (Internet of Vehicles) industry, with millions of real car owners, and its business projects center on life services for car owners. To improve system performance, Redis was introduced as cache middleware, as follows:

The deployment architecture uses Redis Cluster mode.

There are dozens of back-end applications, with more than 200 application instances.

All application systems share one cache cluster.

The cluster has dozens of nodes; counting the disaster-recovery backup environment, the node count doubles.

The cluster nodes have a relatively large memory configuration.

Figure: schematic diagram of Redis cluster architecture and application architecture

Problem description

When the system first went live, everything about Redis was normal. As more application systems, and more of their sub-modules, were connected, problems began to appear, noticeable both on the application side and on the cluster servers:

Cluster nodes crash

Cluster nodes hang (appear dead)

Some back-end applications see very slow responses from the cluster

The root cause is insufficient architecture and operations work. Monitoring the operation of the Redis cluster servers is easy, and Redis itself provides many direct commands for it. But these show only common server metrics and do not support in-depth analysis; they say nothing about what happens inside Redis, and especially about how business applications use the cluster:

Where are the hotspots in Redis cluster usage?

Which applications consume the most Redis memory?

Which applications account for the most Redis accesses?

Which applications use Redis data types unreasonably?

How are Redis resources distributed across application modules?

What are the hotspots in each application's use of the Redis cluster?

Monitoring system

The purpose of monitoring is not only to watch Redis itself but also to make better use of it. Traditional monitoring is generally simple and unsystematic. For Redis, I think it should cover at least three parts: the server side, the application side, and joint analysis of the two.

Server:

First, the operating-system level: CPU, memory, network IO, disk IO, the processes running on the server, and so on.

Redis runtime information: server information, number of client connections, memory consumption, persistence information, number of keys, master-slave synchronization, command statistics, cluster information, and so on.

The Redis runtime log, which records important operations such as persistence runs and can effectively help analyze crashes and hangs.

Application side:

On the application side, capture how applications use Redis: which applications and modules occupy the most Redis resources, which consume the most, which misuse Redis, and so on.

Joint analysis:

Joint analysis combines what happens on the server side with how the application side behaves. For example, a sudden block on the server side may be caused by an application setting a very large cached value, or by a list key that holds so much data that accessing it blocks.

Solution

Why choose the Elastic-Stack technology stack?

Most third-party products monitor only metrics, and ELK (Elasticsearch, Logstash, Kibana) is still needed for detailed logs. In other words, even after adopting a third-party metrics monitor, you still have to set up an ELK cluster to view the detailed logs.

Another advantage is the integration of the Elastic Stack: it handles both metrics and log files, the pipeline from collection to storage to the final dashboards is well integrated, and the barrier to entry is low.

The following sections describe in detail how we did it and what work was involved.

Server system

The Elastic Stack family includes Metricbeat, which supports system-level information collection. With a simple configuration of the Elastic cluster address and the system metrics module, it can go live, and ready-made system monitoring dashboards are created in Kibana. This is simple and fast, and ordinary operations staff can handle it.

Figure: Metricbeat schematic diagram

The sample configuration of system metrics information collection is as follows:

Server cluster

To collect runtime information from a Redis cluster, the usual practice is to run the info command provided by Redis at regular intervals.

The information returned by info includes the following sections:

server: general information about the Redis server

clients: client connection information

memory: memory consumption information

persistence: RDB and AOF information

stats: general statistics

replication: master / slave replication information

cpu: CPU consumption statistics

commandstats: Redis command statistics

cluster: Redis cluster information

keyspace: database-related statistics

Metricbeat from the Elastic Stack family also has a Redis module, which likewise relies on the info command, but its implementation has some limitations:

Metricbeat cannot express the master-slave relationships of a Redis cluster.

Some Redis cluster statistics are always cumulative, such as the number of commands processed, so the peak command rate cannot be obtained directly.

Metricbeat does not follow changes in Redis cluster state dynamically, such as nodes being added to the cluster or going offline.

So we took CacheCloud (open-sourced by the Sohu team) as a reference and designed and developed our own Agent. It periodically collects information from the Redis cluster, performs some simple calculations on the statistics internally, converts the result to JSON, and writes it to a local file; Logstash then collects the file and sends it to Elasticsearch.
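The Agent's source is not shown in the article, so the following is only a minimal Python sketch of the idea, under stated assumptions: it parses the text returned by info, turns the cumulative `total_commands_processed` counter into a per-interval figure, and emits one JSON line. The function names, the choice of output fields, the node address, and the two hard-coded samples are all hypothetical; a real agent would fetch the samples from each cluster node on a timer.

```python
import json


def parse_info(text):
    """Parse the text returned by the Redis INFO command into a flat dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def build_record(node, prev, curr, interval_s):
    """Turn two consecutive INFO samples into one JSON line."""
    # total_commands_processed is cumulative, so the per-interval figure
    # is the difference between the current and the previous sample.
    delta = int(curr["total_commands_processed"]) - int(prev["total_commands_processed"])
    record = {
        "node": node,                      # hypothetical field name
        "role": curr.get("role"),          # master or slave, from INFO replication
        "connected_clients": int(curr["connected_clients"]),
        "used_memory": int(curr["used_memory"]),
        "commands_in_interval": delta,
        "commands_per_sec": delta / interval_s,
    }
    return json.dumps(record, sort_keys=True)


# Two hypothetical samples taken 10 seconds apart from the same node.
SAMPLE_1 = (
    "# Stats\r\ntotal_commands_processed:1000\r\n"
    "# Clients\r\nconnected_clients:12\r\n"
    "# Memory\r\nused_memory:2048\r\n"
    "# Replication\r\nrole:master\r\n"
)
SAMPLE_2 = SAMPLE_1.replace(
    "total_commands_processed:1000", "total_commands_processed:1600"
)

line = build_record("10.0.0.1:6379", parse_info(SAMPLE_1), parse_info(SAMPLE_2), 10)
print(line)
```

Computing the difference in the agent, rather than storing the raw counter, is what makes per-interval peaks visible later in Kibana; recording the `role` field per node is one possible way to keep the master-slave relationship in the data.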

Figure: schematic diagram of Redis server running information collection architecture

Server log

Collecting the Redis server's runtime logs is easy: use Filebeat from the Elastic Stack family, which has a Redis module, and configure the Elastic server address and the log file path.

Figure: server log collection process

Redis running log collection configuration:

Application side

Collecting information on the application side is the most important part of the whole Redis monitoring system, and also the most troublesome to implement, with the longest pipeline. The first step is to modify the jedis source code (the technology stack is Java): add instrumentation code, recompile, and reference the modified build in the application projects. Every command an application issues to the Redis cluster is then captured, its key information is recorded, and the record is written to a local file.

Figure: Redis application behavior collection architecture diagram

The format of the data collected by the application is as follows:

Figure: data collected by the application side

Jedis modification:

The modified jedis records the following information:

R_host: the address and port of the Redis cluster node being accessed, in the form ip:port

R_cmd: the command type executed, such as get, set, hget, hset, etc.

R_start: start time of command execution

R_cost: time consumed

R_size: size of the value read or written

R_key: the name of the key accessed

R_keys: the key name split into segments, as an array of unlimited length. Because all application systems share one cluster, key names are standardized and delimited by a special symbol, for example "application name _ system module _ dynamic variable _ xxx", which makes them easy to tell apart.
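To make the record format concrete, here is a small Python sketch that assembles one such record and splits the key into its naming segments. It is an illustration only, not the modified jedis code (which is Java): the lowercase field names, the underscore separator, the millisecond units, and the sample key `shop_order_10086_detail` are assumptions.

```python
import json


def split_key(key, sep="_"):
    """Split a standardized key name into its segments (the R_keys field)."""
    return key.split(sep)


def make_record(host, cmd, key, start_ms, cost_ms, size):
    """Assemble one access record using the fields listed above."""
    return {
        "r_host": host,        # ip:port of the node accessed
        "r_cmd": cmd,          # command type, e.g. get or hset
        "r_start": start_ms,   # start time, assumed to be epoch milliseconds
        "r_cost": cost_ms,     # elapsed time, assumed to be milliseconds
        "r_size": size,        # size of the value read or written, in bytes
        "r_key": key,
        "r_keys": split_key(key),
    }


record = make_record(
    "10.0.0.1:6379", "get", "shop_order_10086_detail", 1700000000000, 3, 512
)
print(json.dumps(record))
```

With the key split this way, the first segment identifies the application and the second the module, so later aggregations can group by `r_keys[0]` or `r_keys[1]` without parsing strings again.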

The jedis modification touches the following places:

Connection.java: at the start of a command, record the execution start time; at the end, record the end time, the time consumed, and so on, and write them to the log stream.

JedisClusterCommand.java: the place where the key is obtained, which makes it convenient to analyze the application's key behavior later.

Connection.java is instrumented in two places:

Figure: instrumentation points in Connection.java

JedisClusterCommand.java is instrumented in one place:

Figure: instrumentation point in JedisClusterCommand.java
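The original code screenshots are not available, so the instrumentation pattern itself (note the start time, run the command, note the elapsed time, write a record to the log stream) is sketched below in Python rather than in the modified jedis Java source. Everything here is hypothetical: the `traced` helper, the `redis.trace` logger name, the in-memory handler, and the dictionary standing in for a Redis call.

```python
import json
import logging
import time

logger = logging.getLogger("redis.trace")


def traced(cmd, key, execute, host="10.0.0.1:6379"):
    """Run execute() and log one record with its start time and cost."""
    start = time.time()            # wall-clock start, for r_start
    began = time.perf_counter()    # monotonic clock, for measuring r_cost
    try:
        return execute()
    finally:
        # Log the record whether the command succeeded or raised.
        cost_ms = (time.perf_counter() - began) * 1000
        logger.info(json.dumps({
            "r_host": host,
            "r_cmd": cmd,
            "r_key": key,
            "r_start": int(start * 1000),
            "r_cost": round(cost_ms, 3),
        }))


class ListHandler(logging.Handler):
    """Collects log lines in memory so the sketch stays self-contained."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# A dictionary lookup stands in for a real Redis call.
fake_store = {"shop_order_10086_detail": "cached value"}
value = traced(
    "get",
    "shop_order_10086_detail",
    lambda: fake_store["shop_order_10086_detail"],
)
print(handler.lines[0])
```

Writing the record in a `finally` clause means failed commands are logged as well, which is often where the interesting slow or oversized accesses show up; whether the jedis modification does the same is not stated in the article.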

Logback modification:

All applications write their log files with logback. For more precise analysis, some information about the application side must also be captured when writing the log:

App_ip: the IP address of the server on which the application is deployed

App_host: the name of the server on which the application is deployed

A custom Layout obtains the application's IP address and server name automatically:

Figure: custom Layout of Logback
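The custom logback Layout itself is Java and is not reproduced in the text. As a rough analogue of the same idea, the Python sketch below attaches the local IP address and host name to every log record with a `logging.Filter`. The class name, the format string, and the loopback fallback are assumptions made for illustration.

```python
import logging
import socket


def local_identity():
    """Return (ip, hostname) of this machine, with a loopback fallback."""
    host = socket.gethostname()
    try:
        ip = socket.gethostbyname(host)
    except OSError:
        # Host name does not resolve; fall back rather than fail logging.
        ip = "127.0.0.1"
    return ip, host


class AppIdentityFilter(logging.Filter):
    """Attach app_ip and app_host to every log record."""

    def __init__(self):
        super().__init__()
        # Resolve once at start-up instead of on every log call.
        self.app_ip, self.app_host = local_identity()

    def filter(self, record):
        record.app_ip = self.app_ip
        record.app_host = self.app_host
        return True


formatter = logging.Formatter("%(app_ip)s %(app_host)s %(message)s")
record = logging.LogRecord(
    "redis.trace", logging.INFO, "sketch.py", 0, "get shop_order_1", None, None
)
AppIdentityFilter().filter(record)
line = formatter.format(record)
print(line)
```

Resolving the address once when the filter is created keeps the per-record cost negligible, which matters when every Redis command produces a log line.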

App configuration:

App configuration is the final step. Its main job is to output the instrumentation log data, which is configured in the application's logback.xml file:

Figure: configure the application log file logback.xml

Log collection:

Logstash collects the application logs: configure the log directory as the input and the Elastic cluster as the output. This completes the log collection part of the monitoring system.

Log analysis

Analyzing the Redis server logs is relatively simple: they contain conventional metrics, and a few key charts make problems easy to spot. The focus here is on analyzing the application-side logs.

Figure: some behavior diagrams using Redis on the application side

After the ELK monitoring system went live, we observed and analyzed continuously for two weeks and obtained findings such as:

Some values on the application side are too large, exceeding 1 MB. Accessing such values takes a long time and causes serious blocking.

Some applications effectively use Redis as a database.

Some use the List type as a message queue, accessing hundreds of thousands of items at a time.

Some applications operate on the cluster very frequently, accounting for more than half of all operations.

There are many more, which we will not list one by one.
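In the article's setup these findings come from Elasticsearch aggregations and Kibana charts. The same two questions (which application issues the largest share of calls, and which accesses exceed 1 MB) can be sketched over a handful of in-memory records in Python. The sample records, the lowercase field names, and the convention that the first key segment is the application name are assumptions for illustration.

```python
from collections import Counter

ONE_MB = 1024 * 1024


def summarize(records):
    """Return each application's share of calls and the accesses over 1 MB."""
    # The first key segment is assumed to be the application name.
    calls = Counter(r["r_keys"][0] for r in records)
    total = sum(calls.values())
    share = {app: count / total for app, count in calls.items()}
    oversized = [r for r in records if r["r_size"] > ONE_MB]
    return share, oversized


# Hypothetical records in the format collected on the application side.
records = [
    {"r_keys": ["shop", "order", "1"], "r_cmd": "get", "r_size": 512},
    {"r_keys": ["shop", "order", "2"], "r_cmd": "get", "r_size": 2 * ONE_MB},
    {"r_keys": ["shop", "cart", "7"], "r_cmd": "hget", "r_size": 128},
    {"r_keys": ["pay", "bill", "9"], "r_cmd": "set", "r_size": 64},
]

share, oversized = summarize(records)
for app, fraction in sorted(share.items()):
    print(app, fraction)
print("oversized accesses:", len(oversized))
```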

Follow-up plan

The monitoring system is the architect's eyes. With it, an optimization and rework plan for Redis is easy to draw up:

On the application side, all misuse should be corrected.

On the server side, split the cluster by application data into dedicated clusters for particular applications or scenarios.

Developers must notify the architects for review whenever a new business module needs to connect to Redis.
