
What you don't know about asynchronous log persistence


Preface

In Internet architecture design, asynchronous log persistence has become an indispensable part of any high-concurrency path. Why is it indispensable? If you store logs by sending them straight to MQ, then under low concurrency everything works well: the producer publishes, the consumer persists asynchronously, performance is excellent, and the tp99 is typically within 1 ms. But as concurrency grows, the producer-side tp99 keeps creeping up, from 1 ms to 2 ms to 4 ms, all the way to send timeouts. Our system hit exactly this during a big promotion: sending to MQ took more than 200 ms, and a large batch of timeouts appeared at once.

Since this only happens under high concurrency, this article explores a more reliable way to persist logs asynchronously, one that does not let high traffic drag down the interface's OPS or even make the interface unavailable.

Scheme 1: asynchronous appender implementation based on log4j

This scheme depends on log4j. Logging goes through log4j's asynchronous Appender: the appender produces messages to MQ, and a consumer takes them off MQ and persists them. In effect, a buffer is placed between the interface and MQ, decoupling the interface from MQ so that MQ operations no longer affect the interface's OPS.

Because the appender runs asynchronously with a discard policy, when a burst of data arrives and the buffer fills up, part of the data is dropped. This scheme therefore suits business scenarios that can tolerate losing some log data, not scenarios with strict data-integrity requirements.

Let's take a look at the specific implementation:

First, we need to customize an Appender, which inherits from log4j's AppenderSkeleton class, as follows:

import java.util.concurrent.CompletableFuture;

import javax.annotation.Resource;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

// Message, MessageProducer and JMQException come from the in-house JMQ client.
public class AsyncJmqAppender extends AppenderSkeleton {

    @Resource(name = "messageProducer")
    private MessageProducer messageProducer;

    @Override
    protected void append(LoggingEvent loggingEvent) {
        asyncPushMessage(loggingEvent.getMessage());
    }

    /**
     * Push the log message to JMQ asynchronously.
     * @param message the log message
     */
    private void asyncPushMessage(Object message) {
        CompletableFuture.runAsync(() -> {
            Message messageConverted = (Message) message;
            try {
                messageProducer.send(messageConverted);
            } catch (JMQException e) {
                e.printStackTrace();
            }
        });
    }

    @Override
    public boolean requiresLayout() {
        return false;
    }

    @Override
    public void close() {
    }
}

Then, in log4j.xml, add the configuration for this class:
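The XML itself was not preserved in this copy of the article, so here is only a minimal sketch of what such a configuration typically looks like. The package of AsyncJmqAppender, the buffer size, and the log level are assumptions; the logger name matches the one used below. One common way to get the buffer-and-discard behaviour described above is to wrap the custom appender in log4j's AsyncAppender with Blocking set to false.

<appender name="jmqAppender" class="com.example.log.AsyncJmqAppender"/>

<!-- the AsyncAppender buffers events and, with Blocking=false, discards when the buffer is full -->
<appender name="asyncJmqAppender" class="org.apache.log4j.AsyncAppender">
    <param name="BufferSize" value="2048"/>
    <param name="Blocking" value="false"/>
    <appender-ref ref="jmqAppender"/>
</appender>

<logger name="filelog_appender_logger" additivity="false">
    <level value="INFO"/>
    <appender-ref ref="asyncJmqAppender"/>
</logger>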

Finally, it can be used normally as follows:

private static Logger logger = LoggerFactory.getLogger("filelog_appender_logger");

Note: there is a log4j performance pitfall to be aware of here. In log4j's ConversionPattern, avoid the %C and %L conversion characters. Pressure tests showed that these two patterns cut log4j's logging throughput by roughly a factor of ten, because they force log4j to capture caller location information on every log call.
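For illustration, a pattern of the kind to avoid and a cheaper alternative (the exact layouts are only examples):

<!-- expensive: %C prints the caller's class and %L its line number; both require a stack walk -->
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} [%C:%L] %m%n"/>
<!-- cheaper: %c prints the logger name and needs no caller lookup -->
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} [%c] %m%n"/>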

Scheme 1 is very simple and removes the performance problem caused by the interface depending directly on MQ. What it cannot solve is data loss (although in practice we can add a local fallback policy for data that cannot be handled in time, which greatly reduces the chance of loss). Many business scenarios, however, cannot tolerate losing data at all, and that leads to the next scheme.

Scheme 2: incremental consumption of log4j logs

In this scheme, a background worker consumes the log4j log files incrementally, completely decoupled from the interface. Compared with Scheme 1, it guarantees that no data is lost and has no effect on the interface's OPS. The trade-off is that, because the worker scans on a schedule in the background, data reaches the database more slowly, for example a minute or so later. It therefore suits scenarios where the database data does not need to be near real time.

The specific implementation steps are as follows:

First, the logs to be consumed incrementally are written to a dedicated directory, with one date-stamped log file generated per day. Since log4j does not directly support generating date-stamped log files, we introduce the log4j-extras component and configure log4j.xml as follows:
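The configuration was likewise not preserved here; below is a minimal sketch of a typical log4j-extras rolling setup. The directory, file name pattern, layout, and level are assumptions; the logger name matches the declaration that follows.

<appender name="fileRollingAppender" class="org.apache.log4j.rolling.RollingFileAppender">
    <rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
        <!-- one file per day, named with the date -->
        <param name="FileNamePattern" value="/export/logs/business.%d{yyyy-MM-dd}.log"/>
    </rollingPolicy>
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%m%n"/>
    </layout>
</appender>

<logger name="file_rolling_logger" additivity="false">
    <level value="ERROR"/>
    <appender-ref ref="fileRollingAppender"/>
</logger>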

Then the declaration in the code is as follows:

private static Logger businessLogger = LoggerFactory.getLogger("file_rolling_logger");

Finally, where logging is needed, it is used as follows:

businessLogger.error(JsonUtils.toJSONString(myMessage));

This prints the log to a separate file, with a new file generated each day, named by date.

Once the log files are being generated, we can start the worker for incremental consumption. For reading the files incrementally we use RandomAccessFile: because it can seek to an arbitrary byte offset, it is easy to consume only the newly appended part of a file from the last recorded position, avoiding the inefficiency of re-reading the file line by line from the start.

Note that each log file gets its own offset file, which stores the read position for that log file. When the worker starts a scan, it first reads the offset from the offset file, then opens the corresponding log file, seeks to that position, and consumes from there. This is the core of the whole incremental-consumption worker. (The author's full implementation was collapsed in the original post; a minimal sketch is given below.)

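Since the original code block is not available here, the following is only a minimal sketch of the offset-file idea described above, assuming one log file with a side file holding the last consumed offset; the class name, file layout, and the store method are illustrative.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class IncrementalLogWorker {

    private final Path logFile;     // the date-stamped log file produced by log4j
    private final Path offsetFile;  // side file that stores the last consumed offset

    public IncrementalLogWorker(Path logFile, Path offsetFile) {
        this.logFile = logFile;
        this.offsetFile = offsetFile;
    }

    /** Called periodically, for example from a scheduled executor. */
    public void consumeOnce() throws IOException {
        long offset = readOffset();
        try (RandomAccessFile raf = new RandomAccessFile(logFile.toFile(), "r")) {
            if (offset > raf.length()) {
                offset = 0; // the file was rolled over, start from the beginning
            }
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                // readLine() decodes bytes as Latin-1; re-decode if the log is UTF-8
                String record = new String(line.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
                store(record);
            }
            saveOffset(raf.getFilePointer()); // remember where the next run should start
        }
    }

    private long readOffset() throws IOException {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        return Long.parseLong(new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8).trim());
    }

    private void saveOffset(long offset) throws IOException {
        Files.write(offsetFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    private void store(String record) {
        // placeholder: in the article the record is persisted to the database
    }
}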

Because the worker scans and consumes on a fixed schedule, it may take a minute or more between data being produced and data landing in the database. Some business scenarios with strict data-latency requirements, such as inventory deduction, cannot tolerate this, so we extended the approach into a third scheme: asynchronous log consumption based on an in-memory file queue.


Scheme 3: asynchronous log consumption based on an in-memory file queue

Schemes 1 and 2 both depend heavily on log4j, and each has its own drawback, either losing data or persisting it slowly, so neither is fully satisfactory. This scheme solves both the data-loss problem and the long persistence delay, which makes it the final solution. It has also survived real combat during large-scale promotions and can be applied widely.

The in-memory file queue in this scheme is an in-house queue built on RandomAccessFile and MappedByteBuffer. Its core is an ArrayBlockingQueue; it exposes a produce method to push data into the pipeline and a consume method to pull data out of it. A worker runs in the background and, every 5 ms or after traversing 100 records, flushes the data to disk to guard against loss. That is the gist of the design; if you are interested, you can implement it yourself from this description.
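The company's middleware is not public, so the following is only a minimal sketch of the idea under simplifying assumptions: a single fixed-size backing file, an ArrayBlockingQueue as the core, and a background thread that forces the memory-mapped pages to disk every 5 ms. Recovery on restart, file rolling, consume offsets, and the per-100-records flush are omitted.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class MappedFileQueue {

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100_000);
    private final MappedByteBuffer wal;   // write-ahead copy of every produced entry
    private final AtomicInteger unflushed = new AtomicInteger();

    public MappedFileQueue(String backingFile, int fileSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(backingFile, "rw")) {
            // the mapped region lives outside the Java heap and stays valid after the channel is closed
            wal = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
        }
        startFlusher();
    }

    /** Producer side: record the entry in the mapped file, then block if the queue is full. */
    public void produce(String message) throws InterruptedException {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        synchronized (wal) {
            if (wal.remaining() >= bytes.length) {
                wal.put(bytes);
                unflushed.incrementAndGet();
            }
        }
        queue.put(message);   // blocks instead of dropping when the pipeline is full
    }

    /** Consumer side: blocks until a message is available. */
    public String consume() throws InterruptedException {
        return queue.take();
    }

    /** Background worker: every 5 ms force the dirty mapped pages to disk. */
    private void startFlusher() {
        Thread flusher = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (unflushed.getAndSet(0) > 0) {
                    wal.force();
                }
            }
        }, "mapped-file-queue-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }
}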

With this middleware in place, the producer only needs to push data into the pipeline, and the consumer side simply consumes from it. Unconsumed data is flushed to disk so that it is not lost. During a big promotion, when a flood of data arrives and the pipeline fills up, the interface blocks rather than discarding data. This may make the interface momentarily unresponsive, but the flush-to-disk and consume operations never pause (they work on memory-mapped data outside the JVM heap via MappedByteBuffer and are not affected by GC), so the blocking does not hurt the interface's overall OPS.

In the current implementation, the core queue is an ArrayBlockingQueue, which is globally locked. We will later consider upgrading to a lock-free queue, taking the bounded lock-free queue used inside Netty, MpscArrayQueue, as a reference; the performance is expected to be even better.
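For reference, a sketch of what the replacement might look like. Netty ships a shaded copy of JCTools, and the stand-alone JCTools artifact exposes the same MpscArrayQueue class, which fits this scenario: many producer threads, one consumer thread. The capacity and strings below are illustrative.

import org.jctools.queues.MpscArrayQueue;

public class MpscQueueSketch {
    public static void main(String[] args) {
        MpscArrayQueue<String> queue = new MpscArrayQueue<>(65536);

        // any producer thread: offer() never blocks, it returns false when the queue is full
        boolean accepted = queue.offer("log line");

        // the single consumer thread: poll() returns null when the queue is empty
        String next = queue.poll();
        System.out.println(accepted + " -> " + next);
    }
}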

Due to company policy, I can only share the general idea here, not the concrete code; feel free to discuss in the comments section.

These are the three stages I went through while building asynchronous log persistence, optimizing step by step to the current approach. The road was tortuous, but the result is gratifying. If you like the article, please recommend it; I will keep updating this "what you don't know" series, hoping my modest contribution draws out better ideas.
