Implementation of Log Structured Conversion Based on Visual Configuration

Introduction: The overall architecture of the DBus data bus consists of six modules: the log capture module, the incremental conversion module, the full extraction program, the log operator processing module, the heartbeat monitoring module, and the Web management module. These six modules work together and form the operating principle of DBus: incremental data is obtained in real time by reading RDBMS incremental logs (full pulls are also supported); log data is collected in real time through crawling tools such as Logstash, Flume, and Filebeat; and the data is structured and output through visual configuration. This article mainly introduces how DBus implements log structured conversion based on visual configuration.

1. The principle of structured logs

1.1 Source log crawling

DBus can interface with a variety of log data sources, such as Logstash, Flume, and Filebeat. These components are popular log collection tools in the industry; relying on them makes it easier for users to align with common standards and integrate DBus into existing technical solutions, and it also avoids reinventing the wheel. The collected data is called the raw data log; it is written into Kafka by the collection component and waits there for subsequent processing by DBus.

1.2 Visually configuring rules to structure logs

Users can configure log sources and destinations as they wish. Data from the same log source can be output to multiple destinations, and for each "log source-destination" line, users can configure filtering rules according to their own needs. A log that has been processed by the rule operators is structured, that is, it satisfies schema constraints, similar to a table in a database.

1.3 Rule operator

DBus provides a rich set of easy-to-use operators for customizing data processing. Processing can be divided into multiple steps, the result of each step can be viewed and verified immediately, and operators can be applied repeatedly until the required data has been converted and extracted.

1.4 Execution engine

The configured rule operator groups are submitted to the execution engine, which preprocesses the target log data into structured data and outputs it to Kafka for downstream data consumers. The system flow chart is as follows:

According to DBus's log design principles, the same original log can be extracted into one or more tables. Each table is structured, and all of its data satisfies the same schema constraints.

Each table corresponds to a collection of rule operator groups; a table can have one or more rule operator groups. Each rule operator group is composed of a set of rule operators, and each operator is independent.

Which table should a given raw data log belong to?

Suppose the user has defined several logical tables (T1, T2, ...) to extract different types of logs. Each log then needs to be matched against the rule operator groups as follows:

The log first goes through the rule operator groups of table T1 in order. If it meets the extraction conditions of one of them, the execution engine converts it into structured data for T1. If it does not meet the extraction conditions, the next rule operator group of T1 is tried. If none of T1's rule operator groups match, the process moves on to the next table T2, and so on. If the log does not satisfy the filtering rules of any table, it falls into the _unknown_table_ table.

For example, the same application log may satisfy the filtering conditions of more than one rule group or table; as long as it meets the filtering conditions of a rule group we have defined, it will be extracted by that rule group. In other words, the same application log can belong to different rule groups or tables.
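The matching flow above can be captured in a short sketch. This is illustrative Python only, not DBus's actual implementation; tables are modeled as plain dictionaries, and an operator group is just a list of functions that either return the transformed columns or None to reject the log:

def apply_operator_group(operators, raw_log):
    # Run the group's operators in order; any operator may reject the log by returning None.
    data = raw_log
    for operator in operators:
        data = operator(data)
        if data is None:
            return None
    return data

def route_log(raw_log, tables):
    # Try each logical table (T1, T2, ...) and each of its rule operator groups in order.
    for table in tables:
        for group in table["rule_groups"]:
            structured = apply_operator_group(group, raw_log)
            if structured is not None:
                # The log met this group's extraction conditions: it becomes a row of this table.
                return table["name"], structured
    # No table's filtering rules matched, so the log is counted under _unknown_table_.
    return "_unknown_table_", None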

A rule operator is the basic unit for filtering, processing, and transforming data. Common rule operators are shown in the figure above.

The operators are independent and can be combined freely, so many complex and advanced functions can be realized; by applying operators iteratively, arbitrary data processing can ultimately be achieved. Users can also develop custom operators: operator development is straightforward, and any operator can be implemented as long as it follows the basic interface conventions.
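To illustrate the idea that operators are independent, composable units behind a common interface, here is a minimal Python sketch. The class names and the apply contract are invented for illustration and are not DBus's actual operator API:

class Operator:
    # Common contract assumed for this sketch: take a list of column values and
    # return a new list, or None to drop the row.
    def apply(self, columns):
        raise NotImplementedError

class FilterContains(Operator):
    # Keep the row only if the column at `index` contains `keyword`.
    def __init__(self, index, keyword):
        self.index, self.keyword = index, keyword
    def apply(self, columns):
        return columns if self.keyword in columns[self.index] else None

class SelectColumns(Operator):
    # Keep only the columns at the given positions.
    def __init__(self, indexes):
        self.indexes = indexes
    def apply(self, columns):
        return [columns[i] for i in self.indexes]

def run_group(operators, columns):
    # A rule operator group is simply its operators applied one after another.
    for op in operators:
        columns = op.apply(columns)
        if columns is None:
            return None
    return columns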

2. An example of DBus log processing

Take a DBus cluster environment as an example. Two machines in the DBus cluster (a master and a slave) run heartbeat programs used for monitoring, statistics, alerting, and so on. These heartbeat programs generate application logs that contain various kinds of event information. If we want to classify these logs and structure them into a database, we can use the DBus log program to process them.

DBus can access a variety of data sources (Logstash, Flume, Filebeat, etc.). Here we take Logstash as an example to illustrate how monitoring and alarm log data is fed into DBus.

Because monitoring and alerting logs exist on both dbus-n2 and dbus-n3, we deploy a Logstash program on each of the two machines. Heartbeat data is generated by Logstash's own heartbeat plug-in; it allows DBus to produce statistics and output for the data and to raise alerts about the source log extractor (Logstash in this case). Flume and Filebeat have no such heartbeat plug-in, so for them additional heartbeat data needs to be generated periodically, as sketched below. The data that the Logstash programs write to Kafka therefore contains both normal log data and heartbeat data.
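For collectors without a heartbeat plug-in, a small side process that periodically writes a heartbeat record into the same Kafka topic is enough. Below is a minimal sketch using the kafka-python client; the topic name comes from this example, while the broker address, the message fields, and the 60-second interval are assumptions made for illustration:

import json
import time
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="dbus-n2:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    heartbeat = {
        "type": "dbus-heartbeat",              # assumed marker field
        "host": "dbus-n3",                     # host that emits the heartbeat
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    }
    producer.send("heartbeat_log_logstash", heartbeat)
    producer.flush()
    time.sleep(60)                             # assumed heartbeat interval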

Note that the setup is not limited to two machines running Logstash; DBus does not limit the number of Logstash instances. For example, if the application logs are distributed across dozens of machines, you only need to deploy a Logstash program on each machine and send the data to the same Kafka topic, and DBus can then process, monitor, alert on, and count the data from all of the hosts.

2.1 Start Logstash

After starting the Logstash programs, we can read the data from the topic heartbeat_log_logstash. Sample data is shown below:

1) Heartbeat data

2) General log data
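To inspect these records (both heartbeat data and general log data) directly, any Kafka consumer can read them back from the topic. A minimal sketch with the kafka-python client; the broker address and consumer group are assumptions:

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "heartbeat_log_logstash",            # topic written by Logstash
    bootstrap_servers="dbus-n2:9092",    # assumed broker address
    group_id="log-inspection-demo",      # assumed consumer group
    auto_offset_reset="earliest",
)

for message in consumer:
    # Each record is a raw data log produced by Logstash (JSON text).
    print(message.value.decode("utf-8"))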

2.2 Configure rules

Next, we just need to configure the appropriate rules in DBus Web to process the data.

First, create a new logical table, sink_info_table, to extract the log information of sink events, and then configure the table's rule groups (one or more; all data filtered by a table's rule groups must satisfy the same schema). With heartbeat_log_logstash as the raw data topic, we can configure the rules against real data visually and in real time (what you see is what you get, with on-the-spot verification).

1) Read the original data log

You can see that Logstash has already extracted the basic log4j information, such as path, @timestamp, level, and so on, but the details of each data log are in the log field. Because different data logs produce different output, the contents of the log column vary from row to row.

2) Extract the columns of interest

Suppose we are interested in raw fields such as timestamp and log; we can add a toIndex operator to extract these fields:

It is worth pointing out that array subscripts are used here for a reason:

Not all columns have their own column names (for example, raw data collected by Flume, or data columns produced by the split operator); subscripts also make it possible to specify a range of columns in a Python-like slice notation (for example, 1:3 selects columns 1 and 2).

Therefore, all subsequent operations are based on array subscripts.
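The subscript notation works like a Python slice over the list of columns. A tiny illustration with made-up column values:

# Hypothetical list of extracted columns.
columns = ["col0", "col1", "col2", "col3"]

# "1:3" selects columns 1 and 2, like a Python slice (the end index is exclusive).
print(columns[1:3])   # ['col1', 'col2']

# A single subscript selects exactly one column.
print(columns[0])     # 'col0'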

After executing the rule, you can see the extracted fields:

3) Filter the required data

In this example, we are only interested in data containing "Sink to influxdb OK!", so we add a filter operator to keep the rows whose column 7 contains "Sink to influxdb OK!":

After execution, only the qualifying log rows remain.
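The effect of this filter step can be reproduced with a simple membership test. A minimal sketch; the rows below are invented stand-ins for the columns extracted in the previous step, and 7 is the column position used in this example:

# Invented rows: each row is a list of column values from the toIndex step.
rows = [
    ["t1", "a", "b", "c", "d", "e", "f", "Sink to influxdb OK!"],
    ["t2", "a", "b", "c", "d", "e", "f", "some other event"],
]

keyword = "Sink to influxdb OK!"
# Keep only the rows whose column 7 contains the keyword, like the filter operator above.
filtered = [row for row in rows if keyword in row[7]]
print(len(filtered))  # 1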

4) Extract specific columns

We are interested in the contents of columns 1 and 3, so we add a select operator to extract them.

Execute the select operator, and the data will contain only the first and third columns.

5) Process the data with regular expressions

We want to extract values that match a particular pattern from the data in column 1, so we add a regexExtract operator to process the data with a regular expression. In this case the expression is: http_code=(\d*).*type=(.*),ds=(.*),schema=(.*),table=(.*)\s.normalerrorCount=(\d*). Users can write their own custom regular expressions.

After execution, the data extracted by the regular expression is obtained.
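Python's re module can be used to try out a pattern of this shape before configuring it in the regexExtract operator. The pattern and the sample line below are simplified, invented stand-ins rather than the exact expression and data from this example:

import re

# Simplified, invented pattern and sample line, used only to show the mechanics.
pattern = re.compile(r"http_code=(\d*).*type=(.*),ds=(.*),schema=(.*),table=(.*)\serrorCount=(\d*)")
line = "http_code=200 type=checkpoint,ds=mysql,schema=dbus,table=heartbeat errorCount=0"

match = pattern.search(line)
if match:
    # Each capture group becomes one extracted column.
    print(match.groups())
    # ('200', 'checkpoint', 'mysql', 'dbus', 'heartbeat', '0')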

6) Select output columns

Finally, we output the columns of interest and use the saveAs operator to specify each column's name and type, which makes it convenient to save the results into a relational database.

After executing the saveAs operator, we obtain the final sample of the processed output data.
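Conceptually, saveAs attaches a name and a type to each positional column, turning the row into a schema-constrained record. A minimal Python sketch; the column names and types here are invented for illustration, not the ones configured in this example:

# Invented schema: (name, type) for each output column, in positional order.
schema = [("http_code", int), ("table_name", str), ("error_count", int)]

def save_as(row, schema):
    # Cast each positional value to its declared type and attach its column name.
    return {name: cast(value) for (name, cast), value in zip(schema, row)}

record = save_as(["200", "heartbeat", "0"], schema)
print(record)  # {'http_code': 200, 'table_name': 'heartbeat', 'error_count': 0}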

2.3 View structured output results

Save the rule group configured in the previous step, and the log data can then be processed by the DBus operator engine to produce structured output. At present, in line with the project's actual needs, the data output by DBus is in UMS format; if you do not want to use UMS, it can be customized with a small amount of development.

Note: UMS is a general data exchange format defined and used by DBus. It is standard JSON and carries both schema and data information. For more about UMS, please refer to the DBus open source project home page. Open source address: https://github.com/bridata/dbus

The following is a sample of the structured UMS data output from the test case:
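As a rough, hypothetical illustration of what a schema-plus-data JSON message of this kind looks like (the field names and values below are invented; see the DBus home page above for the actual UMS specification):

import json

# Invented message that carries both schema and data, in the spirit of UMS.
message = {
    "schema": {
        "table": "sink_info_table",
        "fields": [
            {"name": "http_code", "type": "int"},
            {"name": "table_name", "type": "string"},
            {"name": "error_count", "type": "int"},
        ],
    },
    "data": [
        [200, "heartbeat", 0],   # one structured row per entry
    ],
}

print(json.dumps(message, indent=2))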

2.4 Log monitoring

To make it easy to keep track of data extraction, rule matching, monitoring, and alerting, we provide a visual real-time monitoring screen for log data extraction. As shown in the figure below, you can check the following information at any time:

the number of records processed in real time;
the number of errors (an error means an exception occurred while an operator was executing; this helps determine whether the operator matches the data so that the operator can be revised, and DBus also provides a log replay function to avoid losing data);
whether the data latency is normal.

The monitoring information covers every host in the cluster, and the data from each host is monitored, counted, and alerted on separately by host IP (or domain name).

The monitoring view also contains a table named _unknown_table_, which shows the number of records that did not match any table. For example, the logs collected by Logstash contain log data for five different kinds of events; we extract only three of them, and all of the remaining unmatched data is counted under _unknown_table_.

DBus can also access Flume, Filebeat, UMS, and other data sources; with a little configuration, the same processing effect can be achieved as with the Logstash data source. For more information on how DBus handles logs, please refer to:

https://bridata.github.io/DBus/install-logstash-source.html

https://bridata.github.io/DBus/install-flume-source.html

https://bridata.github.io/DBus/install-filebeat-source.html

After application logs are processed by DBus, the original data logs are converted into structured data and output to Kafka for downstream data consumers to use, for example by landing the data into a database through Wormhole. For details on how to use DBus together with Wormhole, please refer to: How to Design a Real-Time Data Platform (technical article).

Author: Zhong Zhenlin
