This article covers "Filebeat, a handy tool for log collection". It walks through practical examples of the situations you are likely to run into, so read it carefully and you should get something useful out of it.
This article uses Filebeat version 7.7.0 and covers the following topics:
What is Filebeat and what can it be used for?
How does Filebeat work, and what is it composed of?
How do you use Filebeat?
What is Filebeat?
The relationship between Filebeat and Beats
First, Filebeat is a member of Beats.
Beats is a family of lightweight data shippers; it currently has six members. In the early ELK architecture, Logstash handled both log collection and parsing, but Logstash consumes a lot of memory, CPU, I/O, and other resources. Compared with Logstash, the CPU and memory footprint of a Beat on the host system is almost negligible.
Currently, Beats includes six tools:
Packetbeat: network data (collecting network traffic data)
Metricbeat: metrics (collecting data such as CPU and memory usage at the system, process, and file system levels)
Filebeat: log files (collect file data)
Winlogbeat: Windows event logs (collects Windows event log data)
Auditbeat: audit data (collect audit logs)
Heartbeat: uptime monitoring (periodically checks whether services are up and reachable)
What is Filebeat?
Filebeat is a lightweight transport tool for forwarding and centralizing log data. Filebeat monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.
Filebeat works as follows: when you start Filebeat, it starts one or more inputs that look in the locations you have specified for log data. For each log file it finds, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.
The workflow is illustrated by the flow chart in the official documentation (image not reproduced here).
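As a minimal sketch of that flow, a filebeat.yml with a single log input and an Elasticsearch output looks roughly like this (the path and host are placeholders, not values from this article):
filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log              # each matching file gets its own harvester
output.elasticsearch:
  hosts: ["localhost:9200"]       # libbeat aggregates events and ships them here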
The relationship between Filebeat and Logstash
Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote logstash-forwarder in Golang: a lightweight shipper with fewer features but far lower resource consumption. The author was only one person, though, and after he joined elastic.co, Elastic, which had also acquired the open source project Packetbeat together with a whole team dedicated to Golang, simply merged logstash-forwarder development into that Golang team. The resulting new project was named Filebeat.
How does Filebeat work?
The composition of Filebeat
Filebeat consists of two components, inputs and harvesters, which work together to tail files and send event data to the output you specify. A harvester is responsible for reading the contents of a single file: it reads the file line by line and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means the file descriptor remains open while the harvester is running. If you delete or rename a file while it is being collected, Filebeat continues to read it; the side effect is that the disk space is not released until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.
Consequences of closing a harvester:
The file handle is closed, which releases the underlying resources even if the file was deleted while the harvester was still reading it.
Collection of the file does not start again until scan_frequency has elapsed.
If the file is moved or deleted while the harvester is closed, collection of the file will not continue.
An input is responsible for managing harvesters and finding all sources to read from. If the input type is log, the input finds all files on the drive that match the configured paths and starts a harvester for each one. Each input runs in its own Go routine (goroutine). Filebeat currently supports multiple input types, and each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.
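Because each input type can be defined multiple times, logs with different update rates can be split across separate log inputs. A hedged sketch (paths and values are placeholders, not from this article):
filebeat.inputs:
- type: log                       # first log input: fast-changing application logs
  paths:
    - /var/log/app/*.log
  close_inactive: 1m
- type: log                       # second log input: slow-changing system logs
  paths:
    - /var/log/syslog
  close_inactive: 5m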
How Filebeat saves the state of a file
Filebeat keeps the state of each file and frequently flushes this state to the registry file on disk. The state is used to remember the last offset a harvester read and to ensure that all log lines are sent. If the output (such as Elasticsearch or Logstash) is unreachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again. While Filebeat is running, the state information for each input is also kept in memory. When Filebeat restarts, the data in the registry file is used to rebuild the state, and Filebeat resumes each harvester at its last known position. For each input, Filebeat keeps the state of every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file; for each file, Filebeat stores a unique identifier to detect whether it was harvested before.
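The registry location and flush interval can themselves be configured. A hedged sketch, assuming Filebeat 7.x option names (by default the registry lives under the data path):
filebeat.registry.path: registry    # where file states are persisted, relative to path.data
filebeat.registry.flush: 1s         # how often the state is flushed to disk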
How Filebeat guarantees at-least-once delivery
Filebeat guarantees that events will be delivered to the configured output at least once, with no data loss, because it stores the delivery state of each event in the registry file. If the output is blocked and has not acknowledged all events, Filebeat keeps trying to send them until the output acknowledges that it has received them. If Filebeat shuts down while it is sending events, it does not wait for the output to acknowledge all events before closing; any events that were not acknowledged before the shutdown are sent again when Filebeat restarts. This ensures that each event is sent at least once, but it can result in duplicate events at the output. You can configure Filebeat to wait a specific amount of time before shutting down by setting the shutdown_timeout option.
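For example, to give Filebeat a short grace period to wait for outstanding acknowledgements before it exits (the value here is only illustrative):
filebeat.shutdown_timeout: 5s    # wait up to 5s on shutdown for the output to acknowledge in-flight events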
How do you use Filebeat?
Compressed package installation
This article installs Filebeat from the compressed package for Linux, filebeat-7.7.0-linux-x86_64.tar.gz.
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz
Sample configuration file: filebeat.reference.yml (contains all non-deprecated configuration options)
Configuration file: filebeat.yml
Basic command
For more information, see the official website: https://www.elastic.co/guide/en/beats/filebeat/current/command-line-options.html
export      # export the configuration
run         # run Filebeat (the default command)
test        # test the configuration
keystore    # manage the secrets keystore
modules     # manage configuration modules
setup       # set up the initial environment
For example: ./filebeat test config    # tests whether the configuration file is correct
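Two of these subcommands are particularly handy for troubleshooting; a quick sketch (the -c flag just points at the configuration file explicitly):
./filebeat test config -c filebeat.yml    # verify that the configuration file is syntactically valid
./filebeat test output                    # verify connectivity to the configured output (Elasticsearch/Logstash)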
Input and output
Supported input components:
Multiline messages, Azure Event Hub, Cloud Foundry, Container, Docker, Google Pub/Sub, HTTP JSON, Kafka, Log, MQTT, NetFlow, Office 365 Management Activity API, Redis, S3, Stdin, Syslog, TCP, UDP (the most commonly used is Log)
Supported output components:
Elasticsearch, Logstash, Kafka, Redis, File, Console, Elastic Cloud, Change the output codec (the most commonly used are Elasticsearch and Logstash)
Using the keystore
The keystore is mainly used to keep sensitive information, such as the Elasticsearch password, out of plaintext configuration files. For example, you can store a key named ES_PWD whose value is the Elasticsearch password, and then reference it as ${ES_PWD} wherever that password is needed.
Create a keystore for storing secrets: filebeat keystore create
Add a key-value pair to it, for example: filebeat keystore add ES_PWD
Overwrite the value of an existing key: filebeat keystore add ES_PWD --force
Remove a key-value pair: filebeat keystore remove ES_PWD
List the existing keys: filebeat keystore list
The stored value can then be referenced through ${ES_PWD}, for example:
output.elasticsearch.password: "${ES_PWD}"
filebeat.yml configuration (using the log input type as an example)
For more information, see the official website: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html
type: log                      # the input type is log
enabled: true                  # set to true so this input configuration takes effect
paths:                         # the logs to monitor; patterns are resolved with Go's glob function, and the configured directory is not searched recursively. For example, /var/log/*/*.log matches files ending in ".log" in the immediate subdirectories of /var/log, not ".log" files in /var/log itself.
recursive_glob.enabled:        # enables expanded glob recursion, e.g. /foo/** expands to /foo, /foo/*, /foo/*/*, and so on
encoding:                      # the encoding of the monitored files; plain and utf-8 both handle Chinese logs
exclude_lines: ['^DBG']        # drop lines that match these regular expressions
include_lines: ['^ERR', '^WARN']   # keep only lines that match these regular expressions
harvester_buffer_size: 16384   # buffer size, in bytes, each harvester uses when reading a file
max_bytes: 10485760            # maximum number of bytes a single log message may have; everything beyond max_bytes is discarded and not sent. The default is 10 MB (10485760).
exclude_files: ['\.gz$']       # list of regular expressions matching files you want Filebeat to ignore
ignore_older: 0                # default 0, i.e. disabled; can be set to 2h, 2m, etc. Note that ignore_older must be greater than close_inactive. Files that have not been updated for longer than this value are ignored and never picked up by a harvester.
close_*                        # the close_* options close the harvester once a certain criterion or timeout is met. Closing the harvester means closing the file handle. If the file is updated after the harvester closes, it is picked up again after scan_frequency; but if the file is moved or deleted while the harvester is closed, Filebeat cannot pick it up again, and any data the harvester has not yet read is lost.
close_inactive                 # when enabled, the file handle is closed if the file has not been updated for the given duration. The timer starts from when the last log line was read, not from the file's modification time. If the closed file changes again, a new harvester is started after scan_frequency. It is recommended to set a value comfortably larger than the update frequency of your log files; for log files with very different update rates, configure multiple inputs (prospectors). An internal timestamp reflects when the log was last read, and the countdown restarts each time the last line is read. Example values: 2h, 5m.
close_rename                   # when enabled, Filebeat stops reading a file once it is renamed or moved
close_removed                  # when enabled, Filebeat stops reading a file once it is deleted; when this option is enabled, clean_removed must be enabled as well
close_eof                      # suitable for files that are written only once; Filebeat stops reading the file as soon as it reaches EOF
close_timeout                  # when enabled, Filebeat gives each harvester a predefined lifetime and closes it when that time is up, whether or not the file has been fully read. close_timeout must not equal ignore_older, otherwise updates to the file may never be read. If the output has not emitted any events, the timeout is not triggered; at least one event must have been sent before the harvester is closed. Setting 0 disables it.
clean_inactive                 # removes the state of previously harvested files from the registry. The value must be greater than ignore_older + scan_frequency, to make sure no state is removed while a file is still being collected. This option helps keep the registry file small, which is especially useful when a large number of new files are generated every day, and it can also work around inode-reuse problems on Linux.
clean_removed                  # when enabled, Filebeat removes a file's state from the registry if the file can no longer be found on disk; if close_removed is disabled, clean_removed must be disabled as well
scan_frequency                 # how often the prospector checks the configured paths for new files to harvest; default 10s
tail_files:                    # if set to true, Filebeat starts reading new files at their end and sends each new line as an event, instead of re-sending everything from the beginning of the file
symlinks:                      # allows Filebeat to collect symbolic links in addition to regular files; when collecting a symlink, Filebeat opens and reads the original file even though it reports the symlink's path
backoff:                       # defines how aggressively Filebeat checks files for updates: how long it waits before checking a file again after reaching EOF; default 1s
max_backoff:                   # the maximum time Filebeat waits before checking a file again after reaching EOF
backoff_factor:                # the factor applied to the backoff wait time on each attempt; default 2
harvester_limit:               # limits the number of harvesters one prospector can start in parallel, which directly limits the number of open files
tags                           # a list of tags used for marking and filtering, for example: tags: ["json"]
fields                         # optional extra fields added to the output; values can be scalars, tuples, dictionaries and other nested types, and by default they are placed under a "fields" sub-dictionary, e.g.:
                               #   filebeat.inputs:
                               #     fields:
                               #       app_id: query_engine_12
fields_under_root              # if true, the fields are stored at the top level of the output document instead
multiline.pattern              # the regexp pattern to match, e.g. '^b'
multiline.negate               # defines whether the pattern match above is negated; default false. With a pattern of '^b' and the default false, lines beginning with b are merged according to multiline.match; if set to true, lines that do not begin with b are merged instead.
multiline.match                # specifies how Filebeat combines matching lines into an event: "before" or "after", depending on negate
multiline.max_lines            # the maximum number of lines that can be combined into one event; anything beyond it is discarded; default 500
multiline.timeout              # the timeout after which the accumulated lines are sent as an event even if no new match is found; default 5s
max_procs                      # the maximum number of CPUs that may execute simultaneously; defaults to the number of logical CPUs available in the system
name                           # the name of this Filebeat instance; defaults to the hostname of the host
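To make these options concrete, here is a minimal sketch of a log input that combines a few of the most common settings; the path is a placeholder and the multiline pattern assumes events that start with a bracketed timestamp, as in the official multiline example:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/myapp/*.log        # placeholder path
  exclude_files: ['\.gz$']        # skip rotated, compressed files
  ignore_older: 48h               # must be greater than close_inactive
  close_inactive: 5m              # close the handle after 5 minutes without new lines
  multiline.pattern: '^\['        # an event starts with "[", e.g. "[2020-05-01 ...]"
  multiline.negate: true          # lines NOT matching the pattern ...
  multiline.match: after          # ... are appended to the preceding matching line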
Example 1: Logstash as output
filebeat.yml configuration:
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:    # configure multiple log paths
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_ddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  # use a load balancer across multiple Logstash instances
  hosts: ["192.168.110.130:5044", "192.168.110.131:5044", "192.168.110.132:5044", "192.168.110.133:5044"]
  loadbalance: true    # enable load balancing
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
./filebeat -e    # start Filebeat
Configuration of Logstash:
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.110.130:9200"]    # multiple hosts can be configured here
    index => "query-%{yyyyMMdd}"
  }
}
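Assuming the pipeline above is saved in a file such as beats-pipeline.conf (the file name is only illustrative), Logstash can be started with it like this:
bin/logstash -f beats-pipeline.conf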
Example 2: Elasticsearch as output
Configuration of filebeat.yml:
###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_dddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging
#cloud.auth:
#================================ Outputs =====================================
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200", "192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "${ES_PWD}"    # the password is set through the keystore
./filebeat -e    # start Filebeat
Looking at the Elasticsearch cluster, you will see an index with the default name filebeat-%{[beat.version]}-%{+yyyy.MM.dd}.
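If you want a custom index name instead of the default, the usual approach in 7.x is to disable ILM and define a matching template name and pattern; a hedged sketch (the "query" name is only an example, not from this article):
setup.ilm.enabled: false              # ILM must be disabled before a custom index name takes effect
setup.template.name: "query"          # hypothetical template name
setup.template.pattern: "query-*"
output.elasticsearch:
  hosts: ["192.168.110.130:9200"]
  index: "query-%{+yyyy.MM.dd}"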
Filebeat module
Official website: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules.html
Here, I use the Elasticsearch module to parse the slow query logs of ES. The steps are as follows, and other modules work the same way:
Prerequisite: install Elasticsearch and Kibana first, and then use Filebeat.
The detailed steps are on the official website: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-modules-quickstart.html
Step 1: configure the filebeat.yml file:
#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "192.168.110.130:5601"    # specify the Kibana host
  username: "elastic"             # Kibana user
  password: "${ES_PWD}"           # password; the keystore is used here to avoid a plaintext password
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200", "192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"     # ES user
  password: "${ES_PWD}"   # ES password
  # The index is not specified here because no template has been configured;
  # an index named filebeat-%{[beat.version]}-%{+yyyy.MM.dd} will be generated automatically.
Step 2: configure the paths of the Elasticsearch slow logs:
cd filebeat-7.7.0-linux-x86_64/modules.d
vim elasticsearch.yml
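The original article does not reproduce the contents of elasticsearch.yml, so the following is only a hedged sketch of what the module configuration might look like, pointing the slowlog fileset at the paths used earlier in this article:
- module: elasticsearch
  server:
    enabled: false
  slowlog:
    enabled: true
    var.paths:
      - /var/logs/es_aaa_index_search_slowlog.log
      - /var/logs/es_bbb_index_search_slowlog.log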
Step 3: enable the ES module:
./filebeat modules enable elasticsearch
View the enabled modules:
./filebeat modules list
Step 4: initialize the environment:
./filebeat setup -e
Step 5: start Filebeat:
./filebeat -e
Check the Elasticsearch cluster: the slow query logs have been collected and automatically parsed.
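One way to double-check from the command line is to list the Filebeat indices with the _cat API (the host and credentials match the examples above; replace <password> with the real Elasticsearch password):
curl -u elastic:<password> "http://192.168.110.130:9200/_cat/indices/filebeat-*?v"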
At this point, the Elasticsearch module is working.
This concludes "Filebeat, a handy tool for log collection". Thank you for reading.