Good Programmer Big Data Learning Route: Comparing Logstash and Flume


Neither Logstash nor Flume has the concept of a cluster; each instance runs independently as a single agent.

Logstash is developed in JRuby.

Comparison of components:

Logstash: input → filter → output

Flume: source → channel → sink

Comparison of advantages and disadvantages:

Logstash:

Easy to install and small in size

Has a filter stage, which gives the tool data-filtering and data-splitting capabilities (see the sketch after this list).

Can be seamlessly integrated with ES

Fault-tolerant: if the machine goes down or the connection drops during collection, it resumes from the breakpoint (the read offset is recorded).

In summary, the tool is mainly used to collect log data.
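As a minimal sketch of the input → filter → output structure (the comma delimiter and the rubydebug output here are illustrative assumptions, not taken from this article), a filter that splits each incoming line could look like:

input { stdin {} }
filter {
  # split the raw message field on commas (assumed delimiter, for illustration only)
  mutate { split => { "message" => "," } }
}
output { stdout { codec => rubydebug } }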

Flume:

Stronger high availability than Logstash.

Flume emphasizes data reliability: data transfer is controlled by transactions.

Flume can be used to transport many different types of data.

Setting up the data pipeline

Upload and decompress the Logstash .gz archive.

Create a conf directory under the Logstash directory to hold configuration files.

Starting with a single command:

1. bin/logstash -e 'input { stdin {} } output { stdout {} }'

stdin/stdout (standard input/output streams)

hello xixi
2018-09-12T21:58:58.649Z hadoop01 hello xixi

hello
2018-09-12T21:59:19.487Z hadoop01 hello

2. bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'

hello xixi
{
    "message" => "hello xixi",
    "@version" => "1",
    "@timestamp" => "2018-09-12T22:00:49.612Z",
    "host" => "hadoop01"
}

3. Writing to an ES cluster (the ES cluster must be started first)

bin/logstash -e 'input { stdin {} } output { elasticsearch { hosts => ["192.168.88.81:9200"] } stdout {} }'

After the command runs, ES automatically creates the index and the mapping.

hello
2018-09-12T22:13:05.361Z hadoop01 hello

bin/logstash -e 'input { stdin {} } output { elasticsearch { hosts => ["192.168.88.81:9200", "192.168.88.82:9200"] } stdout {} }'
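To verify that the index and mapping really were created, one option (assuming the same ES host as above and that curl is available) is to list the indices:

curl 'http://192.168.88.81:9200/_cat/indices?v'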

4. Writing to a Kafka cluster (start the Kafka cluster first)

bin/logstash -e 'input { stdin {} } output { elasticsearch { hosts => ["192.168.88.81:9200", "192.168.88.82:9200"] } stdout {} }'
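The command above repeats the Elasticsearch example; a minimal sketch of sending stdin to Kafka instead, reusing the topic_id and bootstrap_servers options that appear in the config files below (the topic name test1 is an assumption), might be:

bin/logstash -e 'input { stdin {} } output { kafka { topic_id => "test1" bootstrap_servers => "node01:9092,node02:9092,node03:9092" } stdout {} }'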

Starting from two configuration files

The ZooKeeper, Kafka, and ES clusters must be started first.
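A sketch of the startup commands, assuming default installation layouts on each node:

# ZooKeeper (on every ZooKeeper node)
bin/zkServer.sh start
# Kafka broker (on every Kafka node)
bin/kafka-server-start.sh -daemon config/server.properties
# Elasticsearch (on every ES node)
bin/elasticsearch -d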

1. Feeding file data into Kafka

vi logstash-kafka.conf

Start it:

bin/logstash -f logstash-kafka.conf (-f: specify the config file)

Start the Kafka console consumer on another node (see the command after the config below).

input {
  file {
    path => "/root/data/test.log"
    discover_interval => 5
    start_position => "beginning"
  }
}

output {
  kafka {
    topic_id => "test1"
    codec => plain {
      format => "%{message}"
      charset => "UTF-8"
    }
    bootstrap_servers => "node01:9092,node02:9092,node03:9092"
  }
}
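The consumer command itself is not shown above; a typical invocation for this topic (assuming an older Kafka release whose console consumer still takes --zookeeper, as the topic-creation command later in this article does) would be:

bin/kafka-console-consumer.sh --zookeeper node01:2181 --topic test1 --from-beginning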

2. Connecting the kafka-es data

vi logstash-es.conf

# start logstash
bin/logstash -f logstash-es.conf

Start the Kafka consumer command on another node.

input {
  file {
    type => "gamelog"
    path => "/log/*/*.log"
    discover_interval => 10
    start_position => "beginning"
  }
}

output {
  elasticsearch {
    index => "gamelog-%{+YYYY.MM.dd}"
    hosts => ["node01:9200", "node02:9200", "node03:9200"]
  }
}
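To confirm that the daily gamelog-YYYY.MM.dd index is being created, one quick check (assuming curl is available on any ES node) is:

curl 'http://node01:9200/_cat/indices?v' | grep gamelog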

Data pipeline workflow

Logstash node placement: run Logstash on whichever node has the most free resources (flexible placement).

1. Start a Logstash instance that monitors the log server directory and collects the data into Kafka.

2. Start another Logstash instance that reads the relevant topic from Kafka and writes the data into Elasticsearch.
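Putting the two steps together, the flow in the example below is:

/root/basedir/*/*.txt
  --> Logstash on hadoop01 (gs-kafka.conf)
  --> Kafka topic "gamelogs"
  --> Logstash on hadoop02 (kafka-es.conf)
  --> Elasticsearch index "gamelogs1"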

Data pipeline example

Two Logstash instances need to be started, each with its own configuration file.

1. Collect data into Kafka

cd conf

Create a config file: vi gs-kafka.conf

input {
  file {
    codec => plain {
      charset => "GB2312"
    }
    path => "/root/basedir/*/*.txt"
    discover_interval => 5
    start_position => "beginning"
  }
}

output {
  kafka {
    topic_id => "gamelogs"
    codec => plain {
      format => "%{message}"
      charset => "GB2312"
    }
    bootstrap_servers => "node01:9092,node02:9092,node03:9092"
  }
}

Create the corresponding Kafka topic:

bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --replication-factor 1 --partitions 1 --topic gamelogs
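To confirm the topic exists, a quick check (assuming the same ZooKeeper address) is:

bin/kafka-topics.sh --list --zookeeper hadoop01:2181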

2. Start Logstash on hadoop01

bin/logstash -f conf/gs-kafka.conf

3. Start another Logstash on hadoop02

cd logstash/conf

vi kafka-es.conf

input {
  kafka {
    type => "accesslogs"
    codec => "plain"
    auto_offset_reset => "smallest"
    group_id => "elas1"
    topic_id => "accesslogs"
    zk_connect => "node01:2181,node02:2181,node03:2181"
  }

  kafka {
    type => "gamelogs"
    auto_offset_reset => "smallest"
    codec => "plain"
    group_id => "elas2"
    topic_id => "gamelogs"
    zk_connect => "node01:2181,node02:2181,node03:2181"
  }
}

filter {
  if [type] == "accesslogs" {
    json {
      source => "message"
      remove_field => ["message"]
      target => "access"
    }
  }

  if [type] == "gamelogs" {
    mutate {
      split => { "message" => "" }
      add_field => {
        "event_type" => "%{message[3]}"
        "current_map" => "%{message[4]}"
        "current_X" => "%{message[5]}"
        "current_y" => "%{message[6]}"
        "user" => "%{message[7]}"
        "item" => "%{message[8]}"
        "item_id" => "%{message[9]}"
        "current_time" => "%{message[12]}"
      }
      remove_field => ["message"]
    }
  }
}

output {
  if [type] == "accesslogs" {
    elasticsearch {
      index => "accesslogs"
      codec => "json"
      hosts => ["node01:9200", "node02:9200", "node03:9200"]
    }
  }

  if [type] == "gamelogs" {
    elasticsearch {
      index => "gamelogs1"
      codec => plain {
        charset => "UTF-16BE"
      }
      hosts => ["node01:9200", "node02:9200", "node03:9200"]
    }
  }
}

bin/logstash -f conf/kafka-es.conf

4. Modify any file under the basedir directory; the change is picked up and the corresponding ES index is generated.

5. The data shown on the web page is stored under the configured /data/esdata directory.

6. Search for the specified field from the web page.

The default term query only matches a single Chinese character (the default analyzer splits Chinese text character by character), whereas query_string can match the whole Chinese phrase.
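As a hedged illustration of the difference (the field name user and the search text are hypothetical, not from this article), a term query versus a query_string query against the gamelogs1 index might look like:

# term query: only matches a single analyzed token (here, one Chinese character)
curl -XGET 'http://node01:9200/gamelogs1/_search' -H 'Content-Type: application/json' -d '{ "query": { "term": { "user": "张" } } }'
# query_string query: can match the full phrase
curl -XGET 'http://node01:9200/gamelogs1/_search' -H 'Content-Type: application/json' -d '{ "query": { "query_string": { "default_field": "user", "query": "张三" } } }'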
