This article draws on real, hands-on production experience to show how to build a log platform that handles 100 million log entries a day, and walks you through setting up such an ELK system yourself. For background on how the log platform evolved, see the previous article, "Evolution from ELK to EFK".
Without further ado, let's get started.
Overall architecture
The overall architecture is built from the following modules, each providing a different function.
Filebeat: a lightweight data collection engine, built on the original Logstash-forwarder source code. In other words, Filebeat is the new Logstash-forwarder, and it is now the first-choice agent in the ELK Stack.
Kafka: a data buffer queue. As a message queue it decouples the pipeline stages and improves scalability. Its peak-handling capacity lets the key components absorb sudden bursts of traffic instead of collapsing under overload.
Logstash: a data collection and processing engine. It dynamically collects data from a variety of sources and can filter, parse, enrich, and normalize it before storing it for downstream use.
Elasticsearch: a distributed search engine with high scalability, high reliability, and easy management. It supports full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others build their search on it.
Kibana: a visualization platform. It searches and displays data indexed in Elasticsearch, making it easy to analyze and present data with charts, tables, and maps.
Versions
Filebeat: 6.2.4
Kafka: 2.11-1.0.0
Logstash: 6.2.4
Elasticsearch: 6.2.4
Kibana: 6.2.4
It is best to download matching versions of any plug-ins as well.
Practice
Let's take the common Nginx access log as an example. The log content is in JSON format:
{"@ timestamp": "2017-12-27T16:38:17+08:00", "host": "192.168.56.11", "clientip": "192.168.56.11", "size": 26, "responsetime": 0.000, "upstreamtime": "-", "upstreamhost": "-", "http_host": "192.168.56.11", "url": "nginxweb/index.html", "domain": "192.168.56.11", "xff": "-" "referer": "-", "status": "200"} {" @ timestamp ":" 2017-12-27T16:38:17+08:00 "," host ":" 192.168.56.11 "," clientip ":" 192.168.56.11 "," size ": 26," responsetime ": 0.000," upstreamtime ":" upstreamhost ":"-"," http_host ":" 192.168.56.11 "," url ":" / nginxweb/index.html " "domain": "192.168.56.11", "xff": "-", "referer": "-", "status": "200"} {"@ timestamp": "2017-12-27T16:38:17+08:00", "host": "192.168.56.11", "clientip": "192.168.56.11", "size": 26, "responsetime": 0.000, "upstreamtime": "-", "upstreamhost": "-", "http_host": "192.168.56.11" "url": "/ nginxweb/index.html", "domain": "192.168.56.11", "xff": "-", "referer": "-", "status": "200"} {"@ timestamp": "2017-12-27T16:38:17+08:00", "host": "192.168.56.11", "clientip": "192.168.56.11", "size": 26, "responsetime": 0.000, "upstreamtime": "-", "upstreamhost": "-" "http_host": "192.168.56.11", "url": "/ nginxweb/index.html", "domain": "192.168.56.11", "xff": "-", "referer": "-", "status": "200" {"@ timestamp": "2017-12-27T16:38:17+08:00", "host": "192.168.56.11", "clientip": "192.168.56.11", "size": 26, "responsetime": 0.000 "upstreamtime": "-", "upstreamhost": "-", "http_host": "192.168.56.11", "url": "/ nginxweb/index.html", "domain": "192.168.56.11", "xff": "-", "referer": "-", "status": "200"} Filebeat
Filebeat
Why use Filebeat instead of Logstash itself as the collection agent?
The reason is simple: Logstash's resource consumption is comparatively heavy.
Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote a lightweight agent in Go, Logstash-forwarder, with fewer features but a much smaller footprint.
The author then joined elastic.co, where development of Logstash-forwarder was taken over by the company's internal Go team, and the result was eventually named Filebeat.
Filebeat needs to be deployed on every application server; the configuration can be pushed and installed through Salt.
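As an illustration only, a minimal Salt state for pushing the configuration might look like this (the state name and paths are hypothetical, not from the article):
# /srv/salt/filebeat/init.sls (hypothetical path)
filebeat-config:
  file.managed:
    - name: /opt/filebeat/filebeat.yml      # destination on the app server
    - source: salt://filebeat/filebeat.yml  # master-side copy of the config
It would then be applied with something like: $ salt '*' state.apply filebeat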
Download
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-darwin-x86_64.tar.gz
Extract
$ tar -zxvf filebeat-6.2.4-darwin-x86_64.tar.gz
$ mv filebeat-6.2.4-darwin-x86_64 filebeat
$ cd filebeat
Modify the configuration
Modify the Filebeat configuration so it collects logs from the local directory and ships them to the Kafka cluster.
$ vim filebeat.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /opt/logs/server/nginx.log
  json.keys_under_root: true
  json.add_error_key: true
  json.message_key: log
output.kafka:
  hosts: ["192.168.0.1:9092", "192.168.0.2:9092"]
  topic: 'nginx'
Note that some configuration parameters changed significantly in Filebeat 6.0; for example, document_type is no longer supported and must be replaced with fields, among others.
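For example, a custom field that used to be carried by document_type can be attached with fields; a sketch to be added under the prospector above (log_topic is a made-up field name):
  fields:
    log_topic: nginx        # custom metadata attached to every event
  fields_under_root: true   # place the field at the top level of the event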
Start
$ ./filebeat -e -c filebeat.yml
Kafka
For production, a Kafka cluster is recommended to have (2N + 1) nodes; here we use three nodes as an example.
Download
Download Kafka directly from the official site.
$ wget http://mirror.bit.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
Decompress
$ tar -zxvf kafka_2.11-1.0.0.tgz
$ mv kafka_2.11-1.0.0 kafka
$ cd kafka
Modify the Zookeeper configuration
Modify the Zookeeper configuration to build a Zookeeper cluster of (2N + 1) nodes.
For the ZK cluster it is recommended to use the Zookeeper bundled with Kafka, which reduces interference from network-related factors.
$ vim zookeeper.properties
tickTime=2000
dataDir=/opt/zookeeper
clientPort=2181
maxClientCnxns=50
initLimit=10
syncLimit=5
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
Add a myid file under the Zookeeper data directory on each node to hold that node's id (1, 2, or 3), making sure no id is duplicated.
$ vim /opt/zookeeper/myid
1
Start the Zookeeper nodes
Start each of the three Zookeeper nodes to ensure the cluster is highly available.
$ ./zookeeper-server-start.sh -daemon ./config/zookeeper.properties
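To confirm that each node is up and see which one was elected leader, you can use Zookeeper's four-letter stat command (a quick check; assumes nc is installed):
$ echo stat | nc 192.168.0.1 2181 | grep Mode   # prints "Mode: leader" or "Mode: follower"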
Modify the Kafka configuration
We build a three-node Kafka cluster here. Modify the Kafka configuration on each node in turn, taking care that broker.id is different on each one (1, 2, 3).
$ vim ./config/server.properties
broker.id=1
port=9092
host.name=192.168.0.1
num.replica.fetchers=1
log.dirs=/opt/kafka_logs
num.partitions=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181
zookeeper.connection.timeout.ms=6000
zookeeper.sync.time.ms=2000
num.io.threads=8
num.network.threads=8
queued.max.requests=16
fetch.purgatory.purge.interval.requests=100
producer.purgatory.purge.interval.requests=100
delete.topic.enable=true
Start the Kafka cluster
Start each of the three Kafka nodes to ensure the cluster is highly available.
$ ./bin/kafka-server-start.sh -daemon ./config/server.properties
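Kafka can auto-create the topic when Filebeat first produces to it, but you can also create it explicitly (a sketch; the partition count of 3 and replication factor of 2 are assumptions):
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic nginx --partitions 3 --replication-factor 2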
Check whether the topic was created successfully:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
nginx
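To confirm that Filebeat is actually shipping logs into the topic, you can consume a few messages with the console consumer bundled with Kafka:
$ bin/kafka-console-consumer.sh --bootstrap-server 192.168.0.1:9092 --topic nginx --from-beginning --max-messages 5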
Monitoring: Kafka Manager
Kafka-manager is an open-source Kafka cluster management tool from Yahoo.
It can be downloaded and installed from GitHub: https://github.com/yahoo/kafka-manager
If Kafka consumption falls behind, you can go to the specific cluster page and add partitions; Kafka scales concurrent consumption through partitioning.
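The same checks and changes can be made from the command line (a sketch; the group name logstash matches the Logstash configuration below):
$ bin/kafka-consumer-groups.sh --bootstrap-server 192.168.0.1:9092 --describe --group logstash   # shows per-partition consumer lag
$ bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic nginx --partitions 6            # raise the partition count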
Logstash
Logstash provides three major functions:
INPUT: data ingestion
FILTER: filtering and transformation
OUTPUT: data output
If you use the FILTER function, it is highly recommended to work out your log parsing pattern with the Grok Debugger in advance.
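For illustration, if the Nginx logs were plain text instead of JSON, a grok filter might look like the sketch below (a hypothetical pattern; the JSON pipeline in this article does not need it):
filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{WORD:method} %{URIPATHPARAM:url} %{NUMBER:status}" }
  }
}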
Download
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
Extract and rename
$ tar -zxvf logstash-6.2.4.tar.gz
$ mv logstash-6.2.4 logstash
Modify the Logstash configuration
Modify the Logstash configuration so it acts as the indexer, consuming from Kafka and inserting the data into the Elasticsearch cluster.
$ vim nginx.conf
input {
    kafka {
        type => "kafka"
        bootstrap_servers => "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092"
        topics => ["nginx"]
        group_id => "logstash"
        consumer_threads => 2
    }
}
output {
    elasticsearch {
        hosts => ["192.168.0.1:9200", "192.168.0.2:9200", "192.168.0.3:9200"]
        index => "nginx-%{+YYYY.MM.dd}"
    }
}
Launch Logstash
$ ./bin/logstash -f nginx.conf
Elasticsearch
Download
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.tar.gz
Extract
$ tar -zxvf elasticsearch-6.2.4.tar.gz
$ mv elasticsearch-6.2.4 elasticsearch
Modify the configuration
$ vim config/elasticsearch.yml
cluster.name: es
node.name: es-node1
network.host: 192.168.0.1
discovery.zen.ping.unicast.hosts: ["192.168.0.1"]
discovery.zen.minimum_master_nodes: 1
Start
Start it in the background with -d:
$ ./bin/elasticsearch -d
Open http://192.168.0.1:9200/ in a browser; if you see a response like the following, the configuration was successful:
{name: "es-node1", cluster_name: "es", cluster_uuid: "XvoyA_NYTSSV8pJg0Xb23A", version: {number: "6.2.4", build_hash: "ccec39f", build_date: "2018-04-12T20:37:28.497551Z", build_snapshot: false, lucene_version: "7.2.1" Minimum_wire_compatibility_version: "5.6.0", minimum_index_compatibility_version: "5.0.0"}, tagline: "You Know, for Search"} console
Console
The name Cerebro may be unfamiliar, but it used to be called kopf. Because Elasticsearch 5.0 dropped support for site plugins, the author of kopf abandoned the original project and started Cerebro, an independent single-page application, to keep supporting management of newer Elasticsearch versions.
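A minimal way to run it (a sketch; assumes a downloaded Cerebro release and an installed JDK):
$ ./bin/cerebro -Dhttp.port=9000   # then open http://<host>:9000 and connect to http://192.168.0.1:9200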
A few operational notes:
Separate Master and Data nodes; once there are more than 3 Data nodes, it is advisable to split the roles to reduce pressure.
Keep Data node heap memory no higher than 32G; 31G is recommended (see the previous article for the reasoning).
Set discovery.zen.minimum_master_nodes to (number of master-eligible nodes / 2 + 1); this is the most important point for avoiding split-brain.
Do not expose ES to the public network, and consider installing X-Pack to harden security.
Kibana
Download
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-darwin-x86_64.tar.gz
Extract
$ tar -zxvf kibana-6.2.4-darwin-x86_64.tar.gz
$ mv kibana-6.2.4-darwin-x86_64 kibana
Modify the configuration
$ vim config/kibana.yml
server.port: 5601
server.host: "192.168.0.1"
elasticsearch.url: "http://192.168.0.1:9200"
Start
$ nohup ./bin/kibana &
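To verify Kibana is serving before opening the browser, you can hit its status API (a quick check):
$ curl http://192.168.0.1:5601/api/status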
Interface display
To create an index pattern, go to Management -> Index Patterns and register one by index prefix (for example, nginx-*).
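Before creating the pattern, you can confirm that the daily indices exist (a quick check via the cat API):
$ curl 'http://192.168.0.1:9200/_cat/indices/nginx-*?v'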
Final effect display
Summary
To sum up, the deployment steps above bring up the full set of ELK components, covering the whole pipeline of log collection, filtering, indexing, and visualization, and on this foundation you can build log analysis features. By horizontally scaling the Kafka and Elasticsearch clusters, the system can process an average of 100 million logs per day in real time.
The first principle of good architecture design is not to chase sophistication but to be reasonable, matching the company's business scale and growth trajectory. Every company, even one that now looks enormous, such as BAT, should start with a simple, clear system architecture.
However, as the business scope and scale keep expanding, systems gradually become complex and large, and sooner or later every system runs into high-availability problems. So how do we avoid such problems and build highly available systems?
For that reason I wrote a dedicated column, "Take You Through High Availability", which distills years of hands-on architecture experience at Baidu and Hujiang.
The column contains 15 articles in total, split into three modules that cover high-availability architecture in detail:
Concepts: introduces the theory and evolution of high-availability architecture. It is fairly theoretical, but necessary for understanding the whole system.
Engineering: covers how to achieve high availability at each common Internet layer, including DNS, the service layer, the cache layer, the data layer, and so on.
Troubleshooting: covers how to diagnose common production failures, including locating faults at the machine, application, and other levels.
The column updates weekly over 64 days; in roughly two months it walks through every aspect of high-availability architecture, laying out the problems and their solutions so you can avoid the pitfalls I ran into. Interesting questions are also welcome.
Column address: Take You Through High Availability