Splunk vs. Elasticsearch


This article makes an all-around comparison of ElasticSearch and Splunk in terms of architecture, functionality, product line, and core concepts. I hope it is helpful when you formulate a big data search plan.

Brief introduction

ElasticSearch is an open source search server based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, released as open source under the Apache license, and is currently a popular enterprise search engine. Designed for cloud computing, it aims to deliver real-time search and to be stable, reliable, fast, and easy to install and use.

ELK is the abbreviation for ElasticSearch, Logstash, and Kibana, which respectively provide search, data ingestion, and visualization, and together make up the Elastic application stack.

Splunk was the first company in the big data field to be listed on Nasdaq. Splunk provides a machine data engine. With Splunk you can collect, index, and harness the fast-moving machine data generated by all applications, servers, and devices (physical, virtual, and cloud), and search and analyze all real-time and historical data from one place. Using Splunk to process machine data lets you resolve problems and investigate security incidents in minutes rather than hours or days. Monitor your end-to-end infrastructure to avoid service degradation or interruption. Meet compliance requirements at lower cost. Correlate and analyze complex events across multiple systems. Gain new levels of operational visibility as well as IT and business intelligence.

According to the latest DB-Engines ranking of search engines, Elasticsearch, Solr, and Splunk occupy the top three positions.

In terms of trends, both Elastic and Splunk have risen significantly, with Elastic showing particularly strong momentum.

Basic concept

Elastic

Near real-time (NRT)

Elasticsearch is a near-real-time search platform: there is a slight delay between when data is indexed and when it becomes searchable.

Index (Index)

An index is a collection of documents with common characteristics. An index has its own name and can be searched, updated, deleted, and so on.

Type (Type)

Each index can contain one or more types. A type can be regarded as a logical grouping of the data in an index; documents that share the same fields are usually assigned the same type.

Document (Document)

A document is the basic unit of indexed information. Documents in Elastic are represented as JSON objects; they are physically stored in an index and must have a type. Because a document is JSON, it is naturally made up of fields (Fields), each of which is a name-value pair.
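To make this concrete, here is a minimal hedged sketch (the index name, document ID, and field names are hypothetical, and it assumes an Elasticsearch instance on localhost:9200 running a reasonably recent version that accepts the _doc endpoint) that indexes one JSON document whose fields are name-value pairs:

```python
import requests

# A document is just a JSON object whose fields are name-value pairs.
doc = {
    "@timestamp": "2017-06-03T12:00:00Z",
    "host": "web-01",
    "message": "GET /index.html 200",
    "status": 200,
}

# Index the document into a hypothetical index "weblogs" with ID 1.
resp = requests.put("http://localhost:9200/weblogs/_doc/1", json=doc)
print(resp.json())  # response includes the index, the ID, and the result
```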

Score (score)

Elastic is built on Lucene, so each search result carries a score that measures how relevant it is to the query.

The following figure is an example of an Elastic search seen in Kibana, where the raw data is a simple log file:

After the data is indexed into Elasticsearch through Logstash, it can be searched.

Splunk

Real-time performance

Splunk is also near real-time; in addition, Splunk's real-time search (Real-time Search) can deliver an uninterrupted stream of search results.

Event (Event)

Corresponding to Elastic's documents, the basic unit of Splunk's index is the event. Each event contains a set of field values and a timestamp. A Splunk event can be a piece of text, a configuration file, a log entry, or a JSON object.

Field (Fields)

Fields are searchable name-value pairs, and different events may have different fields. Splunk supports field extraction both at index time and at search time, as sketched below.
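As a hedged sketch of search-time extraction (the index name, field, host, port, and credentials are placeholders; it assumes Splunk's documented search/jobs/export REST endpoint on the management port 8089), the SPL rex command pulls a new field out of the raw event text at query time:

```python
import requests

# SPL extracts a field named "status" from the raw text at search time
# with the rex command; nothing about the stored index needs to change.
spl = r'search index=weblogs | rex field=_raw "status=(?<status>\d+)" | stats count by status'

resp = requests.post(
    "https://localhost:8089/services/search/jobs/export",
    auth=("admin", "changeme"),           # placeholder credentials
    data={"search": spl, "output_mode": "json"},
    verify=False,                         # Splunk's default certificate is self-signed
)
print(resp.text)
```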

Index (Indexes)

Similar to an Elastic index, all events are physically stored in an index, which can be thought of as a database table.

Knowledge object (Knowledge Object)

Splunk's knowledge objects provide further data interpretation, classification, and enrichment. They include fields, field extractions, event types, transactions, lookups, tags, aliases, data models, and so on.

The following figure shows the results, in the Splunk client, of searching the same log data as in the previous example.

In terms of basic concepts, Elasticsearch and Splunk are largely equivalent. From the examples we can see many commonalities: events / documents, timestamps, fields, search, timeline charts, and so on. There are several major differences:

Elastic does not support field extraction during search, that is, all fields in Elastic documents are fixed at the time of indexing, while Splunk supports dynamic extraction of new fields during search.

Elastic's search is based on a scoring mechanism and every search result has a score, whereas Splunk does not score search results.

Splunk's knowledge objects provide more advanced and flexible ways to manage and enrich the data.

User interface

ElasticSearch provides a REST API for:

Cluster management, monitoring, health check

Index management (CRUD)

Search execution, including sorting, paging, filtering, scripting, aggregations, and other advanced search features (see the sketch below)
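As an illustration of the search part of the REST API (a sketch under assumed index and field names, not a complete reference), the request body below combines filtering, sorting, pagination, and an aggregation in a single call:

```python
import requests

query = {
    "query": {"bool": {"filter": [{"term": {"status": 500}}]}},  # filtering
    "sort": [{"@timestamp": {"order": "desc"}}],                 # sorting
    "from": 0, "size": 20,                                       # pagination
    # Aggregation; assumes "host" is mapped as a keyword field.
    "aggs": {"by_host": {"terms": {"field": "host"}}},
}

resp = requests.post("http://localhost:9200/weblogs/_search", json=query)
body = resp.json()
hits = body["hits"]["hits"]
buckets = body["aggregations"]["by_host"]["buckets"]
```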

Elasticsearch itself does not provide any UI. Kibana can be used for search, but the lack of a management UI is still inconvenient; the advantage of open source is that many developers step in to build the missing pieces:

ElasticHQ

Cerebro (recommended, clean interface, I like it)

Dejavu

Another option is to install X-Pack, though only part of it is free of charge.

As enterprise software, Splunk has rich management and access interfaces. Besides a REST API and a command-line interface, Splunk's UI is very friendly and easy to use, and almost all functions are available through the integrated UI. The following interfaces are provided:

REST API

Splunk UI

CLI

Data access and acquisition

The Elastic stack uses Logstash and Beats for data ingestion and acquisition.

Logstash is implemented in JRuby and works like a data pipeline: it processes, transforms, and filters the input data and then outputs it somewhere else. Logstash has its own DSL, which covers sections, comments, data types (Boolean, string, number, array, hash), conditionals, field references, and so on.

Logstash's data pipeline consists of three stages, Input, Filter, and Output, each of which can be extended through plugins. In addition, Input and Output support Codecs for encoding and decoding the input and output data.

Common Inputs supported by Logstash include file, syslog, beats, and so on. Filters mainly transform the data: you can add, delete, and modify fields, add tags, and so on. As open source software, Logstash's Output supports not only ElasticSearch but also integration with many other targets: files, Graphite, databases, Nagios, S3, Hadoop, and so on.

In practice, Logstash processes are split into two roles. One runs on the application server and, to keep the load there low, only reads and forwards data; this role is called the shipper. The other runs on a dedicated server, parses and processes the data, and writes it to Elasticsearch; this role is called the indexer.

Since Logstash is stateless, it can easily be scaled linearly in combination with a message queuing system.
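Whatever the pipeline looks like, the indexer role ultimately writes parsed events into Elasticsearch. A minimal hedged sketch using the bulk API (index and field names are hypothetical, Elasticsearch assumed on localhost:9200):

```python
import json
import requests

events = [
    {"@timestamp": "2017-06-03T12:00:00Z", "host": "web-01", "message": "GET / 200"},
    {"@timestamp": "2017-06-03T12:00:01Z", "host": "web-02", "message": "GET / 404"},
]

# The bulk API alternates an action line with a source line, one JSON object per line.
lines = []
for event in events:
    lines.append(json.dumps({"index": {"_index": "weblogs"}}))
    lines.append(json.dumps(event))
body = "\n".join(lines) + "\n"

resp = requests.post(
    "http://localhost:9200/_bulk",
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
)
print(resp.json()["errors"])  # False if every event was indexed successfully
```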

Beats is a family of data collectors that Elastic grew out of Packetbeat. A Beat can write directly to Elasticsearch or forward its data to Logstash. The shared libbeat library provides unified data transport, input configuration parsing, a logging framework, and other common functions.

The open source community has contributed many kinds of beats.

Because Beats is written in Golang, it is very efficient.

Splunk uses Forwarders and Add-ons for data ingestion and acquisition.

Splunk has built-in support for inputs such as files, syslog, and network ports. When a node is configured as a Forwarder, it acts as a data channel that sends data to the configured Indexer, which is similar to Logstash. A major difference lies in field extraction. In the Elastic stack, extraction must be done through filter configuration or extensions in Logstash, i.e. index-time extraction, and it cannot be changed once the data is indexed. Splunk supports index-time extraction too, but more often it does not extract fields at index time and instead decides how to extract them at search time.
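To make the difference concrete: on the Elastic side, extraction has to happen before or while the data is indexed, for example in a Logstash filter or, as in this hedged sketch, in an Elasticsearch ingest pipeline (pipeline, index, and field names are hypothetical); once indexed, the extraction cannot be changed. The rex example shown earlier performs the equivalent extraction in Splunk at search time instead.

```python
import requests

# Define an index-time extraction: a grok processor that parses method, path,
# and status out of the raw message before the document is stored.
pipeline = {
    "processors": [
        {"grok": {"field": "message",
                  "patterns": ["%{WORD:method} %{URIPATH:path} %{NUMBER:status}"]}}
    ]
}
requests.put("http://localhost:9200/_ingest/pipeline/extract-status", json=pipeline)

# Documents sent through the pipeline get the extracted fields baked in at index time.
requests.post(
    "http://localhost:9200/weblogs/_doc?pipeline=extract-status",
    json={"message": "GET /index.html 200"},
)
```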

For domain-specific data acquisition, Splunk uses Add-ons. There are more than 600 different Add-ons on Splunk's app marketplace.

Users can ingest specific kinds of data through an existing Add-on or by developing their own.

For more on big data collection, you can also refer to my other blog post.

Data management and storage

ElasticSearch's data storage model comes from Lucene, and its basic principle is the inverted index; you can refer to this article for details.
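As a purely conceptual sketch of the idea (toy Python, not Lucene's or Splunk's actual data structures), an inverted index maps each term to the set of documents containing it, so a term query becomes a simple lookup and a boolean AND becomes a set intersection:

```python
from collections import defaultdict

docs = {
    1: "error connecting to database",
    2: "database connection restored",
    3: "user login error",
}

# Build the inverted index: term -> set of document IDs (the postings list).
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)

# A term query is a lookup; an AND query intersects postings lists.
print(sorted(inverted["error"]))                         # [1, 3]
print(sorted(inverted["error"] & inverted["database"]))  # [1]
```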

The core of Splunk is also an inverted index. I recommend the talk from last year's Splunk .conf, Behind the Magnifying Glass: How Search Works.

Splunk events are stored in buckets; buckets are logically grouped into indexes, and the buckets are distributed across the Indexers.

Within each bucket, data is stored in an inverted-index structure, and the raw data is compressed with gzip.

At search time, Bloom filters are used to locate the buckets that may contain the data.
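The Bloom filter idea can likewise be sketched conceptually (a toy illustration, not Splunk's implementation): a small bit array that can answer "definitely not in this bucket", letting the search skip such buckets entirely:

```python
import hashlib

class ToyBloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, term):
        # Derive several hash positions for the term.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # False means definitely absent; True means "maybe present".
        return all(self.bits[pos] for pos in self._positions(term))

bucket_filter = ToyBloomFilter()
for term in ["error", "database", "timeout"]:
    bucket_filter.add(term)

print(bucket_filter.might_contain("error"))  # True
print(bucket_filter.might_contain("login"))  # almost certainly False -> skip this bucket
```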

In terms of data storage and management, both Elastic and Splunk rely on inverted indexes. Splunk compresses the data, so it takes up far less storage space, especially since most of the data is text and compresses very well; of course, some performance is spent on decompression.

Data analysis and processing

For data processing and analysis, ElasticSearch relies mainly on its Search API, whereas Splunk provides the very powerful SPL, which is much easier to use than ES's Search API. SPL can fairly be called the SQL of unstructured data. Whether you use SPL to build analytical applications or run it directly against your data in the Splunk UI, it is very easy to work with. The open source community has also tried to add an SPL-like DSL to Elastic to make data processing easier, for example:

https://github.com/chenryn/ESPL

From feedback like this, we can see that ES's search still has shortcomings in this area.
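To make the contrast concrete, here is a hedged sketch of the same question, "how many 500 errors per host?", expressed both ways (index and field names are hypothetical):

```python
# Splunk SPL: a readable pipeline of commands.
spl = "search index=weblogs status=500 | stats count by host"

# Roughly equivalent Elasticsearch query DSL: a nested JSON structure
# combining a filter with a terms aggregation.
es_query = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"status": 500}}]}},
    "aggs": {"per_host": {"terms": {"field": "host"}}},
}
```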

In response, Elastic launched the Painless scripting language, which is still at an experimental stage.
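As a small hedged example of what Painless adds (field and index names are hypothetical, and it assumes a version where Painless is the default scripting language), a script field computes a value per hit at query time:

```python
import requests

query = {
    "query": {"match_all": {}},
    "script_fields": {
        "response_kb": {
            # Painless expression evaluated for each hit; assumes a numeric
            # "bytes" field with doc values enabled.
            "script": {"lang": "painless", "source": "doc['bytes'].value / 1024.0"}
        }
    },
}
resp = requests.post("http://localhost:9200/weblogs/_search", json=query)
print(resp.json()["hits"]["hits"])
```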

Data presentation and visualization

Kibana is an open source analytics and visualization platform for Elasticsearch, used to interactively search and view data stored in Elasticsearch indexes. With Kibana, you can use a variety of charts for advanced data analysis and presentation.

Splunk integrates very convenient data visualization and dashboard functions. The results of an SPL search can easily be visualized and exported to dashboards through simple settings in the UI.

The comparison in the following figure is from https://www.itcentralstation.com/products/comparisons/kibana_vs_splunk

In that data-visualization ranking, Splunk trails only Tableau.

Extensibility

From the perspective of extensibility, both platforms do very well.

As an open source stack, the Elastic stack can easily be extended through plugins. These include:

ElasticSearch Plugin

Kibana Plugin

Logstash Plugin

Beats Platform

Splunk provides a series of extension points to support the development of Apps and Add-ons; more information and documentation can be found at http://dev.splunk.com/. These include:

Web Framework

SDK

Modular Input

... ...

Compared with Elastic's plugins, Splunk's extension concepts are more complex, and the barrier to developing an App or Add-on is relatively high. As a data platform, Splunk should improve its extensibility so that extending it becomes easier.

Architecture

Elastic Stack

As shown in the figure above, ELK is a stack: Logstash provides data ingestion and acquisition, Elasticsearch stores, indexes, and searches the data, and Kibana provides data visualization and reporting.

Splunk

The architecture of Splunk has three main roles:

Indexer

The Indexer provides data storage and indexing, similar to the role Elasticsearch plays.

Search Head

The Search Head is responsible for search and client access. Functionally, one part of it corresponds to Kibana, since Splunk's UI runs on the Search Head and provides all the client and visualization functions; the other part provides distributed search, dispatching searches to the Indexers and merging the results, which corresponds to Elasticsearch.

Forwarder

Splunk's Forwarder is responsible for data ingestion, similar to Logstash.

In addition to the three main roles above, Splunk's architecture also includes the Deployment Server, License Server, Master Cluster Node, Deployer, and so on.

The basic architectures of Splunk and ELK are very similar, but ELK's architecture is simpler and clearer: Logstash handles data ingestion, Kibana handles presentation, and all the complexity sits inside Elasticsearch. Splunk's architecture is more complex, with more kinds of roles.

For a stand-alone installation, Splunk is easier because all the features are installed at once, whereas ELK requires installing E, L, and K separately; from this point of view, Splunk has a certain advantage.

Distributed clustering and scalability

ElasticSearch

ElasticSearch is designed to be distributed and scales well. In a typical distributed configuration, each node can be given a different role, as shown in the figure above:

Client Node, responsible for API and data access; it does not store or process data

Data Node, responsible for data storage and indexing

Master Node, the management node, is responsible for the coordination of the nodes in the Cluster, and does not store data.

Each role can be configured through ElasticSearch's configuration file or environment variables, and each role scales easily. Because Elastic uses a peer-to-peer design in which all nodes of a role are equal (among the master-eligible nodes, one is elected leader), it is very scalable in a clustered environment, especially a container environment such as Docker Swarm or Kubernetes.
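A quick hedged way to inspect these roles and the overall cluster state (assuming the standard _cluster/health and _cat/nodes endpoints on localhost:9200):

```python
import requests

# Cluster-wide health: status, number of nodes, number of data nodes, etc.
print(requests.get("http://localhost:9200/_cluster/health").json())

# One line per node, including each node's role and which node is the elected master.
print(requests.get("http://localhost:9200/_cat/nodes?v").text)
```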

Reference:

https://elk-docker.readthedocs.io/#elasticsearch-cluster

https://github.com/pires/kubernetes-elasticsearch-cluster

Splunk

As an enterprise-grade platform for distributed machine data, Splunk offers powerful distributed configurations, including clusters that span data centers. Splunk provides two kinds of clusters: Indexer clusters and Search Head clusters.

Splunk Indexer cluster

As shown in the figure above, Splunk's indexer cluster consists of three main roles:

Master Node, responsible for managing and coordinating the entire cluster, similar to ES's master. However, there is only one such node, and multiple masters are not supported (as of the latest version, 6.6). The Master Node is in charge of:

Coordinating data replication between Peer Nodes

Telling the Search Head where the data is located

Managing the configuration of Peer Nodes

Handling fault recovery when a Peer Node fails

Peer Nodes, responsible for data indexing, similar to ES's Data Nodes. A Peer Node is responsible for:

Storing indexed data

Sending replicated data to, and receiving it from, other Peer Nodes

Responding to search requests

Search Head, responsible for data search and client-side API access, is similar to ES's Client Node, but not exactly the same. Search Head is responsible for sending search requests to Peer Nodes and merging the search results.

Some people may ask: does the Master become a single point of failure for the cluster? What happens if the Master node goes down? Splunk's answer is no: even if the Master node fails, the Peer Nodes continue to work, unless a Peer Node fails at the same time.

http://docs.splunk.com/Documentation/Splunk/6.6.1/Indexer/Whathappenswhenamasternodegoesdown

https://answers.splunk.com/answers/129446/why-does-master-node-continue-to-be-single-point-of-failure-in-clustering.html

Splunk Search Head cluster

A Search Head cluster is made up of a set of Search Heads that share configuration, search tasks, and so on. The cluster has the following main roles:

Deployer, responsible for distributing configuration and apps to the cluster members

Cluster Members, one of which acts as the Captain and is responsible for coordination. Cluster Members communicate with one another to keep their state consistent. An optional Load Balancer can handle search access.

Search Peers, the Indexer Nodes responsible for data indexing

In addition, Splunk used to provide a feature called Search Head Pooling, but it is now deprecated.

Indexer clusters can be combined with Search Head clusters to form a distributed Splunk deployment.

Compared with ES's relatively simple cluster configuration, Splunk's cluster configuration is more complex. In ES, every node's role can be configured flexibly and scaled relatively easily; for example, replicating Pods in Kubernetes makes it easy to scale each role. Scaling Splunk is comparatively difficult and requires more complex configuration to achieve dynamic scaling (you can refer to the link here). Configuring a Splunk cluster in a container environment requires a great deal of orchestration. For example, when configuring the Master, users need to consider:

How to configure License

Change the default username password

Configure Search Head Cluster for each Search Head

Wait for the Splunk process to start successfully

Configure service discovery

Install the application

... ...

Moreover, it is difficult for cluster expansion to use the scaling interfaces provided by container orchestration platforms directly; Splunk still has a lot of room for improvement here.

Product line

Elastic

Elastic's product line, in addition to the familiar ELK (ElasticSearch, Logstash, Kibana), mainly includes:

Beats, an open source component that provides lightweight agents to ship locally collected data to ElasticSearch

Elastic Cloud, the cloud service provided by Elastic

X-Pack, Elastic's extension component, providing security, alerting, monitoring, machine learning, and graph processing capabilities; its main features require payment

Splunk

Splunk's product line includes

Splunk Enterprise

Splunk Cloud, a cloud service operated by Splunk, runs on AWS

Splunk Light, a lighter edition of Splunk with streamlined features for small and medium-sized businesses

Hunk, Splunk on Hadoop

Apps / Add-ons: Splunk provides a large number of applications and data-acquisition extensions; see http://apps.splunk.com/

Splunk ITSI (IT Service Intelligence), a product specially developed by Splunk for IT operation and maintenance

Splunk ES (Enterprise Security), Splunk's enterprise security product and one of its flagship offerings. It has repeatedly been rated a leader in the SIEM field by Gartner, challenging traditional industry giants such as IBM and HP.

Splunk UBA (User Behavior Analytics), a machine-learning-based security product that came from Caspida, which Splunk acquired in 2015.

From the product-line perspective, beyond the base platform Splunk has its own leading products in IT operations and security, whereas Elastic lacks applications targeted at specific domains.

Price

Price is a factor of great concern to everyone.

Elastic's basic components are open source; see the table below. Some of the advanced features in X-Pack must be paid for, including security, multi-cluster support, reporting, monitoring, and so on.

Cloud pricing is shown in the figure below. Elastic Cloud is charged according to the resources used, and from the region selected here you can see that it also runs on AWS. The configuration shown costs about $200 a month (fees vary from region to region).

Besides Elastic itself, many other companies also provide Elasticsearch cloud services, such as Bonsai and Qbox.io.

Splunk

Splunk Enterprise is licensed by daily data volume, on either an annual or a perpetual basis; a 1 GB/day license is quoted at $2,700 per year. If the daily volume is under 500 MB, you can use the free license Splunk provides, although advanced features such as security and distributed deployment are unavailable; that said, 500 MB a day goes a long way.

The cloud service is much cheaper: 5 GB per day costs only 2,430 yuan per year, less than 200 yuan per month. Of course, because the billing model is different, it is hard to compare directly with Elastic Cloud. In addition, since it runs on AWS, users in mainland China may find it hard to use.

Summary

A big data search platform has become standard equipment for many enterprises, and the Elastic stack and Splunk are the most capable and popular choices. Each has its own strengths and areas worth improving. I hope this article is helpful when selecting your big data platform, and I hope you will get in touch to exchange ideas so we can improve together.

Reference documentation

ELK

ElasticSearch reference documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

A curated list of ElasticSearch-related open source software on GitHub: https://github.com/dzharii/awesome-elasticsearch

Zhihu's ElasticSearch topic: https://www.zhihu.com/topic/19899427/hot

Chinese-language ELK stack guide: https://github.com/chenryn/ELKstack-guide-cn

Chinese-language book on Elasticsearch: https://www.gitbook.com/book/wizardforcel/mastering-elasticsearch/details

Splunk

Splunk documentation: https://docs.splunk.com/Documentation

Splunk e-book: https://www.splunk.com/web_assets/v5/book/Exploring_Splunk.pdf

Splunk developer documentation: http://dev.splunk.com/getstarted

Splunk app marketplace: http://apps.splunk.com/

Splunk quick reference: https://www.splunk.com/content/dam/splunk2/pdfs/solution-guides/splunk-quick-reference-guide.pdf

Other

https://www.upguard.com/articles/splunk-vs-elk

https://db-engines.com/en/system/Elasticsearch%3BSplunk

https://www.searchtechnologies.com/blog/log-analytics-tools-open-source-vs-commercial

http://www.learnsplunk.com/splunk-vs-elk-stack.html
