This article looks at the technologies that compete with Elasticsearch, comparing each of them with Elasticsearch in turn. The explanations are kept simple and practical.
Competing products
Elasticsearch started out as a search engine and has gradually grown into an all-round data product with a strong focus on big data analytics. As its feature set has expanded, it increasingly overlaps with many other data products: some of its capabilities are genuinely distinctive, while others are only incidental. Understanding where each product is strong helps you match them to business needs.
Image: schematic map of Elasticsearch's competitive landscape
1. Lucene
Lucene is the core library for full-text search, and Elasticsearch itself is built on top of it, so the competitive relationship between the two is inherent to Lucene.
In the Web 2.0 era, the simplest way to gauge an Internet company's technical ability was to look at how its search was built. Back then almost everyone did the same thing: build a search engine on top of the Lucene core library, with the rest coming down to the skill of each company's developers. Before 2012 I was fortunate enough to build a vertical search engine based on Lucene, and the many problems we ran into are worth recounting:
The project wrapped Lucene directly, so business code and the core library were built together. The coupling was very high: every change to a data field meant recompiling, repackaging, and redeploying, a process that was both cumbersome and risky.
Every release required shutting down the running program, which raised process-switchover issues.
The index was periodically rebuilt in full, which brought problems such as switching between old and new indexes and keeping the index updated in real time, all of which needed a complex set of mechanisms to get right.
Each business line needed its own Lucene indexing process; once there were many business lines, managing them became a headache.
When a single Lucene index outgrew what one instance could handle, it had to be distributed, and raw Lucene offers no support for that. The usual workaround was to split the data into multiple index processes by some category, have the client send the category with each query, and route to the matching index on the backend.
The Lucene library itself is hard to master; for most development engineers there are too many factors to consider, and a small oversight leads to plenty of program problems.
Illustration: Lucene internal index construction and query process
Compared with using the Lucene core library directly, Elasticsearch wins on the following points:
It wraps the Lucene core library completely and exposes a friendly RESTful API, so developers can use it out of the box without worrying much about the underlying mechanics (a short sketch of this follows below).
Its shard and replica mechanism directly solves the performance and high-availability problems of running a cluster.
With Elastic's rapid growth in recent years, it is now rare to find search projects built directly on Lucene; almost everyone chooses Elasticsearch as the underlying search service. Thanks to its open-source nature, most cloud vendors also build customized versions on top of it, deeply integrated with their own platforms, rather than maintaining a separate branch.
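To make the contrast concrete, here is a minimal sketch (not from the original article) of the "out of the box" experience using the official Python client. It assumes an Elasticsearch node at http://localhost:9200 and 8.x-style keyword arguments; the index and field names are made up for illustration.

```python
# Minimal sketch: indexing and searching without touching Lucene directly.
# Assumes an Elasticsearch node at http://localhost:9200 and the official
# `elasticsearch` Python client (8.x-style keyword arguments).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a document over the REST API: no recompilation or process restart
# is needed when fields change, because the mapping lives on the server,
# not inside the application binary.
es.index(index="articles", id="1", document={
    "title": "Elasticsearch vs Lucene",
    "body": "Elasticsearch wraps the Lucene core library behind a RESTful API.",
    "published": "2021-06-02",
})

# Full-text query through the same API; shards and replicas stay transparent.
resp = es.search(index="articles", query={"match": {"body": "Lucene"}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["title"])
```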
Elasticsearch won the competition.
2. Solr
Solr was the first full-featured search engine product built on the Lucene core library, and it was born long before Elasticsearch. In the early days of full-text search Solr had a huge advantage and almost completely overshadowed Elastic. In recent years, however, Elastic's distributed design has let it meet many big data processing needs, and with the popularity of the ELK concept, Solr has been all but forgotten. SolrCloud, its distributed product, was launched as well, but it holds basically no advantage.
I have worked with several data companies whose full-text search was built on Solr in a single-node deployment. When problems occasionally occurred, it was hard to find consultants or staff able to troubleshoot them, and the systems were later migrated to Elasticsearch.
Almost every company on the market now uses Elasticsearch; apart from some legacy systems built on Solr, essentially all new projects choose Elasticsearch.
Personally, I think there are several reasons:
ES is friendlier and simpler than Solr, with a lower barrier to entry.
ES is richer in features than Solr, for example its sharding mechanism and data analysis capabilities.
The ES ecosystem is better developed: the Elastic Stack as a whole is quite complete and easy to integrate with all kinds of data systems.
The ES community is more active, while Solr has few dedicated technical conferences.
Diagram: internal architecture of Solr's functional modules
Elasticsearch won the competition.
3. RDBMS
The main advantage of relational databases over Elasticsearch is their transaction isolation mechanism, which is irreplaceable, but their limitations are obvious:
Relational database query performance drops off sharply once the data volume exceeds millions or tens of millions of rows; fundamentally, the B+ tree index is not as efficient for this kind of lookup as an inverted index.
Relational database indexes are constrained by the leftmost-prefix rule: query conditions cannot be combined arbitrarily or the index stops being used, whereas Elasticsearch can combine conditions freely (see the sketch after this list). This is especially obvious in table-join queries, which Elasticsearch handles by denormalizing into a large wide table and which a relational database cannot handle well.
Once a relational database is split into multiple databases and tables, multi-condition queries become hard to implement; Elasticsearch is distributed by design, and multiple indexes and multiple shards can be queried together.
Relational database aggregation performance is low: with a bit more data and a few more high-cardinality query columns, performance drops quickly. Elasticsearch uses column-based storage for aggregations and is extremely efficient.
Relational databases aim for balanced, general-purpose performance; Elasticsearch focuses on making specific queries fast.
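As an illustration of combining conditions freely, here is a hedged sketch of a bool query using the official Python client; the index name, field names, and values are invented for this example, and an Elasticsearch node at http://localhost:9200 is assumed.

```python
# Sketch: filter conditions combined in any order, something a composite
# B+-tree index cannot offer under the leftmost-prefix rule. The index
# "orders" and fields (city, status, amount, created_at) are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="orders",
    query={
        "bool": {
            "filter": [
                {"term": {"city": "shanghai"}},                     # any subset
                {"term": {"status": "paid"}},                       # of fields,
                {"range": {"amount": {"gte": 100}}},                # in any order,
                {"range": {"created_at": {"gte": "2021-01-01"}}},   # still uses the indexes
            ]
        }
    },
)
print(resp["hits"]["total"])
```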
If the data does not need strict transaction isolation, I think Elasticsearch can be used instead. If the data needs both transaction isolation and query performance, a mix of a relational database and ES works well.
Illustration: Schematic diagram of RDBMS and ES advantages
4. OpenTSDB
OpenTSDB is a time-series database implemented on top of HBase. It optimizes its data structures for data with a time dimension, making it suitable for storing things like monitoring metrics and temperature readings. Xiaomi's open-source monitoring system open-falcon is built on OpenTSDB.
Diagram: OpenTSDB time series database internal implementation
Elastic itself never set out to target the time-series field, but with the popularity of ELK many companies use it to build monitoring systems. Although it does not apply the special numeric-type optimizations of a dedicated time-series database, its ease of use and the strength of its ecosystem make that an acceptable trade-off.
Building time-series workloads on Elasticsearch is easy, and the performance is quite good:
Index creation rules: indexes can be created by year, month, week, day, hour and so on, which is very convenient (see the sketch after this list).
For data ingestion, define a single time field to order and distinguish the documents; no other fields are mandatory.
For queries, besides the usual time-ordered access, any number of additional search conditions can be added.
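Here is a hedged sketch of that ELK-style pattern with the official Python client: one index per day, a timestamp field, and an extra search condition in the same query. The index pattern, field names, and values are invented for illustration, and 8.x-style client arguments and a node at http://localhost:9200 are assumed.

```python
# Sketch of the common time-series pattern: daily indexes plus a timestamp
# field, combined with an ordinary filter condition in the same query.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

now = datetime.now(timezone.utc)
daily_index = f"metrics-{now:%Y.%m.%d}"          # one index created per day

es.index(index=daily_index, document={
    "@timestamp": now.isoformat(),
    "host": "web-01",
    "cpu_pct": 73.5,
})

# Time-bucketed aggregation plus a host filter, queried across all daily indexes.
resp = es.search(
    index="metrics-*",
    size=0,
    query={"term": {"host": "web-01"}},
    aggs={"per_hour": {
        "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
        "aggs": {"avg_cpu": {"avg": {"field": "cpu_pct"}}},
    }},
)
print(resp["aggregations"]["per_hour"]["buckets"])
```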
Elasticsearch is a better choice unless you have very demanding monitoring requirements for time-series data.
5. HBase
HBase is a representative column-oriented database, but several critical design decisions inside it greatly limit its range of application:
Data in HBase can only be accessed by Rowkey, so the Rowkey design directly determines how HBase can be used.
It does not support secondary indexes itself; to get them, a third-party component has to be introduced.
I won't go into its technical internals here; instead, here is some experience from using it.
My company is in the logistics and express delivery industry. In one vehicle-related project, every vehicle's trajectory was recorded: on-board devices regularly reported trajectory data, and the backend stored it in HBase, tens of terabytes in total. The business side needed to compute per-kilometre fuel consumption and related costs from the trajectories, which meant querying the data in batches by conditions that included non-Rowkey fields such as time range, waybill number and city code. This was almost impossible to achieve: we tried the brute-force approach, and the performance was worrying. The root problem was that the Rowkey could not be designed to satisfy the query conditions, followed by the secondary-index problem raised by the many query conditions.
If the access pattern is purely Rowkey-based, Elastic can also do the job: design the _id appropriately and it achieves the same effect as HBase (see the sketch below).
If a columnar database needs third-party components just to support your queries, it is better to build directly on Elasticsearch.
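A hedged sketch of both points, using the official Python client: the _id is laid out like a composite rowkey for point lookups, while non-key fields stay queryable without any extra component. The key layout, index name, and field names are invented for illustration.

```python
# Sketch: HBase-style "access by rowkey" mapped onto Elasticsearch's _id.
# The composite key layout (plate + timestamp) is purely illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

rowkey = "SH-A12345|20210602093000"              # design the _id like a rowkey
es.index(index="vehicle_track", id=rowkey, document={
    "plate": "SH-A12345",
    "ts": "2021-06-02T09:30:00Z",
    "city_code": "021",
    "lat": 31.23, "lon": 121.47,
})

# Point lookup by key, comparable to an HBase Get ...
doc = es.get(index="vehicle_track", id=rowkey)

# ... and non-key fields remain queryable without a separate secondary index.
resp = es.search(index="vehicle_track", query={
    "bool": {"filter": [
        {"term": {"city_code": "021"}},
        {"range": {"ts": {"gte": "2021-06-01", "lte": "2021-06-03"}}},
    ]}
})
print(doc["_source"], resp["hits"]["total"])
```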
Elasticsearch is the more general-purpose fit for business scenarios, unless there is a very strict requirement to use a columnar database.
Diagram: schematic of the internal data structure of a column-oriented database
6. MongoDB
MongoDB is the representative document database. Its data model is BSON, while Elasticsearch's document model is JSON; BSON is essentially an extension of JSON, the two convert directly into each other, and both schemas are freely extensible with essentially no limits. MongoDB itself competes with relational databases and supports a strict transaction isolation mechanism, so at that level its product positioning differs from Elasticsearch's. In practice, though, almost no company puts core business data in MongoDB; relational databases are still the first choice. Beyond that, Elasticsearch has the following advantages over MongoDB:
Document query performance: the inverted index / BKD tree outperforms the B+ tree.
ES provides columnar data through doc_values, which is much faster for aggregation than Mongo's row-oriented data (see the sketch after this list).
For cluster sharding and replicas, ES's architecture is better designed.
ES has more features than MongoDB and fits a wider range of scenarios.
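A hedged sketch of the doc_values point with the official Python client: keyword and numeric fields store doc_values by default, which is what aggregations read. The index name, mapping, and field names are invented for illustration, and 8.x-style arguments are assumed.

```python
# Sketch: an aggregation served from the columnar doc_values path.
# keyword/numeric fields have doc_values enabled by default.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(index="events", mappings={
    "properties": {
        "category": {"type": "keyword"},   # doc_values on by default
        "value":    {"type": "double"},
    }
})

resp = es.search(
    index="events",
    size=0,
    aggs={"by_category": {
        "terms": {"field": "category"},
        "aggs": {"avg_value": {"avg": {"field": "value"}}},
    }},
)
print(resp["aggregations"]["by_category"]["buckets"])
```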
Illustration: sample document data, with the ObjectId generated automatically by MongoDB.
My company happened to have a project whose data layer was originally designed on MongoDB and suffered from many query problems. It was later migrated to Elasticsearch successfully: the number of servers dropped from 15 to 3, and query performance improved roughly tenfold.
If the data does not need transaction isolation, Elasticsearch can replace MongoDB completely.
7. ClickHouse
ClickHouse is an MPP analytical query database that has been very active in recent years, and many leading companies have adopted it. Why did we introduce it? Our reasons may differ from those of other companies, as follows:
I have worked in big data for a long time and often run into requirements for real-time aggregation queries. Early on we would choose a relational database such as MySQL or PostgreSQL for aggregate queries, and without care it was easy to hit performance bottlenecks.
We later introduced Elasticsearch, whose columnar design and sharded architecture make its performance clearly better than a single-node relational database in every respect.
But Elasticsearch also has obvious limits. First, once data volumes pass the tens or hundreds of millions of rows, aggregating over too many columns hits a performance ceiling. Second, it does not support deep secondary aggregation (aggregating over the results of an aggregation), so some complex aggregation requirements have to be implemented by hand outside the database, which adds a lot of development work.
We later introduced ClickHouse to replace Elasticsearch for deep aggregation requirements. Its performance is good: it handles tens of millions of rows well, its resource consumption is far lower than before, and the same server resources can carry more business load.
ClickHouse and Elasticsearch are alike in that both use a columnar storage structure and both support replicas and sharding. The difference is that ClickHouse has some unique implementations at the bottom, as follows:
The MergeTree table engine provides data partitioning, a primary (sparse) index, and secondary data-skipping indexes (see the sketch after this list).
The vectorized engine: data is not only stored by column but also processed in vectors (chunks of a column), which makes more efficient use of the CPU.
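As a hedged illustration of a MergeTree table with partitioning and a primary index, here is a small sketch using the clickhouse-driver Python package; the table, column names, and host are invented, and a local ClickHouse server is assumed.

```python
# Sketch: a MergeTree table with partitioning and a primary (sparse) index,
# created and queried via clickhouse-driver. Names are illustrative only.
from clickhouse_driver import Client

client = Client(host="localhost")

client.execute("""
    CREATE TABLE IF NOT EXISTS metrics
    (
        event_date Date,
        host String,
        cpu_pct Float32
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_date)   -- data partitioning
    ORDER BY (host, event_date)         -- primary (sparse) index
""")

# The kind of large GROUP BY aggregation that motivated moving deep
# aggregation workloads off Elasticsearch in the scenario described above.
rows = client.execute("""
    SELECT host, avg(cpu_pct) AS avg_cpu
    FROM metrics
    GROUP BY host
    ORDER BY avg_cpu DESC
""")
print(rows)
```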
Illustration: ClickHouse's position in the Big Data Platform
8. Druid
Druid is an MPP query-oriented big data product whose core feature is Rollup; all raw data to be rolled up must have a time-series field. Elasticsearch introduced this feature after version 6.3.x, and from that point the two products compete; which one is the better fit depends on the application scenario.
Druid sample data must have a time field.
I used to be responsible for all of my company's Elasticsearch-stack data projects. We also had requirements to return partially pre-aggregated data from real-time aggregation queries, but our situation was different: the index data was updated offline, and all inserted index data was deleted and rebuilt every day. At the time our Elastic version was 6.8.x, which only supported rollup of offline data, so the feature was of no use to us; Elastic only launched a real-time rollup capability after version 7.2.x.
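To make the Rollup idea concrete before comparing the two products, here is a plain-Python sketch of what a rollup does; the event data is made up purely for illustration.

```python
# Plain-Python sketch of Rollup: raw events are collapsed into coarser time
# buckets and only the pre-aggregated rows are kept afterwards.
from collections import defaultdict

raw_events = [
    {"ts": "2021-06-02T09:01", "city": "shanghai", "orders": 1},
    {"ts": "2021-06-02T09:17", "city": "shanghai", "orders": 2},
    {"ts": "2021-06-02T09:48", "city": "beijing",  "orders": 1},
]

rollup = defaultdict(int)
for e in raw_events:
    hour = e["ts"][:13]                       # truncate to the hour bucket
    rollup[(hour, e["city"])] += e["orders"]  # keep only the aggregate

# Druid discards the raw rows after this step; Elasticsearch rollup writes
# the aggregates into a new index alongside the original one.
for (hour, city), total in sorted(rollup.items()):
    print(hour, city, total)
```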
Druid is more focused: its product design revolves around Rollup, whereas for Elastic it is only an incidental feature.
Druid supports a variety of external data sources and can ingest Kafka streams directly as well as the platform's own internal data; Elastic only works on its own index data, and external data has to be imported into an index with third-party tools.
Druid discards the raw data once it has been rolled up; Elastic generates a new rolled-up index on top of the original one.
Druid and Elastic have very similar technical architectures: both support separating node responsibilities and horizontal scaling.
Both Druid and Elastic support inverted indexes in their data models, on which search and filtering can be performed.
Diagram: Druid Product Technical Architecture System Diagram
As far as Rollup in big data analytics goes, I personally prefer Druid when there is a need for large-scale Rollup scenarios.
That concludes this overview of the technologies competing with Elasticsearch; which product fits best still needs to be verified in practice for each specific use case.