This article explains the basic concepts of Elasticsearch and Solr. The explanations are kept simple and clear so that they are easy to learn and understand.
I. Installation
Elastic requires a Java 8 environment. Be careful to ensure that the JAVA_HOME environment variable is set correctly.
After installing Java, you can follow the official documentation to install Elastic. It is easy to download the package directly.
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
$ unzip elasticsearch-5.5.1.zip
$ cd elasticsearch-5.5.1/
Next, go to the unzipped directory and run the following command to start Elastic.
$ ./bin/elasticsearch
If the error "max virtual memory areas vm.max_map_count [65530] is too low" is reported at this time, run the following command.
$ sudo sysctl -w vm.max_map_count=262144
If all goes well, Elastic will run on the default port 9200. At this point, open another command-line window and request that port, and you will get the following information.
$ curl localhost:9200

{
  "name": "atntrTf",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "tf9250XhQ6ee4h7YI11anA",
  "version": {
    "number": "5.5.1",
    "build_hash": "19c13d0",
    "build_date": "2017-07-18T20:44:24.823Z",
    "build_snapshot": false,
    "lucene_version": "6.6.0"
  },
  "tagline": "You Know, for Search"
}
In the above code, port 9200 is requested, and Elastic returns a JSON object containing the current node, cluster, version, and other information.
Press Ctrl + C and Elastic will stop running.
By default, Elastic only allows local access, and if you need remote access, you can modify the config/elasticsearch.yml file in the Elastic installation directory, uncomment network.host, change its value to 0.0.0.0, and then restart Elastic.
network.host: 0.0.0.0
In the above code, setting it to 0.0.0.0 allows anyone to access it. Production services should not be configured this way; set a specific IP instead.
II. Basic concepts
2.1 Node and Cluster
Elastic is essentially a distributed database that allows multiple servers to work together, and each server can run multiple Elastic instances.
A single Elastic instance is called a node. A group of nodes forms a cluster.
2.2 Index
Elastic indexes every field and, after processing, writes them into an inverted index (Inverted Index). When searching for data, it looks up this index directly.
Therefore, the top-level unit of Elastic data management is called Index. It is synonymous with a single database. The name of each Index (that is, the database) must be lowercase.
The following command can view all the Index of the current node.
$ curl -X GET 'http://localhost:9200/_cat/indices?v'
2.3 Document
A single record in Index is called Document (document). Many Document make up an Index.
Document is represented in JSON format, and here is an example.
{"user": "Zhang San", "title": "engineer", "desc": "Database Management"}
Documents in the same Index are not required to have the same structure (schema), but it is better to keep them the same, as this helps search efficiency.
2.4 Type
Document can be grouped, for example, in the Index of weather, it can be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). This grouping is called Type, which is a virtual logical grouping used to filter Document.
Different Type should have similar structures (schema). For example, the id field cannot be a string in this group and a numeric value in another group. This is a difference from tables in a relational database. Data that is completely different in nature (such as products and logs) should be stored as two Index, rather than two Type in the same Index (although it can be done).
The following command lists the Type contained in each Index.
$ curl 'localhost:9200/_mapping?pretty=true'
According to the plan, Elastic 6.x will allow only one Type per Index, and 7.x will remove Type completely.
III. Create and delete Index
To create a new Index, send a PUT request directly to the Elastic server. The following example creates a new Index called weather.
$ curl -X PUT 'localhost:9200/weather'
The server returns a JSON object with an acknowledged field indicating that the operation was successful.
{"acknowledged": true, "shards_acknowledged": true}
Then we issue a DELETE request to delete the Index.
$ curl -X DELETE 'localhost:9200/weather'
IV. Chinese word segmentation setting
First, install a Chinese word segmentation plug-in. Here we use ik; other plug-ins (such as smartcn) can also be considered.
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip
The above code installs version 5.5.1 of the plug-in, which works with Elastic 5.5.1.
Then, restart Elastic, and the newly installed plug-in will be loaded automatically.
Then, create a new Index and specify the fields that need word segmentation. This step varies with the data structure; the following command applies only to this article. Basically, every Chinese field that needs to be searched should be configured this way.
$ curl -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "desc": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'
In the above code, we first create a new Index named accounts, which contains a Type named person. person has three fields:
user
title
desc
All three fields hold Chinese text and their type is text, so you need to specify a Chinese word splitter; the default English word splitter cannot be used.
The word splitter of Elastic is called analyzer. We assign a word splitter to each field.
"user": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"}
In the above code, analyzer is the word splitter used when indexing the field's text, and search_analyzer is the word splitter used for the search terms. The ik_max_word splitter is provided by the ik plug-in and performs the finest-grained (maximum) segmentation of the text.
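As an optional check (not part of the original steps), you can ask Elastic to show how a piece of text would be segmented by calling the _analyze API with the analyzer name; the sample text below is just a placeholder, and the command assumes the ik plug-in is installed.

$ curl 'localhost:9200/_analyze?pretty' -d '
{
  "analyzer": "ik_max_word",
  "text": "Database Management, Software development"
}'

The response lists the tokens produced by ik_max_word, which is a quick way to confirm the plug-in is loaded before indexing real data.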
V. Data operations
5.1 Add new records
You can add a new record to an Index by sending a PUT request to the specified /Index/Type path. For example, sending a request to /accounts/person adds a new personnel record.
$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "Zhang San",
  "title": "engineer",
  "desc": "Database Management"
}'
The JSON object returned by the server will give information such as Index, Type, Id, Version and so on.
{"_ index": "accounts", "_ type": "person", "_ id": "1", "_ version": 1, "result": "created", "_ shards": {"total": 2, "successful": 1, "failed": 0}, "created": true}
If you look closely, you will see that the request path is /accounts/person/1, and the final 1 is the Id of this record. It does not have to be a number; any string (such as abc) is fine.
You can also add a new record without specifying the Id; in that case, change the request to POST.
$ curl -X POST 'localhost:9200/accounts/person' -d '
{
  "user": "Li Si",
  "title": "engineer",
  "desc": "system Management"
}'
In the above code, a POST request is made to /accounts/person to add a record. In the JSON object returned by the server, the _id field is a random string.
{"_ index": "accounts", "_ type": "person", "_ id": "AV3qGfrC6jMbsbXb6k1p", "_ version": 1, "result": "created", "_ shards": {"total": 2, "successful": 1, "failed": 0}, "created": true}
Note that if you execute the above command without having created the Index first (accounts in this example), Elastic will not report an error but will silently create the specified Index. So be careful when typing not to misspell the Index name.
5.2 View record
You can view a record by issuing a GET request to /Index/Type/Id.
$ curl 'localhost:9200/accounts/person/1?pretty=true'
The above code requests the record /accounts/person/1; the URL parameter pretty=true means the result is returned in an easy-to-read format.
In the returned data, the found field indicates that the query was successful, and the _source field contains the original record.
{"_ index": "accounts", "_ type": "person", "_ id": "1", "_ version": 1, "found": true, "_ source": {"user": "Zhang San", "title": "engineer", "desc": "Database Management"}}
If the Id is incorrect, the data cannot be found, and the found field is false.
$ curl 'localhost:9200/accounts/person/abc?pretty=true'

{
  "_index": "accounts",
  "_type": "person",
  "_id": "abc",
  "found": false
}
5.3 Delete records
To delete a record, issue a DELETE request.
$ curl -X DELETE 'localhost:9200/accounts/person/1'
Don't delete this record here, it will be used later.
5.4 Update record
To update a record, simply resend the data with a PUT request.
$ curl -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "Zhang San",
  "title": "engineer",
  "desc": "Database Management, Software development"
}'

{
  "_index": "accounts",
  "_type": "person",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": false
}
In the above code, we change the original data from "database management" to "database management, software development". In the returned result, several fields have changed.
"_ version": 2, "result": "updated", "created": false
As you can see, the Id of the record has not changed, but the version (_version) has gone from 1 to 2, the operation type (result) has changed from created to updated, and the created field has changed to false, because this is not a new record.
VI. Data query
6.1 Return all records
Using the GET method, request /Index/Type/_search directly and all records will be returned.
$ curl 'localhost:9200/accounts/person/_search'

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "AV3qGfrC6jMbsbXb6k1p",
        "_score": 1.0,
        "_source": {
          "user": "Li Si",
          "title": "engineer",
          "desc": "system Management"
        }
      },
      {
        "_index": "accounts",
        "_type": "person",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "user": "Zhang San",
          "title": "engineer",
          "desc": "Database Management, Software development"
        }
      }
    ]
  }
}
In the above code, the took field of the result indicates the time the operation took (in milliseconds), the timed_out field indicates whether the request timed out, and the hits field contains the matching records. The meanings of its inner fields are as follows.
total: the number of returned records. In this example, there are 2 records.
max_score: the highest degree of match. In this example it is 1.0.
hits: an array of the returned records.
Each of the returned records has a _score field indicating its degree of match; by default the results are sorted by it in descending order.
6.2 full-text search
Elastic queries are rather special: they use their own query syntax and require the GET request to carry a data body.
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": { "match": { "desc": "Software" } }
}'
The above code uses a Match query, and the specified match condition is that the desc field contains the word "software". The returned result is as follows.
{"took": 3, "timed_out": false, "_ shards": {"total": 5, "successful": 5, "failed": 0}, "hits": {"total": 1, "max_score": 0.28582606, "hits": [{"_ index": "accounts", "_ type": "person", "_ id": "1", "_ score": 0.28582606 "_ source": {"user": "Zhang San", "title": "engineer", "desc": "Database Management" Software development "}]}
Elastic returns 10 results at a time by default, which can be changed through the size field.
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": { "match": { "desc": "Administration" } },
  "size": 1
}'
The above code specifies that only one result is returned at a time.
You can also specify the offset through the from field.
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": { "match": { "desc": "Administration" } },
  "from": 1,
  "size": 1
}'
The above code specifies that only one result is returned starting from position 1 (the default is from position 0).
6.3 logical operation
If there are multiple search keywords, Elastic treats them as an or relationship.
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": { "match": { "desc": "Software system" } }
}'
The above code searches for records whose desc contains Software or system.
If you want to perform an and search with multiple keywords, you must use a Boolean query.
$ curl 'localhost:9200/accounts/person/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "Software" } },
        { "match": { "desc": "system" } }
      ]
    }
  }
}'
VII. Reference links
ElasticSearch official manual
A Practical Introduction to Elasticsearch
(end)
I. Preface
When developing a website / App project, you usually need to build a search service. For example, news applications need to retrieve headlines / content, and community applications need to retrieve users / posts.
For simple requirements, you can use the LIKE fuzzy search of the database, for example:
SELECT * FROM news WHERE title LIKE '%Ferrari sports car%'
This finds all news headlines containing the keyword "Ferrari sports car", but the approach has obvious drawbacks:
1. Fuzzy queries perform very poorly; with a large amount of data they can easily bring the database service down.
2. It cannot find related data; it can only strictly match the keyword within the title.
Therefore, it is necessary to build a dedicated search service that provides advanced features such as word segmentation and full-text retrieval. Solr is such a search engine: it lets you quickly build a search service suited to your business.
II. Installation
Go to the official website http://lucene.apache.org/solr/ to download the installation package, extract it and enter the Solr directory:
wget 'http://apache.website-solution.net/lucene/solr/6.2.0/solr-6.2.0.tgz'
tar xvf solr-6.2.0.tgz
cd solr-6.2.0
The directory structure is as follows:
[Figure: Solr 6.2 directory structure]
Before starting the Solr service, verify that Java 1.8 is installed:
[Figure: Java version check]
Start the Solr service:
./bin/solr start -m 1g
Solr listens on port 8983 by default; the -m 1g option allocates 1 GB of memory to the JVM.
Access the Solr management background in the browser:
http://127.0.0.1:8983/solr/#/
[Figure: Solr management backend]
Create a Solr application:
./bin/solr create -c my_news
This generates a my_news folder under the solr-6.2.0/server/solr directory, with the following structure:
[Figure: my_news directory structure]
At the same time, you can see my_news in the administrative background:
[Figure: the management backend]
III. Create an index
We will import data from the MySQL database into Solr and index it.
First, you need to understand two concepts in Solr: field (field) and field type (fieldType). The configuration example is as follows:
[Figure: schema.xml example]
field specifies a field's name, whether it is indexed and stored, and its field type.
fieldType specifies the name of a field type and the word segmentation plug-ins that may be used at query / index time.
Rename the default configuration file managed-schema in the solr-6.2.0/server/solr/my_news/conf directory to schema.xml and add a new fieldType:
[Figure: the word segmentation fieldType]
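The screenshot is not reproduced here, so as a rough sketch: with the ik-analyzer plug-in, the new fieldType in schema.xml usually looks something like the following (the class name and useSmart attribute follow the plug-in's common configuration; verify them against the version you installed).

<!-- schema.xml: Chinese word segmentation type backed by the IK analyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>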
Create a lib directory under the my_news directory, and add the word segmentation plug-in ik-analyzer-solr5-5.x.jar to the lib directory. The structure is as follows:
[Figure: my_news directory structure]
Restart the service under the Solr installation directory:
./bin/solr restart
You can see the newly added type in the management backend:
[Figure: the text_ik type]
Next, create the fields title and content corresponding to our database columns, choosing text_ik as their type:
[Figure: creating the title field]
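A sketch of the corresponding field definitions in schema.xml; the indexed/stored values here are typical defaults and can be adjusted as needed.

<!-- schema.xml: fields that use the text_ik type -->
<field name="title" type="text_ik" indexed="true" stored="true"/>
<field name="content" type="text_ik" indexed="true" stored="true"/>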
The MySQL table structure of the data to be imported:
[Figure: MySQL table structure of the source data]
Edit the conf/solrconfig.xml file and add the class library and database configuration:
[Figure: class library configuration]
[Figure: dataimport handler configuration]
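Since the screenshots are not reproduced, here is a sketch of what these two additions to solrconfig.xml typically look like; the lib paths are assumptions and depend on where the DataImportHandler jars and the core's lib directory actually live.

<!-- solrconfig.xml: load the DataImportHandler jars and the jars in the core's lib directory -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
<lib dir="./lib" regex=".*\.jar"/>

<!-- solrconfig.xml: register the /dataimport handler and point it at the database config -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-mysql-config.xml</str>
  </lst>
</requestHandler>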
At the same time, create a new database connection configuration file conf/db-mysql-config.xml, which contains the following contents:
[Figure: database configuration file]
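The table structure is only shown in the (missing) screenshot above, so the table and column names below (news, id, title, content, updated_at) and the connection details are assumptions for illustration; the dataSource / document / entity layout is the standard DataImportHandler format.

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/mydb"
              user="root"
              password="your_password"/>
  <document>
    <!-- full import uses query; incremental import uses deltaQuery + deltaImportQuery -->
    <entity name="news" pk="id"
            query="SELECT id, title, content FROM news"
            deltaQuery="SELECT id FROM news WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title, content FROM news WHERE id = '${dataimporter.delta.id}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="content" name="content"/>
    </entity>
  </document>
</dataConfig>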
Put the database connection component mysql-connector-java-5.1.39-bin.jar in the lib directory, restart Solr, access the management backend, and perform full import of data:
[Figure: full data import]
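Full import can also be triggered over HTTP rather than through the backend UI; a sketch, assuming the core is named my_news as created above:

$ curl 'http://127.0.0.1:8983/solr/my_news/dataimport?command=full-import&clean=true&commit=true'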
Create a scheduled update script:
[Figure: scheduled update script]
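A minimal sketch of such a script, assuming incremental updates go through the DataImportHandler's delta-import command; the file name update_index.sh is just an example.

#!/bin/bash
# update_index.sh: incrementally import rows changed since the last import
curl -s 'http://127.0.0.1:8983/solr/my_news/dataimport?command=delta-import&clean=false&commit=true' > /dev/null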
Add it to the scheduled tasks so that the index is updated incrementally every 5 minutes:
[Figure: scheduled task]
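A sketch of the corresponding crontab entry; the script and log paths are hypothetical.

# run the incremental index update every 5 minutes
*/5 * * * * /opt/solr-6.2.0/update_index.sh >> /tmp/solr_delta_import.log 2>&1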
Test the search results in the Solr management background:
[Figure: word segmentation search results]
At this point, a basic search engine has been built; external applications only need to send query parameters over HTTP to obtain search results.
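For example, an application could query the core over HTTP like this (a sketch; the query string and rows value are illustrative):

$ curl 'http://127.0.0.1:8983/solr/my_news/select?q=title:keyword&wt=json&rows=10'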
IV. Search intervention
Search results often need manual intervention, such as editorial recommendations, paid ranking, or blocking particular results. Solr has a built-in QueryElevationComponent plug-in that reads the intervention list for a search keyword from a configuration file and places the intervened results at the top of the search results.
In the solrconfig.xml file, you can see:
[Figure: intervention request handler configuration]
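For reference, the stock solrconfig.xml that ships with Solr 6.x contains roughly the following definitions (details may vary slightly between versions):

<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>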
This defines a search component named elevator and applies it to search requests on /elevate. The intervention results are configured in elevate.xml, in the same directory as solrconfig.xml. An example intervention configuration:
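A sketch of what such an elevate.xml entry could look like, matching the behavior described in the next paragraph (documents 1 and 4 promoted, document 3 excluded); the query text "keywords" is a placeholder.

<elevate>
  <query text="keywords">
    <doc id="1"/>
    <doc id="4"/>
    <doc id="3" exclude="true"/>
  </query>
</elevate>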
Restart Solr. Now when searching for "keywords", the documents with id 1 and 4 appear first, and the document with id 3 is excluded from the results. Without intervention, the search results look like this:
[Figure: results without intervention]
When there is a search intervention:
[Figure: results with intervention]
Intervening in search results through the configuration file is simple, but every update requires restarting Solr to take effect, which is a bit troublesome. You can instead imitate the QueryElevationComponent class and develop your own intervention component, for example one that reads the intervention configuration from Redis.
V. Chinese word segmentation
The search quality in Chinese is closely related to the effect of word segmentation. You can test word segmentation in the Solr management background:
[Figure: word segmentation test results]
The example above shows how the IKAnalyzer plug-in segments "University of Science and Technology Beijing". When users search for keywords such as "Beijing", "University of Science and Technology", "Science and Technology" or "University", documents containing "University of Science and Technology Beijing" will be found.
Commonly used Chinese word segmentation plug-ins include IKAnalyzer, mmseg4j, and Solr's built-in smartcn; each has its own strengths and weaknesses. Choose according to your business scenario and test the results yourself.
Word segmentation plug-ins generally ship with a default dictionary and an extension dictionary. The default dictionary contains the vast majority of commonly used Chinese words. If it does not meet your needs, for example for vocabulary from specialized domains, you can add the words to the extension dictionary manually so that the plug-in can recognize the new words.
[Figure: extension dictionary configuration example]
Word segmentation plug-ins can also specify a stop-word dictionary, removing meaningless words such as "of" or "hmm" from the segmentation results, for example:
[Figure: stop-word configuration example]
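For the IKAnalyzer plug-in, both the extension dictionary and the stop-word dictionary are usually declared in an IKAnalyzer.cfg.xml file on the classpath; a sketch, with example dictionary file names:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- extension dictionary: one new word per line -->
  <entry key="ext_dict">ext.dic;</entry>
  <!-- stop-word dictionary: words to drop from segmentation results -->
  <entry key="ext_stopwords">stopword.dic;</entry>
</properties>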
VI. Summary
The above describes some of the most commonly used features of Solr, and Solr itself has many other rich features, such as distributed deployment.
I hope it will be helpful to you.
VII. Appendix
1. Reference materials:
https://wiki.apache.org/solr/
http://lucene.apache.org/solr/quickstart.html
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
2. All configuration files and Jar packages used in the above Demo:
https://github.com/Ceelog/OpenSchool/blob/master/my_news.zip
3. Are there any questions? Contact the author Weibo / Wechat @ Ceelog
Search engine selection notes: Elasticsearch vs Solr
This article was first posted on my blog.
Original link: a comparison between Elasticsearch and Solr
Introduction to Elasticsearch
Elasticsearch is a real-time distributed search and analysis engine. It can help you process large-scale data at an unprecedented speed.
It can be used for full-text search, structured search and analysis, and of course you can combine the three.
Elasticsearch is a search engine based on full-text search engine Apache Lucene (TM). It can be said that Lucene is the most advanced and efficient full-function open source search engine framework.
But Lucene is just a framework. To make full use of its functionality, you need to use Java and integrate Lucene into your program. It takes a lot of learning to understand how it works, and Lucene really is complicated.
Elasticsearch uses Lucene as its internal engine, but when using it for full-text search, you only need to use a uniformly developed API, without knowing how the complex Lucene behind it works.
Of course, Elasticsearch is not just Lucene, it not only includes full-text search function, but also can do the following:
Distributed real-time file storage and each field is indexed so that it can be searched.
Distributed search engine for real-time analysis.
It can scale to hundreds of servers to handle structured or unstructured data at the PB level.
With all these features integrated into a single server, you can easily communicate with ES through its RESTful API from a client or any programming language you like.
Getting started with Elasticsearch is very simple. It comes with a lot of very reasonable default values, which allows beginners to avoid having to face complex theories as soon as they start.
It can be used as soon as it is installed, and it becomes very productive with a very small learning cost.
As you learn more and more, you can also take advantage of more advanced features of Elasticsearch, and the entire engine can be configured very flexibly. You can customize your own Elasticsearch according to your own needs.
Use cases:
Wikipedia uses Elasticsearch to do full-text search and highlight keywords, as well as provide search-as-you-type, did-you-mean and other search advice functions.
The Guardian uses Elasticsearch to process visitor logs so that editors get real-time feedback on the public's response to different articles.
StackOverflow combines full-text search with geographic location and related information to provide a presentation of more-like-this-related issues.
GitHub uses Elasticsearch to retrieve more than 130 billion lines of code.
Goldman Sachs uses it to index 5 TB of data every day, and many investment banks use it to analyze changes in the stock market.
But Elasticsearch is not just for large enterprises, it also helps many startups like DataDog and Klout extend their functions.
Advantages and disadvantages of Elasticsearch
Advantages
Elasticsearch is distributed. No other components are needed, and the distribution is real-time and is called "Push replication".
Elasticsearch fully supports Apache Lucene's near real-time search.
No special configuration is required to handle multi-tenancy (multitenancy), while Solr requires more advanced settings.
Elasticsearch uses the concept of Gateway, which makes it easier to back up.
Each node forms a peer-to-peer network structure, and when some nodes fail, they will automatically assign other nodes to work instead of them.
Disadvantages
There is only one developer (although the Elasticsearch GitHub organization now has more than that, with quite active maintainers).
Not automatic enough (not suitable for the current new Index Warmup API).
Introduction to Solr
Solr (pronounced "solar") is the open source enterprise search platform for the Apache Lucene project. Its main functions include full-text retrieval, hit marking, faceted search, dynamic clustering, database integration, and rich text (such as Word, PDF) processing. Solr is highly extensible and provides distributed search and index replication. Solr is the most popular enterprise search engine, and Solr4 also adds NoSQL support.
Solr is a standalone full-text search server written in Java that runs in a Servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library as its core for full-text indexing and search, and exposes REST-like HTTP/XML and JSON APIs. Solr's powerful external configuration allows it to be adapted to many types of applications without Java coding, and its plug-in architecture supports more advanced customization.
Because the Apache Lucene and Apache Solr projects merged in 2010, the two are produced and implemented by the same Apache Software Foundation development team; whether the technology or product is called Lucene/Solr or Solr/Lucene, it is the same thing.
Advantages and disadvantages of Solr
Advantages
Solr has a larger and more mature community of users, developers, and contributors.
Supports indexing data in many formats, such as HTML, PDF and Microsoft Office document formats, as well as plain-text formats such as JSON, XML and CSV.
Solr is mature and stable.
Searching is faster when no indexing is happening at the same time.
Disadvantages
While the index is being built, search efficiency drops, so search over a real-time index is not efficient.
Comparison between Elasticsearch and Solr
Solr is faster when simply searching existing data.
[Figure: Search Fresh Index While Idle]
When the index is being built in real time, Solr suffers from I/O blocking and its query performance is poor, while Elasticsearch shows a clear advantage.
[Figure: Search Fresh Index While Indexing]
As the amount of data increases, the search efficiency of Solr becomes lower, but there is no significant change in Elasticsearch.
[Figure: Search Fresh Index While Indexing]
To sum up, the architecture of Solr is not suitable for real-time search applications.
Actual production environment testing
The figure below shows that average query speed improved roughly 50-fold after the search engine was migrated from Solr to Elasticsearch.
[Figure: average_execution_time]
Comparison and Summary of Elasticsearch and Solr
Both are easy to install.
Solr uses Zookeeper for distributed management, while Elasticsearch itself has distributed coordination management function.
Solr supports data in more formats, while Elasticsearch only supports JSON.
Solr officially provides more functions, while Elasticsearch itself pays more attention to core functions, and advanced functions are provided by third-party plug-ins.
Solr performs better than Elasticsearch in traditional search applications, but its efficiency is significantly lower than that of Elasticsearch in dealing with real-time search applications.
Solr is a powerful solution for traditional search applications, but Elasticsearch is more suitable for emerging real-time search applications.
Other Lucene-based open source search engine solutions
Use Lucene directly
Description: Lucene is a Java search class library; it is not a complete solution by itself and requires additional development work.
Advantages: a mature solution with many successful cases; an Apache top-level project that keeps making rapid progress; a large and active developer community. Because it is just a class library, there is plenty of room for customization and optimization: simple customization can meet most common needs, and after optimization it can support searches at the billion-plus scale.
Disadvantages: additional development work is required; all the extension, distribution and reliability features have to be implemented by yourself. It is not real-time: there is a delay between indexing and searching, and the scalability of the current "near real-time" (Lucene Near Real Time) search approach still needs improvement.
Katta
Description: a distributed, scalable, fault-tolerant, quasi-real-time search solution based on Lucene.
Advantages: works out of the box and can be deployed together with Hadoop; has extension and fault-tolerance mechanisms.
Disadvantages: it is only a search solution; the indexing side still has to be implemented by yourself, and only the most basic search features are provided. There are few success cases and the project's maturity is relatively low. Because it has to support distribution, customization for complex query requirements is difficult.
Hadoop contrib/index
Description: a Map/Reduce-style distributed indexing solution that can be used together with Katta.
Advantages: distributed indexing and scalability.
Disadvantages: an indexing-only solution that does not include a search implementation; it works in batch mode, so support for real-time search is poor.
LinkedIn's open source solution
Description: a series of Lucene-based solutions, including the quasi-real-time search engine zoie, the facet search implementation bobo, the machine-learning algorithm library decomposer, the abstract data store krati, the database schema wrapper sensei, and so on.
Advantages: proven solutions that support distribution and scalability, with rich feature implementations.
Disadvantages: too tightly tied to LinkedIn, so customizability is poor.
Lucandra
Description: based on Lucene, with the index stored in the Cassandra database.
Advantages: refer to the advantages of Cassandra.
Disadvantages: refer to the shortcomings of Cassandra. In addition, this is only a demo and has not been extensively validated.
HBasene
Description: based on Lucene, with the index stored in the HBase database.
Advantages: refer to the advantages of HBase.
Disadvantages: refer to the shortcomings of HBase. In addition, in this implementation Lucene terms are stored as rows while the posting list of each term is stored in columns; as the posting list of a single term grows, query speed is greatly affected.