2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains the main new features of Elasticsearch 7 in detail. I am sharing it here as a reference, and I hope you come away with a working understanding of them.
1. Adaptive replica selection is enabled by default
First, what is adaptive replica selection? Elasticsearch records the response time and execution time of past requests for each data node, along with the size of the search thread pool queue on that node, and uses these metrics to decide which node is performing best. Later requests of the same kind are then routed to that node.
In Elasticsearch 6 and earlier, a series of search requests for the same shard is forwarded to the primary and each replica in round-robin fashion. This can cause problems if one node happens to enter a long garbage collection (GC) pause: search requests are still forwarded to the slow node, which hurts search latency.
In version 6.1, Elasticsearch added adaptive replica selection as an experimental feature. Each node tracks how long search requests to other nodes take and compares the results, then uses this data to adjust how often it sends requests to the shards on a particular node. In the benchmarks published on the official site, this significantly improved search throughput and reduced 99th-percentile latency.
The setting is disabled by default throughout 6.x. Feedback from users was positive, and both users and the Elasticsearch team found it very useful, so it is enabled by default as of Elasticsearch 7.0.0.
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
If this feature is turned off, search requests are sent to the copies of each shard in round-robin fashion.
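To make the idea concrete, here is a minimal Python sketch of choosing a shard copy from the three signals the article lists (response time, execution time, queue size). The `rank` formula, the `CopyStats` type, and the sample numbers are assumptions made up for illustration; this is not the formula Elasticsearch actually uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopyStats:
    """Statistics for one shard copy (all values are made-up samples)."""
    node: str
    response_ms: float  # moving average of response time seen by the coordinator
    service_ms: float   # moving average of execution time on the data node
    queue_size: int     # search thread pool queue size on the data node


def rank(stats: CopyStats) -> float:
    # Hypothetical score, lower is better. It only combines the three
    # signals named in the article and is NOT Elasticsearch's real formula.
    return stats.response_ms + stats.service_ms * (1 + stats.queue_size)


def pick_copy(copies: List[CopyStats]) -> CopyStats:
    """Return the shard copy with the lowest (best) score."""
    return min(copies, key=rank)


copies = [
    CopyStats("node-a", response_ms=12.0, service_ms=8.0, queue_size=0),
    CopyStats("node-b", response_ms=450.0, service_ms=300.0, queue_size=40),  # e.g. stuck in GC
    CopyStats("node-c", response_ms=15.0, service_ms=9.0, queue_size=2),
]
best = pick_copy(copies)
print(best.node)
```

With round-robin, node-b would still receive a third of the requests; a score-based choice like this one steers traffic away from it until its metrics recover.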
2. Skip shard refreshes if a shard is "search idle"
Elasticsearch 6 and earlier refresh index data automatically in the background by default. This is why Elasticsearch can return search results in "near real time": by default, a document becomes searchable within one second of being indexed. However, when the refresh is not needed (for example, when no queries are running against the index), this background work can noticeably hurt Elasticsearch performance.
Elasticsearch 7 introduces the concept of a search-idle shard: by default, a shard is considered search idle if it has received no search requests for 30 seconds. While a shard is in this state, all scheduled refreshes are skipped, and the next refresh is not triggered until a search request arrives. This improves indexing throughput. As we all know, writes and updates are significantly faster with automatic refresh turned off when loading a large amount of data or rebuilding an index.
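The search-idle rule can be sketched as a small decision function. This is a simplified Python model of the behavior described above, not Elasticsearch code; the function and parameter names are hypothetical, and times are plain seconds.

```python
SEARCH_IDLE_AFTER_S = 30.0  # default idle threshold described in the text


def should_run_scheduled_refresh(now_s: float, last_search_s: float,
                                 idle_after_s: float = SEARCH_IDLE_AFTER_S) -> bool:
    """Return True if a scheduled background refresh should run for a shard."""
    search_idle = (now_s - last_search_s) >= idle_after_s
    # A search-idle shard skips scheduled refreshes until the next search arrives.
    return not search_idle


# Shard searched 5 seconds ago: keep refreshing on schedule.
print(should_run_scheduled_refresh(now_s=100.0, last_search_s=95.0))
# Shard not searched for 60 seconds: skip the refresh.
print(should_run_scheduled_refresh(now_s=100.0, last_search_s=40.0))
```

In Elasticsearch itself the threshold corresponds to the `index.search.idle.after` index setting.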
3. One shard per index by default
Many companies run Elasticsearch in a messy state, and the most common problem is oversharding. The default number of shards is probably the main cause. In Elasticsearch 6 and earlier, each index defaults to 5 shards. If you keep a daily index for each of 10 applications (an index created per day, similar to a daily table in a database) and every index uses the default of 5 shards, then 50 shards are created per day, even if each day holds only a few GB of index data, and you will soon have thousands of shards.
Two changes help with this. The first is index lifecycle management: it provides a native way to roll over indexes by size rather than only by day, and has a built-in shrink capability to reduce the number of shards per index. The second is a lower default number of shards per index. You can still set the number of shards yourself when creating an index; if you do not, the default is now 1.
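The daily-index arithmetic is easy to check. The short Python sketch below counts primary shards only (replicas would multiply the totals) and compares the old default of 5 with the new default of 1; the helper name is made up for illustration.

```python
def primary_shards(apps: int, shards_per_index: int, days: int) -> int:
    """Primary shards created by one daily index per application."""
    return apps * shards_per_index * days


for default_shards in (5, 1):
    per_day = primary_shards(apps=10, shards_per_index=default_shards, days=1)
    per_year = primary_shards(apps=10, shards_per_index=default_shards, days=365)
    print(f"default={default_shards}: {per_day} per day, {per_year} per year")
```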
4. Lucene 8
This is a big one, so a short aside first. Before I graduated from university in 2015, I did not understand this indexing technology very deeply. I wanted to upgrade the Lucene version of a project that was still on 3.x, and upgrading to 4.x was a nightmare.
As with every major release, the Elasticsearch team wants to support the latest major version of Lucene and all the features it brings. Elasticsearch 7.0 is built on Lucene 8. Lucene 8 lays the foundation for many functional improvements in Elasticsearch, including better search performance for top-k queries and better ways of combining relevance signals in a search while keeping it fast. (Top-k means finding the K largest or smallest values in a massive data set. Several approaches come to mind: build a min-heap from, say, the first 1000 values and compare every remaining value against it, or partition the data and use quicksort.)
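The min-heap approach to top-k mentioned in the aside can be sketched in a few lines of Python. This illustrates the generic technique only; it is not how Lucene 8 implements its top-k optimizations.

```python
import heapq
import random


def top_k(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k, so it replaces it.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


random.seed(7)
scores = [random.random() for _ in range(100_000)]
# Cross-check against a full sort of the same data.
assert top_k(scores, 10) == sorted(scores, reverse=True)[:10]
print(top_k([5, 1, 9, 3, 7, 8], 3))
```

The heap never holds more than k items, so memory stays constant no matter how many values are scanned.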
5. Minimize round trips in cross-cluster search
Elasticsearch 5.3 introduced cross-cluster search, which allows users to query across multiple clusters. The feature was later improved and extended, and it eventually deprecated and replaced tribe nodes. (Tribe nodes are federated client nodes that span multiple clusters, typically used to retrieve information from several Elasticsearch clusters so that they look like one combined cluster. See the cross-cluster search section of the official documentation for details.) Tribe nodes were removed in 7.0, so cross-cluster search is now the way to implement federated queries.
Elasticsearch 7.0 adds a new execution mode for cross-cluster search that avoids unnecessary round trips. This mode (ccs_minimize_roundtrips) keeps latency down for cross-cluster searches over slow links (for example, across networks) and returns search results faster.
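As a rough sketch, the mode is selected with a query-string parameter on the search request. The Python helper below only builds the request path; the cluster aliases and index names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def ccs_search_path(targets, minimize_roundtrips=True):
    """Build the path and query string for a cross-cluster search request."""
    # Remote indices are addressed as "<cluster_alias>:<index>".
    index_expr = ",".join(targets)
    params = urlencode({"ccs_minimize_roundtrips": str(minimize_roundtrips).lower()})
    return f"/{index_expr}/_search?{params}"


# "cluster_eu" and "cluster_us" are made-up remote cluster aliases.
path = ccs_search_path(["local_logs", "cluster_eu:logs", "cluster_us:logs"])
print("GET " + path)
```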
6. New implementation of Cluster Coordination
Elasticsearch was designed from the start to scale out easily while surviving catastrophic failures. To support this, the team built a pluggable cluster coordination system whose default implementation is called Zen Discovery (the name roughly suggests that it should be simple and that users can rely on it without worry; it is the built-in cluster discovery mechanism). The sharp growth in Elasticsearch usage exposed a number of problems. For example, Zen's minimum_master_nodes setting is often misconfigured, which makes clusters more prone to split-brain and data loss. The setting is also hard to maintain in clusters that are large and change size dynamically.
In Elasticsearch 7.0, the cluster coordination layer was redesigned. The new implementation provides safe, sub-second master elections, whereas Zen could take several seconds to elect a new master; this matters for critical deployments. With the minimum_master_nodes setting removed, growing and shrinking a cluster is safer and easier, and there are fewer ways to misconfigure the system. Most importantly, the new cluster coordination layer gives Elasticsearch a robust foundation, so that more advanced features can be built on this module in the future.
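For context on why minimum_master_nodes was easy to get wrong: before 7.0 it had to be kept at a majority of the master-eligible nodes by hand, and updated every time the cluster grew or shrank. A small Python sketch of that majority rule (the function name is made up):

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


for n in (3, 4, 5, 7):
    print(f"{n} master-eligible nodes -> quorum {quorum(n)}")
```

Leaving the old value in place after adding nodes is exactly the kind of misconfiguration that opens the door to split-brain.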
7. Better support for small heaps (real memory circuit breakers)
Elasticsearch 7 uses a new circuit breaker that tracks the total memory actually used by the JVM. If the memory reserved for a request plus the actual heap usage would exceed 95% of the heap, the request is rejected. The default maximum number of aggregation buckets, search.max_buckets, also changes to 10000; it was unlimited in version 6 and earlier. Together, these two changes in Elasticsearch 7 help prevent cluster failures caused by large queries and aggregations, for example when a newcomer runs a large query for the first time.
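The 95% rule can be modeled in a few lines. The Python sketch below is a simplified illustration of the check described above, with hypothetical names and made-up heap sizes; the real breaker lives inside Elasticsearch and measures actual JVM heap usage.

```python
class CircuitBreakingError(Exception):
    """Raised when a request would push heap usage over the limit."""


def check_parent_breaker(heap_used: int, request_estimate: int,
                         heap_max: int, limit_ratio: float = 0.95) -> None:
    """Reject the request if current usage plus its estimate exceeds the limit."""
    limit = int(heap_max * limit_ratio)
    projected = heap_used + request_estimate
    if projected > limit:
        raise CircuitBreakingError(
            f"projected usage {projected} bytes exceeds limit {limit} bytes"
        )


GIB = 1024 ** 3
# 2 GiB in use plus a 1 GiB request on a 4 GiB heap stays under 95%: accepted.
check_parent_breaker(heap_used=2 * GIB, request_estimate=1 * GIB, heap_max=4 * GIB)
try:
    # 3 GiB in use plus a 1 GiB request would fill the heap: rejected.
    check_parent_breaker(heap_used=3 * GIB, request_estimate=1 * GIB, heap_max=4 * GIB)
except CircuitBreakingError as exc:
    print("rejected:", exc)
```

Rejecting one oversized request is much cheaper than letting it push the node into an out-of-memory failure.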
8. Cross-cluster replication can already be used in production environments
Cross-cluster replication was introduced as a beta feature in Elasticsearch 6.5. As of 7.0 it is mature and can be used in production. Earlier versions required replication to start on a new index and could not replicate an existing index. In 6.7 and 7.0, cross-cluster replication can start replicating existing indexes that have soft deletes enabled, and soft deletes are enabled by default on new indexes. New mechanisms have also been introduced to keep a follower from falling far behind its leader (the announcement does not say what they are), and a management UI has been added to Kibana for configuring remote clusters.
9. Index lifecycle management can already be used in production environments
Index lifecycle management (ILM) was a beta feature in Elasticsearch 6.6 and has now moved from beta to GA. ILM makes it easier to manage the life cycle of data in Elasticsearch, including how data progresses through the hot, warm, cold, and delete phases. You can use the Elasticsearch API or the UI in Kibana to create policies that define how data moves through these phases.
In Elasticsearch 6.7 and 7.0, ILM can also manage frozen indexes. Frozen indexes are valuable for long-term data storage in Elasticsearch because they need less memory (heap) relative to the amount of data the node manages. In 6.7 and 7.0, an index can be frozen as part of the cold phase in ILM. In addition, ILM now works directly with cross-cluster replication (CCR), which is GA in both Elasticsearch 6.7 and 7.0. ILM is free to use and is part of the default distribution of Elasticsearch.
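As an illustration of the four phases, the Python sketch below assembles an ILM policy body and prints it as the request you would send. The policy name `logs_policy` and all ages and sizes are made-up example values, and the `freeze` action is shown as it applies to the 6.7 and 7.0 releases discussed here.

```python
import json

# Example values only: tune sizes and ages to your own retention needs.
policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "30d"}}},
            "warm": {"min_age": "7d", "actions": {"shrink": {"number_of_shards": 1}}},
            "cold": {"min_age": "30d", "actions": {"freeze": {}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

print("PUT _ilm/policy/logs_policy")
print(json.dumps(policy, indent=2))
```

Rolling over by size in the hot phase and shrinking in the warm phase are the two ILM capabilities the section on default shard counts relies on.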
10. SQL can already be used in production environments
Elasticsearch's SQL interface is now GA. It was introduced as an alpha release in 6.3. People who are familiar with SQL can query the data they need from Elasticsearch directly, and BI tools that speak SQL can easily access data in Elasticsearch. It can also be accessed through JDBC and ODBC drivers, which means there are currently four official ways to use Elasticsearch SQL: the Elasticsearch REST endpoint, the Elasticsearch SQL command-line interface, JDBC, and ODBC.
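Of the four access paths, the REST endpoint is the easiest to show. The Python sketch below builds the request body for it; the `library` index and its columns are hypothetical, and the request is only printed, not sent.

```python
import json

# Hypothetical index "library" with made-up columns.
sql_request = {
    "query": "SELECT author, title FROM library "
             "WHERE page_count > 500 ORDER BY page_count DESC",
    "fetch_size": 5,
}

print("POST /_sql?format=txt")
print(json.dumps(sql_request, indent=2))
```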
11. High Level REST Client is feature-complete
This has been in preparation for a long time: a next-generation Java client for accessing Elasticsearch clusters. As an aside, I have always preferred accessing ES over REST, which is really simple and convenient.
12. Nanosecond timestamps are supported
Prior to 7.0, Elasticsearch could only store timestamps with millisecond precision. If you deal with high-frequency events, for example storing and analyzing captured network packet data in Elasticsearch, you may need higher time precision. Historically, Elasticsearch used the Joda-Time library to handle dates and times, and Joda-Time lacks support for such high-precision timestamps. JDK 8 introduced the official Java time API, which can handle nanosecond-precision timestamps, and over the past year the team has been migrating from Joda-Time to native Java time while trying to maintain backward compatibility. Starting with version 7.0.0, nanosecond timestamps are available through a dedicated date_nanos field mapper. Note that aggregations on this field still work at millisecond resolution to avoid a bucket explosion (hopefully that word does not get censored).
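A quick Python sketch shows why millisecond precision is not enough for packet captures. The timestamps are made-up epoch values and the field name `captured_at` is hypothetical; only the `date_nanos` type comes from the text above.

```python
NANOS_PER_MILLI = 1_000_000


def to_millis(epoch_nanos: int) -> int:
    """Truncate a nanosecond epoch timestamp to millisecond precision."""
    return epoch_nanos // NANOS_PER_MILLI


# Two packets captured 250 nanoseconds apart (made-up values).
first_packet_ns = 1_546_300_800_123_456_789
second_packet_ns = first_packet_ns + 250

# At millisecond precision the two events are indistinguishable.
print(to_millis(first_packet_ns) == to_millis(second_packet_ns))
# At nanosecond precision their order and gap are preserved.
print(second_packet_ns - first_packet_ns)

# Mapping body that stores the field with nanosecond resolution.
mapping = {"mappings": {"properties": {"captured_at": {"type": "date_nanos"}}}}
```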
13. Faster retrieval of top hits
Query performance is a key part of search. In Elasticsearch 7.0, search performance is significantly better in cases where you do not need an exact hit count and can accept a lower bound on the number of results. For example, if users of a site usually view only the first page of results and do not care about the exact number of matching documents, you can show them "more than 10000 results" and then page through the results. Users often include very frequent words (such as "the" and "a") in a query, and historically Elasticsearch had to score every document matching these terms even though they add little meaning. Elasticsearch can now skip scoring documents once it determines that they cannot rank among the top results, which can greatly improve query speed. The number of hits that is counted accurately is configurable, and the default is 10000. Queries whose result sets are smaller than this threshold behave as before: the hit count is exact, but queries that match only a small number of results see no performance improvement. Because the improvement comes from skipping low-ranking documents, it does not apply to aggregations.
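On the client side, the lower-bound count shows up in the shape of `hits.total`, which in 7.0 is an object with a `value` and a `relation`. The Python sketch below formats it for display; the two response dictionaries are hand-written samples, not real server output.

```python
def describe_total(response: dict) -> str:
    """Turn the hits.total object of a 7.x search response into display text."""
    total = response["hits"]["total"]
    if total["relation"] == "gte":
        # The count is only a lower bound because counting stopped at the threshold.
        return f"more than {total['value']} results"
    return f"{total['value']} results"


# Hand-written sample responses, trimmed to the relevant fields.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
exact = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}

print(describe_total(capped))
print(describe_total(exact))
```

The threshold itself is set per request with the `track_total_hits` search parameter.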
14. Support for TLS 1.3
Elasticsearch has always supported encrypted communication, and it recently started supporting JDK 11 (for an introduction to JDK 11, see my previous blog post, https://my.oschina.net/110NotFound/blog/3046749), which brings many new features. JDK 11 supports TLSv1.3, so starting with 7.0 Elasticsearch offers TLSv1.3 to users running on JDK 11. To help new users avoid inadvertently running with weak security, TLSv1.0 has been removed from the defaults. For users running older versions of Java, the default protocols are TLSv1.2 and TLSv1.1.
15. JDK bundled with Elasticsearch
The Elasticsearch team says one of the most obvious barriers to entry for many users is not knowing that Elasticsearch is a Java application (this can be a real problem when users are not developers familiar with Java; then again, developers should not limit themselves to a single language). In 7.0, Elasticsearch bundles an OpenJDK distribution to help users get started faster. You can still bring your own JDK: to use the one already installed on your machine, set JAVA_HOME before starting Elasticsearch.
16. Rank features
Elasticsearch 7.0 has several new field types that help you get the most out of your data. Two that help with core search use cases are rank_feature and rank_features. They can be used to boost documents based on numeric or categorical values.
For a rank_feature field that correlates negatively with the score, set positive_score_impact to false (the default is true). The rank_feature query then adjusts the scoring formula so that the score decreases rather than increases as the feature value grows. For example, in web search, URL length is a commonly used feature that correlates negatively with the score.
PUT my_index
{
  "mappings": {
    "properties": {
      "pagerank": {"type": "rank_feature"},
      "url_length": {"type": "rank_feature", "positive_score_impact": false}
    }
  }
}

PUT my_index/_doc/1
{"pagerank": 8, "url_length": 22}

GET my_index/_search
{"query": {"rank_feature": {"field": "pagerank"}}}
A rank_features field can index a vector of numeric features so that rank_feature queries can later use it to boost documents. It is similar to the rank_feature data type, but better suited to cases where the list of features is sparse and it would not be reasonable to add a separate field to the mapping for each one.
PUT my_index
{
  "mappings": {
    "properties": {
      "topics": {"type": "rank_features"}
    }
  }
}

PUT my_index/_doc/1
{"topics": {"politics": 20, "economics": 50.8}}

PUT my_index/_doc/2
{"topics": {"politics": 5.2, "sports": 80.1}}

GET my_index/_search
{"query": {"rank_feature": {"field": "topics.politics"}}}
17. JSON log
In addition to plain-text logging, Elasticsearch now also writes JSON logs. Starting with 7.0, you will find new files with the .json extension in the log directory. This means you can use filtering tools such as jq to pretty-print and process logs in a more structured way.
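Structured logs are just as easy to process from a script as from jq. The Python sketch below filters log lines by level; the two sample lines and their field names (`level`, `component`, `message`, and so on) are assumptions written for this example, so check them against the actual files in your log directory.

```python
import json

# Made-up sample lines; real log files may carry different fields.
sample_log = "\n".join([
    '{"type": "server", "timestamp": "2019-04-10T10:00:00,000+0000", '
    '"level": "INFO", "component": "o.e.n.Node", "cluster.name": "demo", '
    '"node.name": "node-1", "message": "started"}',
    '{"type": "server", "timestamp": "2019-04-10T10:05:00,000+0000", '
    '"level": "WARN", "component": "o.e.c.r.a.DiskThresholdMonitor", '
    '"cluster.name": "demo", "node.name": "node-1", '
    '"message": "high disk watermark exceeded"}',
])


def filter_by_level(lines, level):
    """Yield parsed log entries whose level matches, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("level") == level:
            yield entry


warnings = list(filter_by_level(sample_log.splitlines(), "WARN"))
for entry in warnings:
    print(entry["timestamp"], entry["message"])
```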
18. Script score query (also known as function score 2.0)
7.0 brings a next-generation function score feature. The new script_score query provides a simpler and more flexible way to generate a ranking score for each document. The script_score query is built on a script and offers a set of functions, including arithmetic and distance functions, that users can mix and match to construct arbitrary scoring functions. The modular structure is easier to use, which opens this important capability up to more users.
This feature is experimental and may be completely changed or deleted in future versions.
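A minimal sketch of what a script_score request body can look like, built as a Python dictionary. The `message` and `likes` fields and the scoring expression are assumptions made for illustration, and since the feature is experimental the exact syntax may differ between versions.

```python
import json

# Hypothetical fields: "message" (text) and "likes" (numeric).
body = {
    "query": {
        "script_score": {
            # Documents are first selected by an ordinary query ...
            "query": {"match": {"message": "elasticsearch"}},
            # ... then each match is scored by the script.
            "script": {"source": "_score * Math.log(2 + doc['likes'].value)"},
        }
    }
}

print("GET /my_index/_search")
print(json.dumps(body, indent=2))
```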
That covers the main features of Elasticsearch 7. I hope this content is of some help to you. If you found the article useful, feel free to share it so more people can see it.