2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article covers "what are the latest version of memcache interview questions in 2021". The explanations are simple, clear, and easy to follow. Please follow the editor's train of thought as you work through the material.
1.1. Tuning in the design phase
(1) Based on how the business data grows, create indices from a date-based template and roll them over with the rollover API.
(2) Use aliases for index management.
(3) Run force_merge on the indices at a fixed time in the early morning to free up space.
(4) Separate hot and cold data: store hot data on SSD to improve retrieval efficiency, and shrink cold data regularly to reduce storage.
(5) Use curator to manage the index life cycle.
(6) Configure an analyzer only for the fields that actually need tokenization.
(7) At the mapping stage, consider the attributes of each field carefully: whether it needs to be searchable, whether it needs to be stored, and so on.
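As a rough illustration of items (1) and (2), the sketch below builds the request bodies for an index template and a rollover call as plain Python dictionaries. The pattern `logs-*`, the alias `logs-write`, and all thresholds are hypothetical. The template body assumes the legacy `_template` shape; newer composable templates nest the settings differently, so check the shape against your Elasticsearch version.

```python
import json

# Hypothetical names, chosen only for illustration.
WRITE_ALIAS = "logs-write"
INDEX_PATTERN = "logs-*"

def rollover_request(alias, max_age="1d", max_docs=50_000_000):
    """Return (method, path, body) for a rollover call on a write alias."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{alias}/_rollover", body

def template_body(pattern, shards=3, replicas=1):
    """Settings applied to every new index whose name matches the pattern."""
    return {
        "index_patterns": [pattern],
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
    }

method, path, body = rollover_request(WRITE_ALIAS)
print(method, path, json.dumps(body))
print(json.dumps(template_body(INDEX_PATTERN)))
```

Writing always goes through the alias, so the application does not need to know which concrete index is currently active.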
1.2. Write tuning
(1) Before writing, set the number of replicas to 0.
(2) Before writing, set refresh_interval to -1 to disable the refresh mechanism.
(3) During writing, use bulk requests to write in batches.
(4) After writing, restore the number of replicas and the refresh interval.
(5) Use automatically generated ids wherever possible.
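The write-tuning steps above might be sketched as follows. The functions only build request bodies and do not talk to a cluster; the index name `demo-index` and the restored values (1 replica, `1s` refresh) are assumptions for illustration.

```python
import json

def bulk_load_settings():
    """Settings to apply before a large import: no replicas, no refresh."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}

def restore_settings(replicas=1, refresh_interval="1s"):
    """Settings to put back once the import has finished."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}

def bulk_body(index, docs):
    """Build the newline-delimited body of a _bulk request.

    No _id is supplied, so Elasticsearch generates one per document.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline

print(bulk_body("demo-index", [{"msg": "a"}, {"msg": "b"}]), end="")
```

The two settings bodies would be sent to the index `_settings` endpoint before and after the import respectively.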
1.3. Query tuning
(1) Avoid wildcard queries.
(2) Avoid terms queries with very large term lists (scenarios with hundreds of terms).
(3) Make full use of the inverted index: use the keyword type for fields wherever possible.
(4) When the data volume is large, narrow down the target indices by time first and then search.
(5) Set up a reasonable routing mechanism.
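Item (4) can be illustrated with a small helper that turns a date range into the list of daily indices to search. The naming scheme `logs-YYYY.MM.DD` is an assumption; substitute whatever pattern your indices actually follow.

```python
from datetime import date, timedelta

def indices_for_range(prefix, start, end):
    """List the daily index names covering [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [f"{prefix}{(start + timedelta(days=i)).strftime('%Y.%m.%d')}"
            for i in range(days + 1)]

names = indices_for_range("logs-", date(2021, 1, 30), date(2021, 2, 1))
print(",".join(names))  # logs-2021.01.30,logs-2021.01.31,logs-2021.02.01
```

Searching only the listed indices keeps the query away from shards that cannot contain matching data.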
1.4. Other tuning
Deployment tuning, business tuning, and so on. From how you cover the points above, the interviewer can largely judge your hands-on practice and your operations and maintenance experience.
2. What is the inverted index of Elasticsearch
The data structure that Lucene has used extensively since version 4+ is the FST. The FST has two advantages:
(1) It takes little space. Reusing the prefixes and suffixes of the words in the dictionary compresses the storage space.
(2) Queries are fast. The query time complexity is O(len(str)).
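Since the question is about the inverted index itself, a toy version may help. The sketch below maps each term to the ids of the documents that contain it. It uses a plain dictionary as the term dictionary, which is a simplification and not how Lucene stores terms (Lucene uses the FST described above), and the whitespace tokenizer stands in for a real analyzer.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: "quick brown fox", 2: "quick red fox", 3: "lazy dog"}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 2]
print(index["dog"])    # [3]
```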
3. What to do when there is too much Elasticsearch index data, and how to tune and deploy
Interviewer: wants to gauge your ability to operate and maintain a cluster with a large amount of data.
Answer: index data should be planned in advance. As the saying goes, "design comes first and coding comes later". Only then can you avoid a sudden surge of data overwhelming the cluster's processing capacity and affecting online customer searches or other businesses.
Tuning was covered in question 1; here it is in a little more detail:
3.1 dynamic index level
3.2 Storage level
Hot and cold data are stored separately: hot data is recent data (such as the last 3 days or the last week), and everything else is cold data.
Because cold data no longer receives new writes, consider running force_merge plus shrink on it regularly to save storage space and improve retrieval efficiency.
3.3 deployment level
If no planning was done beforehand, this is the emergency strategy.
Because ES supports dynamic expansion, adding machines dynamically can relieve the pressure on the cluster. Note: if the master nodes and other aspects were planned sensibly beforehand, the expansion can be completed without restarting the cluster.
4. How does elasticsearch realize the master election?
GET /_cat/nodes?v&h=ip,port,heapPercent,heapMax,id,name
ip port heapPercent heapMax id name
5. Describe in detail the process of indexing documents in Elasticsearch
6. Describe the process of Elasticsearch search in detail
Interviewer: wants to check that you understand the underlying principles of ES search and do not focus only on the business level.
Answer:
The search is broken down into two phases: "query then fetch".
The purpose of the query phase is to locate the matching documents without fetching them.
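A simplified simulation of "query then fetch" is sketched below. Shards are modeled as dictionaries and the score is simply a field value; both are assumptions made for illustration, not how Elasticsearch scores or stores documents. In the query phase each shard returns only ids and scores, and the full documents are loaded in the fetch phase for the final hits only.

```python
import heapq

def query_phase(shards, score_fn, size):
    """Each shard returns only (score, shard_no, doc_id) for its local top hits."""
    candidates = []
    for shard_no, shard in enumerate(shards):
        local = [(score_fn(doc), shard_no, doc_id) for doc_id, doc in shard.items()]
        candidates.extend(heapq.nlargest(size, local))
    # The coordinating node merges the per-shard lists into a global top `size`.
    return heapq.nlargest(size, candidates)

def fetch_phase(shards, hits):
    """Only now are the full documents loaded, and only for the final hits."""
    return [shards[shard_no][doc_id] for _score, shard_no, doc_id in hits]

shards = [
    {"a": {"title": "es intro", "views": 5}, "b": {"title": "es tuning", "views": 9}},
    {"c": {"title": "lucene", "views": 7}},
]
hits = query_phase(shards, score_fn=lambda d: d["views"], size=2)
print([d["title"] for d in fetch_phase(shards, hits)])  # ['es tuning', 'lucene']
```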
7. What are the optimization methods for setting Linux when Elasticsearch is deployed
8. What is the internal structure of Lucene?
Interviewer: wants to know the breadth and depth of your knowledge.
Answer:
Lucene consists of two processes, indexing and searching, which cover index creation, indexing, and searching. The answer can be expanded along these lines.
9. How does Elasticsearch achieve Master election?
10. Nodes in Elasticsearch (for example, a total of 20), of which 10 are
11. How does the client select a specific node to execute the request when connecting with the cluster?
TransportClient uses the transport module to connect remotely to an Elasticsearch cluster. It does not join the cluster; it simply obtains one or more initial transport addresses and communicates with them in a round-robin fashion.
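Round-robin selection itself is easy to sketch. The class below is a hypothetical stand-in written for illustration, not TransportClient's actual implementation, and the addresses are made up.

```python
from itertools import cycle

class RoundRobinNodes:
    """Hands out configured node addresses one after another, in a loop."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one transport address is required")
        self._nodes = cycle(list(addresses))

    def next_node(self):
        return next(self._nodes)

# Hypothetical addresses, for illustration only.
picker = RoundRobinNodes(["10.0.0.1:9300", "10.0.0.2:9300"])
print([picker.next_node() for _ in range(3)])
# ['10.0.0.1:9300', '10.0.0.2:9300', '10.0.0.1:9300']
```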
12. Describe in detail the process of indexing documents in Elasticsearch.
Elasticsearch is a distributed RESTful-style search and data analysis engine.
(1) Query: Elasticsearch lets you perform and combine many types of searches (structured, unstructured, geolocation, metrics) in whatever way you like.
(2) Analysis: finding the ten documents that best match a query is one thing. But how do you make sense of a log with a billion lines? Elasticsearch aggregations let you look at the big picture and explore trends and patterns in your data.
(3) Speed: Elasticsearch is very fast. Really, really fast.
(4) Scalability: it can run on a laptop. It can also run on hundreds of servers that host PB-level data.
(5) Resilience: Elasticsearch runs in a distributed environment and was designed with this in mind from the beginning.
(6) Flexibility: it covers many use cases. Numbers, text, geographic locations, structured, unstructured. All data types are welcome.
(7) HADOOP & SPARK: Elasticsearch + Hadoop
Elasticsearch is a highly scalable open source full-text search and analysis engine. It allows you to store, search, and analyze large amounts of data quickly and in near real time.
Here are some use cases for using Elasticsearch:
(1) You run an online store and allow your customers to search for the products you sell. In this case, you can use Elasticsearch to store the entire product catalog and inventory and provide search and auto-completion suggestions.
(2) You want to collect log or transaction data, and you want to analyze and mine that data to find trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse the data, and then have Logstash feed the data into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine any information you are interested in.
(3) You run a price alert platform that allows price-savvy customers to specify rules such as: "I am interested in buying a specific electronic device, and I would like to be notified if the price from any supplier falls below $X within the next month." In this case, you can scrape the suppliers' prices, push them into Elasticsearch, and use its reverse search (Percolator) function to match price movements against the customer queries, and finally push an alert to the customer when a match is found.
(4) You have analytics / business intelligence requirements and want to quickly investigate, analyze, visualize, and ask ad-hoc questions about large amounts of data (think tens of millions or billions of records). In this case, you can use Elasticsearch to store the data and then use Kibana (part of the Elasticsearch/Logstash/Kibana stack) to build custom dashboards that visualize the aspects of the data that matter to you. In addition, you can use Elasticsearch aggregations to run complex business intelligence queries on the data.
15. Describe in detail the process of updating and deleting documents by Elasticsearch.
(1) Deletes and updates are also write operations, but documents in Elasticsearch are immutable, so they cannot be deleted or modified in place to reflect a change.
(2) Each segment on disk has a corresponding .del file. When a delete request is sent, the document is not actually deleted but is marked as deleted in the .del file. The document will still match queries, but it is filtered out of the results. When segments are merged, documents marked as deleted in the .del file are not written to the new segment.
(3) When a new document is created, Elasticsearch assigns it a version number. When an update is performed, the old version of the document is marked as deleted in the .del file, and the new version is indexed into a new segment. The old version will still match queries, but it is filtered out of the results.
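The behavior described above can be modeled with a toy in-memory shard. The classes below are hypothetical and heavily simplified (one document per segment, no real files); they only show how immutable segments, deletion marks, and merging fit together.

```python
class Segment:
    """An immutable batch of documents plus a set of deletion marks."""

    def __init__(self, docs):
        self.docs = dict(docs)   # doc_id -> (version, source); never modified
        self.deleted = set()     # plays the role of the .del file

class Shard:
    def __init__(self):
        self.segments = []
        self.versions = {}       # doc_id -> latest version number

    def _mark_deleted(self, doc_id):
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def index(self, doc_id, source):
        """Create or update: mark old copies deleted, write a new segment."""
        self._mark_deleted(doc_id)
        version = self.versions.get(doc_id, 0) + 1
        self.versions[doc_id] = version
        self.segments.append(Segment({doc_id: (version, source)}))

    def delete(self, doc_id):
        self._mark_deleted(doc_id)

    def search(self):
        """Marked documents still sit in their segment but are filtered out."""
        return {d: v for seg in self.segments
                for d, v in seg.docs.items() if d not in seg.deleted}

    def merge(self):
        """Merging writes one new segment without the marked documents."""
        self.segments = [Segment(self.search())]

shard = Shard()
shard.index("1", "first draft")
shard.index("1", "second draft")
print(shard.search())        # {'1': (2, 'second draft')}
print(len(shard.segments))   # 2
shard.merge()
print(len(shard.segments))   # 1
```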
16. Describe the process of Elasticsearch search in detail.
17. In Elasticsearch, how do you find the corresponding inverted index based on a word?
(1) Lucene's indexing process writes the inverted table into its index file format, following the basic process of full-text retrieval.
(2) Lucene's search process reads the indexed information back out of that file format and then calculates the score of each document.
18. What are the optimization methods for setting up Linux when Elasticsearch is deployed?
(1) Machines with 64 GB of memory are ideal, but 32 GB and 16 GB machines are also common. Less than 8 GB can be counterproductive.
(2) If you have to choose between faster CPUs and more cores, choose more cores. The extra concurrency that multiple cores provide far outweighs a slightly faster clock rate.
(3) If you can afford SSDs, they far outperform any spinning media. SSD-based nodes see improvements in both query and indexing performance.
(4) Avoid clusters that span multiple data centers, even if the data centers are close together. Absolutely avoid clusters that span large geographical distances.
(5) Make sure that the JVM running your application is exactly the same as the JVM of the server. Elasticsearch uses Java's native serialization in several places.
(6) Setting gateway.recover_after_nodes, gateway.expected_nodes and gateway.recover_after_time avoids excessive shard swapping when the cluster restarts, which may shorten data recovery from a few hours to a few seconds.
(7) Elasticsearch is configured to use unicast discovery by default to prevent nodes from inadvertently joining a cluster. Only nodes running on the same machine form a cluster automatically. It is best to use unicast instead of multicast.
(8) Do not casually change the garbage collector (CMS) or the sizes of the thread pools.
(9) Give half of your memory to the Elasticsearch heap (but no more than 32 GB!), set through the ES_HEAP_SIZE environment variable, and leave the other half to Lucene.
(10) Swapping memory to disk is fatal to server performance. If memory is swapped to disk, a 100 microsecond operation may take 10 milliseconds. Now think about that added latency summed over all the other 10 microsecond operations. It is not hard to see how terrible swapping is for performance.
(11) Lucene uses a large number of files. At the same time, Elasticsearch uses a large number of sockets to communicate between nodes and with HTTP clients. All of this requires enough file descriptors. You should raise the file descriptor limit to a large value, such as 64000.
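The heap rule in item (9) comes down to simple arithmetic. The helper below is a sketch; the function name is made up, and the 31 GB cap is an assumption chosen to stay safely under the 32 GB limit mentioned above, not a value taken from this article.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Half of physical RAM, capped just below the 32 GB limit."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb // 2, cap_gb)

for ram in (16, 64, 128):
    print(f"{ram} GB RAM -> ES_HEAP_SIZE={recommended_heap_gb(ram)}g")
# 16 GB RAM -> ES_HEAP_SIZE=8g
# 64 GB RAM -> ES_HEAP_SIZE=31g
# 128 GB RAM -> ES_HEAP_SIZE=31g
```

Whatever is not given to the heap stays available to the operating system's file cache, which Lucene relies on.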
Supplement: performance improvement method in indexing phase
(1) Use bulk requests and size them appropriately: 5-15 MB of data per batch is a good starting point.
(2) Storage: use SSDs.
(3) Segments and merging: the Elasticsearch default merge throttle is 20 MB/s, which should be a good setting for mechanical disks. If you are using SSDs, consider raising it to 100-200 MB/s. If you are doing a bulk import and do not care about search at all, you can turn off merge throttling completely. You can also increase the index.translog.flush_threshold_size setting from the default 512 MB to a larger value, such as 1 GB, which lets larger segments accumulate in the transaction log before a flush is triggered.
(4) If your search results do not require near-real-time accuracy, consider changing the index.refresh_interval of each index to 30s.
(5) If you are doing a large import, consider turning off replicas by setting index.number_of_replicas: 0.
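One way to keep bulk requests near a target size, as item (1) suggests, is to group documents by their serialized byte length. The sketch below assumes documents are JSON-serializable dictionaries and uses a 10 MB default, which is simply a value inside the 5-15 MB range above; the function name is hypothetical.

```python
import json

def batches_by_size(docs, max_bytes=10 * 1024 * 1024):
    """Yield lists of documents whose serialized size stays under max_bytes."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

docs = [{"id": i, "text": "x" * 100} for i in range(10)]
sizes = [len(b) for b in batches_by_size(docs, max_bytes=400)]
print(sizes)
```

A single document larger than `max_bytes` still forms its own batch, so the limit is a target rather than a hard guarantee.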
19. For GC, what should you pay attention to when using Elasticsearch?
(1) The index of the inverted dictionary must stay resident in memory and cannot be garbage collected, so monitor the growth trend of segment memory on the data nodes.
(2) All caches, such as the field cache, filter cache, indexing cache, and bulk queue, should be set to reasonable sizes and evaluated against the worst case: when every cache is full, is there still heap space left for other tasks? Avoid "self-deceiving" measures such as clearing caches to free memory.
(3) Avoid searches and aggregations that return very large result sets. Scenarios that really do need to pull a large amount of data can use the scan & scroll API.
(4) The memory that cluster stats keep resident cannot be scaled horizontally. Very large clusters can be split into multiple clusters connected through a tribe node.
(5) To know whether the heap is large enough, you must take the actual application scenarios into account and monitor the cluster's heap usage continuously.
(6) Derive the memory requirements from the monitoring data, and configure the circuit breakers sensibly to minimize the risk of running out of memory.
20. How does Elasticsearch aggregate large amounts of data (hundreds of millions of magnitude)?
21. In the case of concurrency, how does Elasticsearch ensure consistency in reading and writing?
(1) Optimistic concurrency control based on the version number ensures that a newer version is never overwritten by an older one; specific conflicts are handled by the application layer.
(2) In addition, for write operations the consistency level supports quorum/one/all and defaults to quorum, meaning that a write is allowed only when a majority of shard copies are available. Even then, a write to a replica may still fail for reasons such as network problems; the replica is then considered failed, and the shard is rebuilt on a different node.
(3) For read operations, you can set replication to sync (the default), so that the operation does not return until both the primary shard and the replica shards have completed. If replication is set to async, you can still query the primary shard by setting the search request parameter _preference to primary, which ensures that the document is the latest version.
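The version check in item (1) can be sketched with an in-memory store. This is a hypothetical stand-in for illustration only; it does not reproduce Elasticsearch's actual request parameters or error format.

```python
class VersionConflict(Exception):
    pass

class VersionedStore:
    """In-memory stand-in for an index that checks versions on write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, source)
        return current + 1

store = VersionedStore()
store.put("cart", {"items": 1})                                # version 1
version, _ = store.get("cart")
store.put("cart", {"items": 2}, expected_version=version)      # version 2
try:
    store.put("cart", {"items": 3}, expected_version=version)  # stale version
except VersionConflict as err:
    print("conflict:", err)  # the application decides whether to retry
```

On a conflict the application would typically re-read the document, reapply its change, and try again.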
22. How to monitor the status of Elasticsearch clusters?
Marvel allows you to easily monitor Elasticsearch through Kibana. You can view your cluster health and performance in real time, and you can also analyze past cluster, index, and node metrics.
23. Introduce the overall technical framework of your e-commerce search.
24. Tell me about your personalized search plan?
25. Do you know anything about dictionary trees?
The common dictionary data structure is as follows:
The core idea of a Trie is to trade space for time: it uses the common prefixes of strings to cut query time and improve efficiency. It has three basic properties:
1) The root node contains no character, and every other node contains exactly one character.
2) Concatenating the characters on the path from the root node to a given node yields the string that corresponds to that node.
3) The child nodes of any one node all contain different characters.
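The three properties map directly onto a small implementation. The sketch below is a minimal Trie using a dictionary of children per node; the class and method names are chosen for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # one child per distinct character
        self.is_word = False  # True if the path from the root spells a word

class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root holds no character

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

trie = Trie()
for w in ("tea", "ten", "to"):
    trie.insert(w)
print(trie.contains("ten"), trie.contains("te"), trie.starts_with("te"))
# True False True
```

Lookups visit one node per character, so the cost depends on the length of the word and not on how many words are stored.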
26. How is spelling error correction realized?