
How to start the elasticsearch engine


This article explains how to start the elasticsearch engine. The content is straightforward and easy to follow.

The Engine is where ES comes closest to the god that is Lucene: it is a layer of encapsulation that adapts Lucene to a distributed environment. Its interface follows the command pattern, so it implements the operation log (translog) quite naturally. In older versions the implementation class was called RobinEngine; newer versions renamed it and added several version types, but that has little impact on reading the source. Its two main concerns are the operation log and data versioning, and the highlight is its use of locks. Rather than listing the code wholesale, we will look at it from two angles: encapsulation and concurrency.
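Because the interface is a command pattern, logging falls out almost for free. Here is a minimal sketch of that idea; all names (Operation, CreateOp, SketchEngine) are hypothetical, not real ES classes:

import java.util.ArrayList;
import java.util.List;

interface Operation {
    void apply();          // execute against the Lucene index
    String toLogEntry();   // serialize for the operation log (translog)
}

class CreateOp implements Operation {
    private final String uid, source;
    CreateOp(String uid, String source) { this.uid = uid; this.source = source; }
    public void apply() { /* writer.addDocument(...) would go here */ }
    public String toLogEntry() { return "create " + uid + " " + source; }
}

class SketchEngine {
    private final List<String> translog = new ArrayList<String>();

    // Every operation is a command object, so appending it to the log is one extra line.
    void execute(Operation op) {
        op.apply();
        translog.add(op.toLogEntry());
    }
}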

Start the engine

Since it is an engine, let's start it before the discussion. In ES these objects are instances managed by Guice; to keep things intuitive, we will just new them up directly.

public Engine createEngine() throws IOException {
    Index index = new Index("index");
    ShardId shardId = new ShardId(index, 1);
    ThreadPool threadPool = new ThreadPool();
    CodecService cs = new CodecService(shardId.index());
    AnalysisService as = new AnalysisService(shardId.index());
    SimilarityService ss = new SimilarityService(shardId.index());
    Translog translog = new FsTranslog(shardId, EMPTY_SETTINGS, new File("c:/fs-translog"));
    DirectoryService directoryService = new RamDirectoryService(shardId, EMPTY_SETTINGS);
    Store store = new Store(shardId, EMPTY_SETTINGS, null, directoryService, new LeastUsedDistributor(directoryService));
    SnapshotDeletionPolicy sdp = new SnapshotDeletionPolicy(new KeepOnlyLastDeletionPolicy(shardId, EMPTY_SETTINGS));
    MergeSchedulerProvider scp = new SerialMergeSchedulerProvider(shardId, EMPTY_SETTINGS, threadPool);
    MergePolicyProvider mpp = new LogByteSizeMergePolicyProvider(store, new IndexSettingsService(index, EMPTY_SETTINGS));
    IndexSettingsService iss = new IndexSettingsService(shardId.index(), EMPTY_SETTINGS);
    ShardIndexingService sis = new ShardIndexingService(shardId, EMPTY_SETTINGS, new ShardSlowLogIndexingService(shardId, EMPTY_SETTINGS, iss));
    Engine engine = new RobinEngine(shardId, EMPTY_SETTINGS, threadPool, iss, sis, null, store, sdp, translog, mpp, scp, as, ss, cs);
    return engine;
}

Encapsulation of Lucene

Once fully encapsulated, you query with JSON syntax and get JSON back; at the engine level that is not yet the case. Think back to Lucene's CRUD and compare it with the CRUD of the ES engine. The first step is that the Lucene Document is wrapped into something called ParsedDocument.

private ParsedDocument createParsedDocument(String uid, String id, String type, String routing,
        long timestamp, long ttl, Analyzer analyzer, BytesReference source, boolean mappingsModified) {
    Field uidField = new Field("_uid", uid, UidFieldMapper.Defaults.FIELD_TYPE);
    Field versionField = new NumericDocValuesField("_version", 0);
    Document document = new Document();
    document.add(uidField);
    document.add(versionField);
    document.add(new StoredField("_source", source.toBytes())); // store the raw JSON source
    document.add(new TextField("name", "myname", Field.Store.NO));
    return new ParsedDocument(uidField, versionField, id, type, routing, timestamp, ttl,
            Arrays.asList(document), analyzer, source, mappingsModified);
}

Engine engine = createEngine();
engine.start();
String json = "{\"name\":\"myname\"}";
BytesReference source = new BytesArray(json.getBytes());
ParsedDocument doc = createParsedDocument("2", "myid", "mytype", null, -1, -1, Lucene.STANDARD_ANALYZER, source, false);
// add
Engine.Create create = new Engine.Create(null, newUid("2"), doc);
engine.create(create);
create.version(2);                          // create again with an external version
create.versionType(VersionType.EXTERNAL);
engine.create(create);
// delete
Engine.Delete delete = new Engine.Delete("mytype", "myid", newUid("2"));
engine.delete(delete);
// update is similar to add, omitted
// query
TopDocs result = null;
try {
    Query query = new MatchAllDocsQuery();
    result = engine.searcher().searcher().search(query, 10);
    System.out.println(result.totalHits);
} catch (Exception e) {
    e.printStackTrace();
}
// get
Engine.GetResult gr = engine.get(new Engine.Get(true, newUid("1")));
System.out.println(gr.source().source.toUtf8());

In terms of usability this little guy is not great yet. That's fine; layer by layer it grows stronger, until it finally becomes the full-fledged, high-end ES.

Comparison between query and Get

We are already clear on how a query flows, so what is the logic behind Get, an operation that Lucene itself does not have? Let's take a look.

In the ideal case the data is returned directly from the translog. Otherwise the entry is located via a TermsEnum seek on the UID, and the content is then read from the .fdt (stored fields) file through the resulting pointer.

In other words, if a query is split into two phases, query and fetch, only the first phase differs: a Get takes fewer steps than a TermQuery.

// simplified demonstration code
for (AtomicReaderContext context : reader.leaves()) {
    try {
        Terms terms = context.reader().terms("brand");
        TermsEnum te = terms.iterator(null);
        BytesRef br = new BytesRef("Mo".getBytes());
        if (te.seekExact(br, false)) {
            DocsEnum docs = te.docs(null, null);
            for (int d = docs.nextDoc(); d != DocsEnum.NO_MORE_DOCS; d = docs.nextDoc()) {
                System.out.println(reader.document(d).getBinaryValue("_source").utf8ToString());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
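The translog-first half of the flow is not in that snippet. A minimal sketch of the ordering, assuming a hypothetical in-memory translogMap and a seekFromIndex helper that wraps the loop above (neither is real ES code), might look like:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RealtimeGetSketch {
    // Hypothetical in-memory view of documents still pending in the translog.
    private final Map<String, String> translogMap = new ConcurrentHashMap<String, String>();

    String getSource(String uid) throws IOException {
        String pending = translogMap.get(uid);
        if (pending != null) {
            return pending;            // ideal case: answered without touching the index
        }
        return seekFromIndex(uid);     // fallback: the TermsEnum seek + .fdt read shown above
    }

    private String seekFromIndex(String uid) throws IOException {
        // would run the loop above: terms(...), seekExact(...), reader.document(...)
        return null;
    }
}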

The difference between refresh and flush

Refresh simply calls Lucene's searcherManager.maybeRefresh(). Flush, on the other hand, comes in three flavors:

static class Flush {
    public static enum Type {
        /** Create a new writer. */
        NEW_WRITER,
        /** Commit the writer. */
        COMMIT,
        /** Commit the translog. */
        COMMIT_TRANSLOG
    }
}
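To make the contrast concrete, here is a small standalone sketch against the Lucene 4.x API the article's snippets use (my own example, not ES code): maybeRefresh() reopens the near-real-time reader so a new document becomes searchable, while commit() is the kind of durable write a COMMIT flush performs.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class RefreshVsFlush {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)));
        SearcherManager sm = new SearcherManager(writer, true, null);

        Document doc = new Document();
        doc.add(new TextField("name", "myname", Field.Store.NO));
        writer.addDocument(doc);

        sm.maybeRefresh();   // "refresh": reopen the NRT reader, doc becomes searchable
        writer.commit();     // "flush"-style commit: fsync segments, doc becomes durable
    }
}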

Refresh is the lighter of the two operations, and it runs automatically by default:

@Override
public TimeValue defaultRefreshInterval() {
    return new TimeValue(1, TimeUnit.SECONDS);
}
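A sketch of how such a periodic refresh could be driven. ES schedules this through its own ThreadPool; the plain ScheduledExecutorService below is just to illustrate the idea:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.search.SearcherManager;

class PeriodicRefresher {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Fire a refresh once per second, mirroring the 1-second default above.
    void start(final SearcherManager searcherManager) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    searcherManager.maybeRefresh(); // the same Lucene call the engine's refresh makes
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }, 1, 1, TimeUnit.SECONDS);
    }
}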

Concurrency control

The locking logic is the same for every operation, so let's pick create and take a look. The engine is the heaviest user of locks in all of ES, stacking them layer upon layer as if it were nothing.

@Override
public void create(Create create) throws EngineException {
    rwl.readLock().lock();
    try {
        IndexWriter writer = this.indexWriter;
        if (writer == null) {
            throw new EngineClosedException(shardId, failedEngine);
        }
        innerCreate(create, writer);
        dirty = true;
        possibleMergeNeeded = true;
        flushNeeded = true;
    } catch (IOException e) {
        throw new CreateFailedEngineException(shardId, create, e);
    } catch (OutOfMemoryError e) {
        failEngine(e);
        throw new CreateFailedEngineException(shardId, create, e);
    } catch (IllegalStateException e) {
        if (e.getMessage().contains("OutOfMemoryError")) {
            failEngine(e);
        }
        throw new CreateFailedEngineException(shardId, create, e);
    } finally {
        rwl.readLock().unlock();
    }
}

private void innerCreate(Create create, IndexWriter writer) throws IOException {
    synchronized (dirtyLock(create.uid())) {
        // ... data-version checks omitted, we will not cover them here
        if (create.docs().size() > 1) {
            writer.addDocuments(create.docs(), create.analyzer());
        } else {
            writer.addDocument(create.docs().get(0), create.analyzer());
        }
        Translog.Location translogLocation = translog.add(new Translog.Create(create));
        // versionMap.put(versionKey, new VersionValue(updatedVersion, false, threadPool.estimatedTimeInMillis(), translogLocation));
        indexingService.postCreateUnderLock(create);
    }
}

The first is a read-write lock. Most operations take the read lock; the write lock is taken only when starting or closing the engine, during recovery phase 3, and for a NEW_WRITER flush. In other words: while the engine is starting up or shutting down, recovering data, or recreating the IndexWriter, nothing else may proceed. The second is an object lock on the UID, which uses lock striping, the same principle as ConcurrentHashMap, to avoid allocating a lock object per UID. There is a huge number of distinct UIDs; if you synchronized on a String per UID you could hit an OOM in no time.
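One subtlety in the code below deserves a quick demonstration: in two's complement, Math.abs(Integer.MIN_VALUE) is still negative, so without the guard the array index could be negative. For example:

public class AbsOverflowDemo {
    public static void main(String[] args) {
        // Integer.MIN_VALUE has no positive counterpart in 32-bit two's complement,
        // so Math.abs returns Integer.MIN_VALUE itself, which is still negative.
        System.out.println(Math.abs(Integer.MIN_VALUE)); // prints -2147483648
        // Used as an array index it would throw ArrayIndexOutOfBoundsException,
        // hence the explicit hash == Integer.MIN_VALUE guard in dirtyLock().
    }
}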

private final Object[] dirtyLocks;

this.dirtyLocks = new Object[indexConcurrency * 50]; // by default at most 8 * 50 lock objects
for (int i = 0; i < dirtyLocks.length; i++) {
    dirtyLocks[i] = new Object();
}

private Object dirtyLock(BytesRef uid) {
    int hash = DjbHashFunction.DJB_HASH(uid.bytes, uid.offset, uid.length);
    // abs can return Integer.MIN_VALUE, so we need to protect against it...
    if (hash == Integer.MIN_VALUE) {
        hash = 0;
    }
    return dirtyLocks[Math.abs(hash) % dirtyLocks.length];
}

Thank you for reading; that concludes "how to start the elasticsearch engine". After this article you should have a deeper understanding of how the engine starts, though the specifics still need to be verified in practice.
