Implementation of full-text search algorithm for search engine (based on Lucene)

2025-01-18 Update. From: SLTechnology News&Howtos


When I built the turntable network site, I released its code for non-full-text search; readers who need it can find it on my blog. This article discusses how to implement full-text search. I spent a long time designing a new project, Opinion, which depends heavily on full-text search, so I put a lot of time into studying it. Without further ado, here is the code:

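    // Imports this method needs. The ParseException package is an assumption;
    // the source does not show which one is meant.
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TermRangeQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopFieldCollector;
    import org.apache.lucene.util.BytesRef;

    public Map<String, Object> articleSearchAlgorithms(SearchCondition condition, IndexSearcher searcher)
            throws ParseException, IOException {
        Map<String, Object> map = new HashMap<>();
        String[] filedsList = condition.getFiledsList();
        String keyWord = condition.getKeyWord();
        int currentPage = condition.getCurrentPage();
        int pageSize = condition.getPageSize();
        String sortField = condition.getSortField();
        // SortField's third argument means "reverse order", so it takes isDESC()
        boolean reverse = condition.isDESC();
        String sDate = condition.getsDate();
        String eDate = condition.geteDate();
        String classify = condition.getClassify();

        // escape special characters in the keyword
        keyWord = escapeExprSpecialWord(keyWord);

        BooleanQuery q1 = new BooleanQuery();           // filters: type and date range
        BooleanQuery q2 = new BooleanQuery();           // keyword matching
        BooleanQuery booleanQuery = new BooleanQuery(); // combined boolean query

        if (classify.equals("guanzhi") || classify.equals("opinion") || classify.equals("write")) {
            String typeId = "1"; // default: speech
            if (classify.equals("guanzhi")) {
                typeId = "2";
            }
            if (classify.equals("opinion")) {
                typeId = "3";
            }
            Query termQuery = new TermQuery(new Term("typeId", typeId));
            q1.add(termQuery, BooleanClause.Occur.MUST);
        }

        // Whether to run a range query is decided by these two parameters.
        // The condition is missing in the source; a null check on both is assumed.
        if (sDate != null && eDate != null) {
            Query rangeQuery = new TermRangeQuery("writingTime",
                    new BytesRef(sDate), new BytesRef(eDate), true, true);
            q1.add(rangeQuery, BooleanClause.Occur.MUST);
        }

        // sort by score unless a sort field is given
        Sort sort = new Sort();
        sort.setSort(SortField.FIELD_SCORE);
        if (sortField != null) {
            sort.setSort(new SortField(sortField, SortField.Type.STRING, reverse));
        }

        int start = (currentPage - 1) * pageSize;
        int hm = start + pageSize; // collect enough hits to cover the requested page
        // Argument list as given in the source; the matching create() overload
        // depends on the Lucene version in use.
        TopFieldCollector res = TopFieldCollector.create(sort, hm, false, false);

        // two high-precision queries: exact match and prefix match
        // exact match
        Term t0 = new Term(filedsList[1], keyWord);
        TermQuery termQuery = new TermQuery(t0);
        q2.add(termQuery, BooleanClause.Occur.SHOULD);
        // prefix match
        Term t1 = new Term(filedsList[1], keyWord);
        PrefixQuery prefixQuery = new PrefixQuery(t1);
        q2.add(prefixQuery, BooleanClause.Occur.SHOULD);

        // phrase and similarity matching, suitable for tokenized content
        // [the for loop that followed here is truncated in the source]

        // The q1 condition is reconstructed by analogy with the q2 check below.
        if (q1 != null && q1.toString().length() > 0) {
            booleanQuery.add(q1, BooleanClause.Occur.MUST);
        }
        if (q2 != null && q2.toString().length() > 0) {
            booleanQuery.add(q2, BooleanClause.Occur.MUST);
        }

        searcher.search(booleanQuery, res);
        long amount = res.getTotalHits();
        TopDocs tds = res.topDocs(start, pageSize);
        map.put("amount", amount);
        map.put("tds", tds);
        map.put("query", booleanQuery);
        return map;
    }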
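
The article does not show the body of escapeExprSpecialWord, which articleSearchAlgorithms calls on the keyword. The following is only a minimal sketch of what such a helper might look like, assuming it backslash-escapes the characters that are special in Lucene's classic query syntax. The class name KeywordEscaper and the exact character set are assumptions, not the author's code.

```java
final class KeywordEscaper {

    // Assumed set of special characters (Lucene classic query syntax).
    // Adjust it to whatever the real helper is meant to neutralize.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Returns the keyword with every special character prefixed by a backslash.
    static String escape(String keyword) {
        if (keyword == null || keyword.isEmpty()) {
            return keyword;
        }
        StringBuilder sb = new StringBuilder(keyword.length() + 8);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("c++ (lucene)"));
    }
}
```

If the keyword is passed through Lucene's classic query parser, that parser's own static escape method is an alternative to a hand-written helper.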

Note: the search criteria (SearchCondition) used by articleSearchAlgorithms are specific to Viewpoint's requirements. Adapt them to your own search conditions; a single version cannot fit every reader.
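
The SearchCondition class itself is not included in the article. As a starting point for adapting it, here is a reduced sketch built only from getters that articleSearchAlgorithms calls (getKeyWord, getCurrentPage, getPageSize, getSortField, isDESC). The class name SearchConditionSketch and the two helper methods getStart and getHitsToCollect are hypothetical; they just restate the paging arithmetic from that method.

```java
final class SearchConditionSketch {

    private final String keyWord;
    private final int currentPage;   // 1-based page number
    private final int pageSize;      // hits per page
    private final String sortField;  // null means "sort by score"
    private final boolean desc;      // true means reverse (descending) order

    SearchConditionSketch(String keyWord, int currentPage, int pageSize,
                          String sortField, boolean desc) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        this.keyWord = keyWord;
        this.currentPage = currentPage;
        this.pageSize = pageSize;
        this.sortField = sortField;
        this.desc = desc;
    }

    String getKeyWord() { return keyWord; }
    int getCurrentPage() { return currentPage; }
    int getPageSize() { return pageSize; }
    String getSortField() { return sortField; }
    boolean isDESC() { return desc; }

    // Hypothetical helper: index of the first hit on the requested page,
    // the same value as "start" in articleSearchAlgorithms.
    int getStart() {
        return (currentPage - 1) * pageSize;
    }

    // Hypothetical helper: how many top hits the collector must keep so that
    // the requested page is covered, the same value as "hm".
    int getHitsToCollect() {
        return getStart() + pageSize;
    }

    public static void main(String[] args) {
        SearchConditionSketch c = new SearchConditionSketch("lucene", 3, 10, null, false);
        System.out.println(c.getStart() + " .. " + c.getHitsToCollect());
    }
}
```

The collector has to keep start + pageSize hits because the requested page is cut out of the top of the ranked list afterwards, which is what topDocs(start, pageSize) does in the method.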

    // Imports this method needs. simpleHTMLFormatter, analyzer, writeDao, getSearcher,
    // subContent, HtmlEnDecode, SubStringHTML, Constant, DateUtil and Write belong to
    // the author's project and are not shown in the article.
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleFragmenter;

    public Map<String, Object> searchArticle(SearchCondition condition) throws Exception {
        Map<String, Object> map = new HashMap<>();
        List<Write> list = new ArrayList<>();
        DirectoryReader reader = condition.getReader();
        String URL = condition.getURL();
        boolean isHighlight = condition.isHighlight();
        String keyWord = condition.getKeyWord();
        IndexSearcher searcher = getSearcher(reader, URL);
        try {
            Map<String, Object> output = articleSearchAlgorithms(condition, searcher);
            if (output == null) {
                map.put("amount", 0L);
                map.put("source", null);
                return map;
            }
            map.put("amount", output.get("amount"));
            TopDocs tds = (TopDocs) output.get("tds");
            ScoreDoc[] sd = tds.scoreDocs;
            Query query = (Query) output.get("query");
            for (int i = 0; i < sd.length; i++) {
                Document doc = searcher.doc(sd[i].doc);
                String id = doc.get("id");

                /* start: fields that need to be handled together */
                String temp = doc.get("title");
                String title = temp; // used as is when highlighting is off
                if (isHighlight) {
                    // highlight the article title
                    Highlighter highlighterTitle =
                            new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterTitle.setTextFragmenter(new SimpleFragmenter(40)); // fragment length
                    TokenStream ts = analyzer.tokenStream("title", new StringReader(temp));
                    title = highlighterTitle.getBestFragment(ts, temp);
                    if (title == null) {
                        // Workaround for the highlighter returning null. The wrapping
                        // HTML tags were stripped from the source listing.
                        title = temp.replace(keyWord, "" + keyWord + "");
                    }
                }

                // escape the content with the project's own helper
                String temp1 = HtmlEnDecode.htmlEncode(doc.get("content"));
                String content = temp1;
                if (isHighlight) {
                    // highlight the content
                    Highlighter highlighterContent =
                            new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterContent.setTextFragmenter(
                            new SimpleFragmenter(Constant.HIGHLIGHT_CONTENT_LENGTH)); // fragment length
                    // temp1 = StringEscapeUtils.escapeHtml(temp1); // escaping Chinese characters breaks highlighting
                    TokenStream ts1 = analyzer.tokenStream("content", new StringReader(temp1));
                    content = highlighterContent.getBestFragment(ts1, temp1);
                    if (content == null) {
                        // Same workaround as for the title (tags stripped from the source).
                        content = temp1.replace(keyWord, "" + keyWord + "");
                        // The highlighter truncates by itself in the normal case;
                        // here the truncation has to be done by hand.
                        content = subContent(content);               // truncate
                        content = HtmlEnDecode.htmldecode(content);  // HTML-decode
                        content = SubStringHTML.sub(content, Constant.HIGHLIGHT_CONTENT_LENGTH);
                    }
                }

                /* frequently changing data is loaded from the database */
                Write write = writeDao.getArticle(Long.parseLong(id));
                if (write != null) {
                    write.setTitle(title);
                    write.setContent(content);
                    Date writingTime = write.getWritingTime();
                    String timeGap = DateUtil.dateGap(writingTime); // time elapsed since writing
                    write.setTimeGap(timeGap);
                    list.add(write);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        map.put("source", list);
        return map;
    }

Note that searchArticle is search code for one specific application. Different scenarios have different requirements, so wrap the result objects and query the database according to your own needs. The code is shared without holding anything back and is usable in practice.
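
searchArticle falls back to a plain string replacement when getBestFragment returns null. The wrapping tags of that replacement did not survive in the listing, so the following is only a sketch of the idea with hypothetical names: FallbackHighlighter, escapeHtml and the tag parameters are assumptions, and the helper stands in for the project's own HtmlEnDecode class. It escapes the text while wrapping each literal keyword occurrence, so that markup inside the stored content cannot leak into the page.

```java
final class FallbackHighlighter {

    // Minimal HTML escaping for text nodes and attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps every literal occurrence of keyword in openTag/closeTag.
    // Matching runs on the raw text, so a keyword such as "amp" can never
    // hit inside an entity like "&amp;" produced by the escaping.
    static String highlight(String text, String keyword, String openTag, String closeTag) {
        if (text == null) {
            return null;
        }
        if (keyword == null || keyword.isEmpty()) {
            return escapeHtml(text);
        }
        StringBuilder out = new StringBuilder(text.length() + 32);
        int from = 0;
        int hit;
        while ((hit = text.indexOf(keyword, from)) >= 0) {
            out.append(escapeHtml(text.substring(from, hit)));
            out.append(openTag).append(escapeHtml(keyword)).append(closeTag);
            from = hit + keyword.length();
        }
        out.append(escapeHtml(text.substring(from)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("a < b & lucene", "lucene", "<em>", "</em>"));
    }
}
```

Unlike Lucene's Highlighter, this matches the raw keyword only; it knows nothing about analyzers or tokenization, so it is suitable just as a last-resort fallback.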

If you have any questions, you can join QQ group 284205104. If the group is full, please go to the turntable network site to find the latest group. Thank you for reading.
