2025-04-14 Update From: SLTechnology News & Howtos > Database
Shulou (Shulou.com) 06/01 Report --
1.1 Solr overview
Apache Solr is an open-source search server written in Java and built primarily on HTTP and Apache Lucene. As a search server it offers good encapsulation and extensibility for implementing site search, and many portal and community sites build their search on Solr.
1.2 Solr implementation principle
Solr provides standard HTTP interfaces to add, delete, update, and query index data. Users initiate indexing and searching by sending HTTP requests to the Solr web application deployed in a servlet container. Solr accepts the request, determines the appropriate SolrRequestHandler, and processes it. The response is returned over HTTP in the same way. By default Solr returns its standard XML response, but alternate response formats can be configured.
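The request/response flow above can be sketched by constructing the two kinds of requests a client would send. This is a minimal sketch: the host, port, and core name (`mycore`) are assumptions to adjust for your own deployment, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed local Solr instance and core name -- adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr/mycore"

# A search is an HTTP GET against the select handler.
select_url = SOLR_BASE + "/select?" + urlencode({"q": "title:solr", "rows": 10})

# An index update is an HTTP POST of an XML payload to the update handler;
# Solr dispatches to the matching SolrRequestHandler server-side.
update_url = SOLR_BASE + "/update"
update_body = "<add><doc><field name='id'>1</field></doc></add>"

print(select_url)
print(update_url)
```

Any HTTP client (curl, a browser, a library) can issue these requests, which is what makes Solr easy to integrate from any language.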
1.3 Solr characteristics
Populating the Solr index can be accomplished simply by POSTing an XML document describing the fields and their contents to the Solr server. To search, you simply send an HTTP GET request and then arrange the information returned by Solr into a page layout that is easy for users to understand. Solr version 1.3 began to support importing data from databases (via JDBC), RSS feeds, web pages, and files, but it does not directly support extracting content from binary formats such as MS Office or Adobe PDF. More importantly, the indexes Solr creates are fully compatible with the Lucene search engine library. With proper configuration of Solr, which may require some coding in certain cases, Solr can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can also use indexes created by Solr.
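The XML document POSTed to the server follows Solr's XML update format. A minimal sketch is below; the field names (`id`, `title`, `category`) are illustrative and must match fields defined in your schema.

```xml
<!-- POST this to /solr/<core>/update; field names are illustrative -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">An example document</field>
    <field name="category">examples</field>
  </doc>
</add>
```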
Solr features include:
1. Advanced full-text search capabilities
2. Optimized for high-throughput network traffic
3. Standards-based open interfaces (XML and HTTP)
4. Integrated HTML management interface
5. Scalability: the ability to replicate efficiently to other Solr search servers
6. Flexibility and adaptability using XML configuration
7. Extensible plug-in architecture
1.4 Solr uses and extends Lucene
1. A true Data Schema with Dynamic Fields and Unique Keys
2. Powerful extensions to Lucene query language
3. Dynamic grouping and filtering of results
4. Advanced, configurable text analysis
5. Highly configurable and scalable caching mechanism
6. Performance optimizations
7. Support external configuration via XML
8. Have an administrative interface
9. Monitorable logging
10. Fast incremental updates and snapshot distribution are supported
1.5 Schema
1. Defines field types and document fields
2. Field types can drive intelligent processing
3. Declarative Lucene analyzer specification
4. Dynamic fields can be added at any time
5. The copyField feature allows a field to be indexed in multiple ways, or multiple fields to be combined into one searchable field
6. Explicit typing reduces guesswork about field types
7. Stop word lists, synonym lists, and protected word lists can be configured from external files
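The schema features above come together in schema.xml. The sketch below is illustrative, not a complete schema: the type and field names are assumptions, and element names follow the classic schema.xml layout.

```xml
<!-- Minimal schema.xml sketch; names and types are illustrative -->
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <!-- Declarative analyzer specification for a text type -->
    <fieldType name="text_general" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Stop words configured from an external file -->
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <!-- Dynamic field: any field ending in _s is treated as a string -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <!-- copyField: combine title into the catch-all searchable field -->
  <copyField source="title" dest="text"/>
</schema>
```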
1.6 Query
1. HTTP interface with configurable response formats (XML/XSLT, JSON, Python, Ruby)
2. Highlighted contextual search results
3. Faceted search based on field values and explicit queries
4. Sort specifications added to the query language
5. Constant-scoring range and prefix queries: no idf, coord, or lengthNorm factors, and no limit on the number of terms a query matches
6. FunctionQuery: influences scoring with a function of a field's numeric value or ordinal
7. Performance optimizations
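Several of the query features above are driven by standard request parameters. The sketch below builds such a query URL; the core URL and field names (`price`, `category`, `title`) are assumptions, while the parameter names (`sort`, `facet`, `hl`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical core URL -- adjust for your deployment.
base = "http://localhost:8983/solr/mycore/select"

params = [
    ("q", "solr lucene"),          # main query
    ("sort", "price asc"),         # sort specification in the query language
    ("facet", "true"),             # enable faceted search
    ("facet.field", "category"),   # facet on field values
    ("hl", "true"),                # highlight matching context
    ("hl.fl", "title"),            # fields to highlight
    ("wt", "json"),                # configurable response format
]
url = base + "?" + urlencode(params)
print(url)
```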
1.7 Core
1. Pluggable Query Handler and Extensible XML Data Format
2. An optional unique key field enforces document uniqueness
3. Efficient batch updates and deletions
4. User-configurable commands triggered on index changes
5. Concurrency control over searchers
6. Ability to handle numeric types correctly for sorting and range searching
7. Control over where documents missing a sort field are placed
8. Support dynamic grouping of search results
1.8 Caching
1. Configurable query results, filters, and document cache instances
2. Pluggable cache implementation
3. Background warming: a new searcher can be configured to warm its caches when opened, preventing slow first queries; the current searcher continues to handle requests while the new one warms up.
4. Background auto-warming: the most frequently accessed entries in the current searcher's caches are regenerated in the new searcher, so frequently used query results stay cached even as the index and searcher change.
5. Fast and small filter implementation
6. User-level cache with automatic hot-start
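These caches are configured in solrconfig.xml. A minimal sketch, with illustrative sizes, is below; `filterCache`, `queryResultCache`, `documentCache`, and named user caches are the standard cache elements, and `autowarmCount` controls the background auto-warming described above.

```xml
<!-- Cache configuration sketch for solrconfig.xml; sizes are illustrative -->
<query>
  <!-- Caches document sets matching filter queries -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <!-- Caches ordered result lists for repeated queries -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <!-- Caches stored fields of documents (cannot be auto-warmed) -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- A user-level cache with auto-warming -->
  <cache name="myUserCache" class="solr.LRUCache" size="128" autowarmCount="16"/>
</query>
```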
1.9 Replication
1. Efficiently distributes only the changed portions of an index, transmitted with rsync
2. A pull strategy simplifies adding searchers
3. Configurable distribution intervals allow a tradeoff between timeliness and cache usage
1.10 Management interface
1. Comprehensive statistics on cache usage, updates and queries
2. Text analysis debugger capable of displaying results for each phase of each analyzer
3. Web-based query and debug output: parsed query output, Lucene explain() details, and the ability to explain why a document scored low or was excluded from the results
1.11 Indexing
There are four different index requests that can be passed to the Solr index servlet:
1)add/update adds or updates documents in Solr. These additions and updates are not searchable until a commit.
2)commit tells Solr that all changes made since the last commit should be made searchable.
3)optimize restructures the Lucene index files to improve search performance. It is usually best to run an optimization after indexing completes; if updates are frequent, schedule optimizations for periods of low usage. An index does not need to be optimized to function correctly, and optimization is a time-consuming process.
4)delete can be specified by id or query. Delete by id deletes documents with the specified id; delete by query deletes all documents returned by the query.
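In Solr's XML update format, the commit, optimize, and delete requests look as follows; each is POSTed to the update handler as a separate request, and the id and query values are illustrative.

```xml
<commit/>    <!-- make all changes since the last commit searchable -->
<optimize/>  <!-- restructure the index files; time-consuming -->

<delete><id>doc1</id></delete>                     <!-- delete by unique id -->
<delete><query>category:obsolete</query></delete>  <!-- delete by query -->
```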
© 2024 shulou.com SLNews company. All rights reserved.