Source: Shulou (Shulou.com), SLTechnology News&Howtos, 06/01 report. Updated 2025-04-02.
This article walks through the contents of the Solr configuration file schema.xml and is intended as a reference for readers who need to configure it.
1. Field configuration (schema)
schema.xml is located in the solr/collection1/conf/ directory and plays a role similar to a database table definition.
It defines the data types of the indexed data, including the types, the fields, and other default settings.
1. Let's first look at the types node, which contains fieldType child nodes with attributes such as name, class, and positionIncrementGap.
name: the name of the fieldType.
class: the class that defines the behavior of this type. Names beginning with "solr." are shorthand for Solr's own packages: field type classes such as solr.StrField and solr.TextField live in org.apache.solr.schema, while the tokenizer and filter factories used by analyzers are the ones associated with org.apache.solr.analysis.
...
...
Where needed, a fieldType also defines the analyzer used when indexing and querying this type of data, including tokenization and filtering.
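As a rough illustration of a fieldType with an analyzer, the fragment below is a minimal sketch. The type name text_simple is hypothetical, and the particular tokenizer and filter chain is an assumption chosen for the example, not something taken from the article; the factory classes themselves ship with Solr.

```xml
<!-- Hypothetical type name; the analyzer chain is illustrative only -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <!-- applied to field values when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to the query string at search time -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the index-time and query-time chains identical, as here, is the simplest way to make sure query terms match the terms that were indexed.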
2. Next come the specific fields defined in the fields node (similar to database columns). Each field has the following attributes:
name: the field name
type: one of the fieldType names defined earlier
indexed: whether the field is indexed
stored: whether the field value is stored (if you do not need the value returned, set this to false wherever possible)
multiValued: whether the field can hold multiple values (set this to true for any field that may have more than one value, to avoid errors during indexing)
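The attributes above might be combined as in the following sketch. All field names are hypothetical, and the types "string" and "text" are assumed to be defined under the types node.

```xml
<fields>
  <!-- Field names are hypothetical; "string" and "text" are assumed to exist under <types> -->
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <!-- searched but never returned in results, so not stored -->
  <field name="body" type="text" indexed="true" stored="false"/>
  <!-- may hold several values per document -->
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
```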
3. It is recommended to define a copy field and copy all full-text fields into that one field, so that they can be searched together. This is done with copyField settings.
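A copy field setup of this kind could look like the sketch below. The source field names title and body are hypothetical, and the destination name text is an assumption for the example.

```xml
<!-- The destination receives values from several sources, so it is multiValued -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- Hypothetical source fields copied into the single search field -->
<copyField source="title" dest="text"/>
<copyField source="body" dest="text"/>
```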
4. Dynamic fields. For fields that have no specific name, use dynamicField.
For example, if name is *_i and the type is defined as int, then any field whose name ends with _i is treated as matching this definition, such as name_i and school_i.
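The *_i example above can be written as the following one-line declaration. It assumes a fieldType named int is defined elsewhere in the schema.

```xml
<!-- Any field whose name ends in _i (name_i, school_i, ...) uses this definition -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```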
Notes taken from the comments in schema.xml:
1. To improve performance, the following measures can be taken:
Set stored to false for every field that is only searched and never returned as a result (especially larger fields).
Set indexed to false for every field that is returned as a result but never searched.
Delete all unnecessary copyField statements.
To get the smallest index and the best search performance, set indexed to false for the general text fields, copy them all into one catch-all text field using copyField, and search that field.
To maximize indexing performance, use a Java client that streams updates to Solr.
Run the JVM in server mode, and use a higher log level (one that logs less) to reduce log volume.
2. schema
name: identifies the name of this schema
version: 1.2 in the schema this article describes
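For orientation, a schema root element carrying these two attributes might be laid out as in this skeleton. The schema name "example" and the single type and field inside it are placeholders, not taken from the article.

```xml
<!-- Skeleton only: name is a placeholder, contents are reduced to one entry each -->
<schema name="example" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField"/>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
```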
3. fieldType
name: just an identifier.
class and the other attributes determine the actual behavior of this fieldType. (A class name starting with "solr." refers to a class in Solr's own packages; field type classes are in org.apache.solr.schema.)
Optional attributes:
The sortMissingLast and sortMissingFirst attributes apply to types that are sorted internally as strings (including string, boolean, sint, slong, sfloat, sdouble, and pdate).
sortMissingLast="true": documents without the field are placed after documents with the field, regardless of the sort order requested.
sortMissingFirst="true": the reverse; documents without the field are placed first.
Both attributes default to false.
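A type that opts in to this behavior might be declared as follows. This is a sketch: the type name string is an assumption, while sortMissingLast is the attribute described above.

```xml
<!-- Documents with no value in a field of this type sort after all others -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
```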
StrField values are not analyzed; they are indexed and stored verbatim.
Both StrField and TextField have an optional attribute "compressThreshold", which sets the minimum size (in characters) a value must exceed before compression is applied.
solr.TextField lets users customize indexing and querying through an analyzer, which consists of one tokenizer and any number of filters.
positionIncrementGap: an optional attribute that defines the position gap inserted between multiple values of this type in the same document, to prevent false phrase matches across values.
name: the field type name
class: the Java class name
indexed: defaults to true. Indicates that the data can be searched and sorted on. If a field is not indexed, it should be stored.
stored: defaults to true. Indicates that the field can be returned in search results. If a field is not stored, it should be indexed.
sortMissingLast: documents without a value for the field come after documents that have one
sortMissingFirst: documents without a value for the field come before documents that have one
omitNorms: set to true when the length of the field should not affect the score and no index-time boost is applied. Ordinary full-text fields are generally not set to true.
termVectors: set to true if the field is used by features such as MoreLikeThis and highlighting.
compressed: the field is compressed. This may slow indexing and searching but reduces storage space. Only StrField and TextField can be compressed, and it is usually worthwhile only for fields longer than 200 characters.
multiValued: set to true when the field can have more than one value.
positionIncrementGap: used together with multiValued to set the number of virtual positions inserted between successive values.
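To make the interaction between positionIncrementGap and multiValued concrete, here is a minimal sketch. The type name text_gap, the field name authors, and the sample values in the comment are all hypothetical.

```xml
<!-- Hypothetical type and field names -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- If a document has the values "John Smith" and "Mary Jones", the gap places
     "Smith" and "Mary" far apart, so the phrase query "Smith Mary" does not match -->
<field name="authors" type="text_gap" indexed="true" stored="true" multiValued="true"/>
```

With a gap of 0 the two values would behave like one continuous token stream, and the cross-value phrase could match.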
Whitespace tokenization: split on whitespace only, for exact matching.
Word delimiter handling: take hyphens, letter-digit boundaries, and non-alphanumeric characters into account when tokenizing and matching, so that "wifi" or "wi fi" can match "Wi-Fi".
Synonyms: expand or map terms using a synonym list.
Position increments: keep a position gap where a stop word was removed, so that phrase queries still see the original spacing.
Stop words: words that are ignored during both indexing and searching, such as the common words "is" and "this". They are maintained in conf/stopwords.txt.
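The analysis steps listed above could be chained in one field type roughly as follows. This is a sketch under assumptions: the type name text_general_example is hypothetical, the attribute values are illustrative, and synonyms.txt is assumed to sit next to stopwords.txt in the conf directory.

```xml
<!-- Hypothetical type name; attribute values are illustrative -->
<fieldType name="text_general_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand synonyms listed in conf/synonyms.txt -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- drop stop words, keeping a position gap where each one was removed -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- split and join on hyphens and letter-digit boundaries, so "Wi-Fi" also yields "wifi" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```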
4. fields
name: just an identifier.
type: one of the types defined earlier.
indexed: whether the field is used to build the index (this affects searching and sorting)
stored: whether the value is stored
compressed: [false] whether to use gzip compression (only TextField and StrField can be compressed)
multiValued: whether the field can contain multiple values
omitNorms: whether to omit norms, which saves memory. Only full-text fields and fields that need an index-time boost need norms. (The author notes that the details are unclear and that the schema comments seem contradictory on this point.)
termVectors: [false] when set to true, the term vector is stored. When using MoreLikeThis, the fields used for similarity should have term vectors stored.
termPositions: stores position information in the term vector, at the cost of extra storage.
termOffsets: stores offset information in the term vector, at the cost of extra storage.
default: the value to use when a document is added without a value for this field.
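The term vector and default attributes might be used as in this sketch. Both field names are hypothetical, and the types "text" and "int" are assumed to be defined elsewhere in the schema.

```xml
<!-- Hypothetical field used for MoreLikeThis and highlighting -->
<field name="features" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Hypothetical field that falls back to 0 when a document supplies no value -->
<field name="popularity" type="int" indexed="true" stored="true" default="0"/>
```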
The catch-all field (the name is a slight exaggeration) contains all the searchable text fields and is implemented through copyField.
When a document is indexed, all data from the source field (such as cat) is copied to the text field.
Purpose:
Searching the data from multiple fields in one field at the same time improves speed.
Copying data from one field to another lets the same data be indexed in two different ways.
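Both uses of copyField can be sketched as follows. The cat-to-text copy comes from the description above; the name and name_exact fields are hypothetical and only illustrate indexing one value in two different ways.

```xml
<!-- Use 1: feed the catch-all search field -->
<copyField source="cat" dest="text"/>

<!-- Use 2 (hypothetical fields): the same value analyzed as text and kept verbatim as a string -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_exact" type="string" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```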
If a field name does not match any declared field, Solr tries to match it against the patterns defined by the dynamic fields.
"*" can only appear at the start or the end of a pattern.
Longer patterns are matched first.
If two patterns of equal length both match, the one defined first takes precedence.
If none of the above match, you can define a catch-all dynamic field named "*" and give it a type such as string to handle those fields. (This is not usually needed.)
If no such catch-all is defined, an error is reported when no match is found.
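The matching rules above can be seen in a small set of declarations like the following sketch. The patterns other than "*" are hypothetical, and the types int and string are assumed to be defined in the schema.

```xml
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

<!-- Catch-all: any field name matched by nothing else is handled as a string -->
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
```

Under these declarations a field named count_i would use the *_i definition, since that pattern is longer than "*", while a field named misc would fall through to the catch-all instead of causing an error.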
5. Other tags
id
The unique key of the document. This field must be supplied (unless the field is marked required="false"); otherwise Solr reports an error when building the index.
text
If no field is specified in the search parameters, this is the default search field.
The default operator between terms in a search can also be configured; it can be "AND" or "OR".
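In the older schema format this article describes, these three settings are usually written as the elements below. The values id, text, and OR are taken from the descriptions above; treat the exact placement inside the schema file as an assumption.

```xml
<!-- Field that uniquely identifies each document -->
<uniqueKey>id</uniqueKey>

<!-- Field searched when the query names no field -->
<defaultSearchField>text</defaultSearchField>

<!-- Operator applied between query terms: "AND" or "OR" -->
<solrQueryParser defaultOperator="OR"/>
```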