What is the content of the Solr configuration file schema.xml

2025-07-09 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article describes the contents of the Solr configuration file schema.xml and is intended as a reference for readers who need to read or edit that file.

1. Field configuration (schema)

schema.xml is located in the solr/collection1/conf/ directory. It plays a role similar to a table definition file in a database.

It defines the data types of the indexed data, including the types node, the fields node, and other default settings.

1. Let's first look at the types node, which contains fieldType child nodes. Each fieldType takes parameters such as name, class, and positionIncrementGap.

name: the name of the fieldType.

class: the Java class that defines the behavior of this type. Names beginning with solr are shorthand for classes in Solr's own packages (the comments in schema.xml cite org.apache.solr.analysis; field type classes such as solr.StrField live in org.apache.solr.schema).

...

If necessary, a fieldType also defines the analyzer used when indexing and querying data of this type, including tokenization and filtering.
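A minimal sketch of such a definition follows. The type name text_general and the particular tokenizer and filter are illustrative assumptions, not values taken from the original file.

```xml
<!-- Hypothetical example: the name "text_general" is an assumption -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <!-- Analyzer applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Analyzer applied to query text -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Using the same tokenizer and filters on both sides keeps indexed terms and query terms comparable.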

2. Next are the specific fields defined in the fields node (similar to columns in a database table). Each field has the following attributes:

name: the field name

type: one of the fieldType names defined earlier

indexed: whether the field is indexed

stored: whether the field value is stored (set it to false whenever the value does not need to be returned)

multiValued: whether the field can hold multiple values (set it to true for any field that may have more than one value, to avoid errors during indexing)
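The attributes above could be combined as in the following sketch. The field names are hypothetical, and the fragment assumes that types named string and text_general are defined in the types node.

```xml
<fields>
  <!-- Field names and types below are illustrative assumptions -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <!-- A document may carry several tags, so multiValued is true -->
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
```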

3. It is recommended to define a catch-all field and use copyField to copy every full-text field into it, so that all of them can be searched through a single field.
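A sketch of what such copy settings might look like. The source field names title and description and the type text_general are assumptions made for illustration.

```xml
<!-- Catch-all field; multiValued because several sources feed into it -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy each full-text field into the catch-all field at index time -->
<copyField source="title" dest="text"/>
<copyField source="description" dest="text"/>
```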

4. Dynamic fields. For fields that are not declared by name, use dynamicField.

For example, if name is *_i and its type is int, then any field whose name ends with _i is treated as matching this definition, such as name_i and school_i.
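The *_i rule described above would be written roughly as follows, assuming a fieldType named int exists; the indexed and stored values are illustrative.

```xml
<!-- Any undeclared field ending in _i (name_i, school_i) is handled as an int -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```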

Notes drawn from the comments in schema.xml:

1. To improve performance, the following measures can be taken:

Set stored to false for every field that is only searched and never returned in results (especially large fields).

Set indexed to false for every field that is returned in results but never searched.

Delete all unnecessary copyField statements.

For the smallest index and the best search performance, set indexed to false on the general text fields, copy them all into one catch-all text field with copyField, and search that field instead.

To maximize indexing performance, use a Java client that streams updates to Solr.

Run the JVM in server mode, and raise the logging level so that fewer messages are written (for example, so that individual requests are not logged).
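The first two measures can be sketched as follows. The field names body and thumbnail_url and the type names are hypothetical.

```xml
<!-- Searched but never returned: index it, do not store it -->
<field name="body" type="text_general" indexed="true" stored="false"/>
<!-- Returned but never searched: store it, do not index it -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>
```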

2. schema

name: identifies this schema

version: the schema version (1.2 in the file described here)
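As a rough outline, the root element carries these two attributes and wraps the nodes discussed in this article. The schema name example is an assumption; the version is the one stated above.

```xml
<schema name="example" version="1.2">
  <types>
    <!-- fieldType definitions -->
  </types>
  <fields>
    <!-- field and dynamicField definitions -->
  </fields>
  <!-- copyField statements and other top-level settings -->
</schema>
```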

3. fieldType

name: just an identifier.

class and the other attributes determine the actual behavior of this fieldType. (According to the comments in schema.xml, class names starting with solr refer to classes in the org.apache.solr.analysis package.)

Optional attributes:

The sortMissingLast and sortMissingFirst attributes apply to types that are sorted internally as strings (including string, boolean, sint, slong, sfloat, sdouble, and pdate).

sortMissingLast="true": documents without the field sort after documents with the field, regardless of the sort order requested.

sortMissingFirst="true": the reverse; documents without the field sort first.

Both attributes default to false.

StrField values are not analyzed; they are indexed and stored verbatim.

Both StrField and TextField accept an optional compressThreshold attribute, which limits compression to values above a minimum size (in characters).

solr.TextField lets users customize indexing and querying through an analyzer, which consists of a tokenizer and any number of filters.

positionIncrementGap: an optional attribute that sets the position gap inserted between multiple values of this type within the same document, to prevent phrase queries from matching across values.
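A short sketch of the sort-related attributes on a string type. The second type name is hypothetical and exists only to show the opposite setting.

```xml
<!-- Not analyzed; documents lacking the field sort last -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- The reverse: documents lacking the field sort first -->
<fieldType name="string_missing_first" class="solr.StrField" sortMissingFirst="true"/>
```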

name: the field type name

class: the Java class name

indexed: defaults to true. Indicates that the data can be searched and sorted. If a field is not indexed, it should be stored.

stored: defaults to true. Indicates that the field value can be returned in search results. If a field is not stored, it should be indexed.

sortMissingLast: documents without a value for the field sort after documents that have one

sortMissingFirst: documents without a value for the field sort before documents that have one

omitNorms: set to true when the length of the field should not affect scoring and no index-time boost is applied. General text fields usually leave it false.

termVectors: set to true if the field is used for MoreLikeThis or highlighting.

compressed: whether the field is compressed. This may slow indexing and searching but reduces storage space. Only StrField and TextField are compressible, and compression usually suits fields longer than 200 characters.

multiValued: set to true when the field can have more than one value.

positionIncrementGap: used together with multiValued to set the number of virtual positions inserted between the multiple values

Whitespace tokenization: text is split on whitespace only and matched exactly.

Word delimiter handling: hyphens, letter/digit boundaries, and non-alphanumeric characters are taken into account when tokenizing and matching, so that "wifi" or "wi fi" can match "Wi-Fi".

Synonyms.

Position increments: after stopwords are removed, the gap they leave between the surrounding words of a phrase can be preserved.

Stopwords: words that are ignored during analysis (both indexing and searching), such as the common words "is" and "this". They are maintained in conf/stopwords.txt.
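The behaviors listed above correspond to tokenizer and filter factories. The following sketch is written in the style of the older Solr example schema; the type name, the synonyms.txt file name, and the attribute values are assumptions, and some attributes (such as enablePositionIncrements) are not accepted by newer Solr releases.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Expand synonyms listed in conf/synonyms.txt (assumed file name) -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- Drop stopwords but keep the position gap they leave behind -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- Split on hyphens and letter/digit boundaries, and also join the parts,
         so that "Wi-Fi" yields "Wi", "Fi", and "WiFi" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <!-- Lowercase last, so "WiFi" becomes "wifi" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```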

4. fields

name: just an identifier.

type: one of the previously defined types.

indexed: whether the field is used to build the index (relevant to searching and sorting)

stored: whether the value is stored

compressed: [false]. Whether to use gzip compression (only TextField and StrField can be compressed)

multiValued: whether the field can contain multiple values

omitNorms: whether to omit norms, which saves memory. Only full-text fields and fields that need an index-time boost need norms. (I do not understand the details; the comments seem to contradict each other.)

termVectors: [false]. When set to true, term vectors are stored. Fields that MoreLikeThis uses for similarity should store them.

termPositions: stores position information in the term vector, at extra storage cost.

termOffsets: stores offset information in the term vector, at extra storage cost.

default: the value to use when a document is added without a value for this field.
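A sketch of how the less common attributes might appear on field definitions. The field names, type names, and the default value draft are hypothetical.

```xml
<!-- Field used as a MoreLikeThis source: keep term vectors with positions and offsets -->
<field name="features" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<!-- Short exact-match field: norms are omitted, and a default fills in missing values -->
<field name="status" type="string" indexed="true" stored="true"
       omitNorms="true" default="draft"/>
```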

The catch-all field (a slight exaggeration), which contains all searchable text fields, is implemented through copyField.

When a document is indexed, all data from the source field (such as cat) is copied to the text field.

Purposes:

Searching data from multiple fields together through a single field, which improves speed.

Copying data from one field to another so that the same data can be indexed in two different ways.
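The second purpose, indexing the same data in two ways, could look like this. The field names title and title_exact and the type names text and string are assumptions.

```xml
<!-- Analyzed copy for full-text search -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- Unanalyzed copy of the same data for sorting or exact matching -->
<field name="title_exact" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```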

If a field name does not match any declared field, the dynamic field patterns are tried in turn.

"*" can only appear at the beginning or the end of a pattern.

Longer patterns are matched first.

If two patterns of equal length both match, the one defined first takes precedence.

If none of the above match, you can define a catch-all pattern ("*") and give it a type so that the value is handled as a string. (This usually does not happen.)

However, if no catch-all is defined, an error is reported when no match is found.
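The matching rules above can be illustrated with a few patterns. The pattern names and type names here are hypothetical.

```xml
<!-- "*" at the end of the pattern -->
<dynamicField name="attr_*" type="text" indexed="true" stored="true"/>
<!-- "*" at the beginning of the pattern -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<!-- Catch-all: any name still unmatched is handled as a string -->
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
```

With these definitions, a name such as attr_color_s fits both attr_* and *_s, and the longer pattern attr_* is tried first.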

5. Other tags

uniqueKey: the unique identifier of the document. This field must be supplied (unless it is marked required="false"); otherwise Solr reports an error when building the index.

defaultSearchField (text in this schema): the field that is searched when the search parameters do not name a specific field.

solrQueryParser defaultOperator: the logical operator applied between search terms, which can be AND or OR.
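These settings appear as top-level elements in older schema versions such as the one described here. In the sketch below, the field name id is an assumption, text comes from the article, and OR is shown only as one of the two permitted operators.

```xml
<!-- Field that uniquely identifies each document -->
<uniqueKey>id</uniqueKey>
<!-- Field searched when the query names no field -->
<defaultSearchField>text</defaultSearchField>
<!-- Operator applied between query terms: AND or OR -->
<solrQueryParser defaultOperator="OR"/>
```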
