How to write Solr schema

Updated 2025-07-08. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 report.

This article explains how to write a Solr schema. Many people have questions about this in day-to-day work, so the editor went through various materials and collected simple, practical guidance. Let's go through it point by point.

1. uniqueKey: when a uniqueKey is configured, adding a doc whose uniqueKey matches an earlier doc overwrites the earlier doc.

If no uniqueKey is configured, nothing is overwritten. An update relies on the uniqueKey value to find the doc it replaces.

Therefore, if you update docs, configuring a uniqueKey is recommended. It makes the configuration more complete, and it also makes data troubleshooting easier.

The corresponding field, id, must be stored=true or indexed=true. Using long or int rather than string is recommended.

If you have a special case that needs sorting by id, use TrieLongField; otherwise the ids are sorted in text (lexicographic) order.
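As a minimal sketch of the points above, the fragment below declares a numeric key field and names it as the uniqueKey. The field name `id` comes from the text; the type name `tlong` is an assumption and is expected to be defined elsewhere in the schema as a `solr.TrieLongField` type.

```xml
<!-- Sketch of a schema.xml fragment; "tlong" is an assumed type name backed by solr.TrieLongField -->
<fields>
  <!-- Key field: indexed and stored, numeric so that sorting by id follows numeric order -->
  <field name="id" type="tlong" indexed="true" stored="true" required="true"/>
</fields>

<!-- A later doc with the same id overwrites the earlier doc -->
<uniqueKey>id</uniqueKey>
```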

2. defaultSearchField: as the name suggests, this is the field searched by default at query time. With title as the defaultSearchField, it behaves as follows:

queryStr=content:abc 123 is equivalent to queryStr=content:abc title:123

queryStr=123 is equivalent to queryStr=title:123

In other words, when a query term does not say which field to search, it is matched against the defaultSearchField.

Because it is the defaultSearchField, this field must be indexed=true.

Take care to distinguish content:abc 123, content:"abc 123", and content:(abc 123). The first searches content for abc and the default field for 123; the second searches content for the exact phrase "abc 123"; the third searches content for abc and for 123 as separate terms.
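The query examples above assume a schema along the following lines. This is only a sketch: the field names `title` and `content` follow the example queries, and the type name `text` is an assumed TextField type defined elsewhere. Note that this element was deprecated in later Solr releases in favor of the `df` request parameter, so check what your version supports.

```xml
<!-- Sketch of a schema.xml fragment; "text" is an assumed, separately defined TextField type -->
<fields>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="content" type="text" indexed="true" stored="true"/>
</fields>

<!-- Query terms without a field prefix, such as 123, are searched in title -->
<defaultSearchField>title</defaultSearchField>
```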

3. The numeric types int, sint, tint, long, slong, tlong, float, sfloat, tfloat, double, sdouble, and tdouble

These types do not support tokenization and do not need it, because each of these basic types holds a single value. There is no sshort or tshort, only short.

For int, long, and float fields, do not set positionIncrementGap=100.

Only tint, tlong, and tdouble take the precisionStep, positionIncrementGap, and sortMissingLast="true" attributes.
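For illustration, trie type definitions might look like the fragment below. The attribute values are assumptions modeled on the style of the example schemas shipped with Solr, not values taken from the article, and whether `sortMissingLast` is honored on trie types depends on the Solr version.

```xml
<!-- Sketch of schema.xml type definitions; attribute values are illustrative assumptions -->
<types>
  <!-- precisionStep controls how many extra terms are indexed per value to speed up range queries -->
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  <!-- sortMissingLast places docs without a value at the end of sorted results -->
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0" sortMissingLast="true"/>
</types>
```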

4. Configuring tokenization

Only TextField types can be tokenized.

All TextField fields can be used for faceting.

omitTermFreqAndPositions="true" takes effect only on TextField fields; once it is set, term frequency and position information are no longer available for ranking.
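A minimal sketch of a tokenized type and a field that drops frequency and position data follows. The names `text_general` and `keywords` are hypothetical; the tokenizer and filter classes are standard Solr factories.

```xml
<!-- Sketch of a schema.xml fragment; type and field names are hypothetical -->
<types>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- Split text into tokens, then lowercase them -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- Term frequency and positions are not indexed, so phrase queries on this field will not work -->
  <field name="keywords" type="text_general" indexed="true" stored="false" omitTermFreqAndPositions="true"/>
</fields>
```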

5. omitNorms="true" affects how a field is scored. With norms omitted, the same word scores the same in a long field and in a short field. By Shannon's principle, the longer the text a word appears in, or the more often it appears, the less information it carries. With omitNorms=false, suppose Taobao appears in doc1 as "Taobao Hangzhou" and in doc2 as "Taobao Hangzhou Network Co., Ltd."; when a query hits Taobao, doc1 scores higher than doc2.

Note: if even one field has omitNorms="false", every field keeps a slot for norms, even though the norms content is empty.

Therefore, omitNorms helps the index only when all fields have omitNorms="true".
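As a small sketch, norms are switched per field (or per field type) in the schema. The field names and the `text` and `string` type names below are assumptions for illustration only.

```xml
<!-- Sketch of a schema.xml fragment; field and type names are hypothetical -->
<fields>
  <!-- Norms kept: a hit in a short title scores higher than the same hit in a long one -->
  <field name="title" type="text" indexed="true" stored="true" omitNorms="false"/>
  <!-- Norms omitted: field length no longer influences the score -->
  <field name="category" type="string" indexed="true" stored="true" omitNorms="true"/>
</fields>
```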

6. required="true"

Once required="true" is set on a field in the schema, that field cannot be empty at indexing time; otherwise the doc is considered incomplete. Currently, data that passes through the dump center has null values replaced with "", so a field never arrives without a value. Even so, if the application logic requires certain fields to be present, state that explicitly in the schema.
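A brief sketch of what this looks like in practice: the schema marks a field as required, and an update message that leaves it out is rejected by Solr. The field name `title` and the `string` type name are assumptions for illustration.

```xml
<!-- schema.xml fragment (sketch): title must be present in every doc -->
<field name="title" type="string" indexed="true" stored="true" required="true"/>

<!-- Update message (sketch): this doc has no title, so Solr refuses to index it -->
<add>
  <doc>
    <field name="id">1</field>
  </doc>
</add>
```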

7. multiValued="true"

This setting does not determine whether a field holds one term or many. Even with multiValued=false, a text field can hold a long paragraph of text, that is, many terms. What multiValued="true" really means is that when a doc is indexed, content can be added to the same field repeatedly. In other words, within one doc the same field name can have multiple key:value pairs. This differs from an ordinary map, where each key is unique and cannot carry several different values.

In addition, with multiValued="true", a hit returns a list for that field instead of a single object.

In the current indexing setup, whether multiValued is set has no effect on the dump process; it only determines whether a hit returns a list or a single object.

Going deeper, multiValued="true" actually opens a new field instance at indexing time, which allows fields with the same name to appear several times in a doc.

At query time all fields with that name are searched, which has some impact on retrieval performance, especially as the number of same-named fields grows.
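The behavior described above can be sketched with a schema entry and an update message in which the same field name is added twice. The field name `tag` and its values are hypothetical.

```xml
<!-- schema.xml fragment (sketch): the tag field may occur several times per doc -->
<field name="tag" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Update message (sketch): two values are added under the same field name -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="tag">solr</field>
    <field name="tag">lucene</field>
  </doc>
</add>
```

When this doc is hit, the tag field comes back as a list containing both values.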

8. Special tokenization

For requests to tokenize on characters such as #, ;, and :, the recommendation is to convert them all to spaces and use whitespace tokenization, which is the system's native tokenizer, is implemented at a low level, and performs better.

There is no need to write and deploy new custom code just for a #.
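One possible way to do this inside the schema, rather than in the data feed, is a character filter that rewrites the delimiters to spaces before the whitespace tokenizer runs. This is an assumption about how the advice could be applied, not necessarily the author's setup; the type name `text_delim` is hypothetical, and `solr.PatternReplaceCharFilterFactory` may not be available in very old Solr versions.

```xml
<!-- Sketch of a schema.xml type; the name text_delim is hypothetical -->
<fieldType name="text_delim" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Replace each #, ; or : with a space before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[#;:]" replacement=" "/>
    <!-- Built-in whitespace tokenizer, no custom code to deploy -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```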

9. Sorting, range queries, and general queries

A sort field should be a numeric type. The trie types are recommended; the older sortable types also work.

Range queries should also use a numeric type, and the trie types are recommended.

If a general query has to match several numbers stored together in one field, convert the numbers to strings and separate them with spaces. Arrays of numeric types are currently not supported.
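A short sketch of both cases follows. The field names are hypothetical, `tlong` is assumed to be a `solr.TrieLongField` type, and `text_ws` is assumed to be a TextField type that uses the whitespace tokenizer.

```xml
<!-- Sketch of a schema.xml fragment; field and type names are assumptions -->
<fields>
  <!-- Numeric trie field: usable with sort=price asc and with range queries such as price:[100 TO 500] -->
  <field name="price" type="tlong" indexed="true" stored="true"/>

  <!-- Several numbers stored as one space-separated string, for example "12 345 6789";
       a query such as category_ids:345 then matches one of the numbers -->
  <field name="category_ids" type="text_ws" indexed="true" stored="false"/>
</fields>
```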

10. date, tdate, and similar types

When configuring types such as date and tdate, pay attention to the time format.

In addition, storing the date directly is not recommended; store the difference from a reference time as an int instead.

Depending on the precision of the date, the number of terms in a date field grows linearly, which is quite alarming. This long tail consumes a lot of memory and disk space, and at present there is no special optimization for handling it.
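The two options can be sketched side by side. Solr date fields expect UTC timestamps in the form `1995-12-31T23:59:59Z`. The field names, the `tdate` and `tint` type names, and the choice of "days since a base date" as the stored difference are assumptions for illustration.

```xml
<!-- Sketch of a schema.xml fragment; names and the day-offset convention are assumptions -->
<fields>
  <!-- Direct date storage; values must look like 2012-05-31T08:30:00Z -->
  <field name="created" type="tdate" indexed="true" stored="true"/>

  <!-- Alternative suggested in the text: store an int difference, for example
       the number of days since a fixed base date, which yields far fewer distinct terms -->
  <field name="created_day_offset" type="tint" indexed="true" stored="true"/>
</fields>
```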

11. Advanced work

Check the quality of the schema yourself.

Once the schema is configured, test it with terminatorquickstart, then inspect the index structure with the Luke tool.

This may reveal problems, and there may be many places where the structure can be optimized.

This concludes the notes on how to write a Solr schema. Theory is best learned alongside practice, so go and try it.
