Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the similar search plug-ins for PostgreSQL?

2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "what are the similar search plug-ins of PostgreSQL". In the daily operation, I believe that many people have doubts about the similar search plug-ins of PostgreSQL. The editor consulted all kinds of materials and sorted out simple and easy-to-use methods of operation. I hope it will be helpful to answer the questions of "what are the similar search plug-ins of PostgreSQL?" Next, please follow the editor to study!

Category 1: similar element overlap

Similar to inverted, similar calculation based on the degree of element overlap. It is widely used in similar search of array, full-text retrieval, string, text eigenvalue and multi-column arbitrary combination query.

The PostgreSQL plug-ins represented are as follows

1 、 rum

Https://github.com/postgrespro/rum

2 、 pg_trgm

Https://www.postgresql.org/docs/devel/static/pgtrgm.html

3 、 smlar

Http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=summary

4. Smlar+ hamming code (vector similar)

"massive data, simhash distance efficient Retrieval (smlar)-Aliyun RDS PosgreSQL Best practices"

5 、 pg_similarity

Https://github.com/eulerto/pg_similarity

Category 2: vector similarity (similar to knn distance)

The calculation of vector similarity and element overlap is obviously different. Based on the similarity of element overlap, it can be realized by inversion, as described in the previous section. Based on element vector similarity, we need to use a custom index interface, which is typically represented by the calculation of spatial distance of GiST index and the calculation of image eigenvalue similarity of imgsmlr plug-in.

1. Imgsmlr (the image vector is similar)

Https://github.com/postgrespro/imgsmlr

The principle is as follows

Take the average of 16 regions to generate 16 floating-point numbers as image eigenvalues.

A value is similar, and the absolute value of subtraction is the smallest.

The two values are similar, which can be understood as plane coordinates and the minimum distance (GiST knn distance ordering).

The three values are similar, which can be understood as the points in 3D coordinates and the points with the smallest distance.

...

Sixteen values are similar, similar to the above. The imgsmlr plug-in uses the gist index interface to implement vector similar index sorting of 16 elements.

Examples

Postgres=#\ d t_img Table "public.t_img" Column | Type | Collation | Nullable | Default-+-id | integer | | not null | sig | signature | | | | Indexes: "t_img_pkey" PRIMARY KEY | Btree (id) "idx_t_img_1" gist (sig)

Amount of data

Postgres=# select count (*) from tcm; count-319964709 (1 row) Time: 698.075 ms

Image eigenvalue search example, speed lever. (use citus+postgres+128 shard above)

Postgres=# select * from t_img order by sig'(3.539080, 0.243861, 1.509150, 1.781380, 8.677560, 4.232060, 8.979810, 1.665030, 1.294100, 4.449800, 9.200450, 1.859860,5.440250,7.788580,0.514258,8.424920) 'limit 1 Id | sig-+- - -- 148738668 | (2.554440 0.310499, 2.322520, 0.478624, 7.816080, 4.360440, 8.287050, 1.011060, 2.114320, 3.541110, 9.166300, 1.922250,4.488640,7.897890,1.600290,7.462080) (1 row) Time: 337.301 ms2 CUBE

Https://www.postgresql.org/docs/devel/static/cube.html

Ab float8 Euclidean distance between an and b.a b float8 Taxicab (Lmur1 metric) distance between an and b.a b float8 Chebyshev (L-inf metric) distance between an and b.

Cube performs slightly worse than imgsmlr when calculating picture vector similarity, because cube uses float8 and imgsmlr uses float4.

Examples

Cube

Postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_img0 order by sig::Text::cube'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070,2.574830,6.778820,5.156320,8.329430) 'limit 1 QUERY PLAN- - -Limit (cost=0.36..0.37 rows=1 width=76) (actual time=147.432..147.434 rows=1 loops=1) Output: id Sig, (sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466, 0.363753, 4.07967, 0.124681, 3.61122, 7.12746, 7.88007, 2.57483,6.77882,5.15632 8.32943):: cube)) Buffers: shared hit=16032-> Index Scan using idx_t_img0_1 on public.t_img0 (cost=0.36..13824.28 rows=754085 width=76) (actual time=147.430..147.430 rows=1 loops=1) Output: id, sig, ((sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466, 0.363753,4.07967,0.124681,3.61122,7.12746 7.88007, 2.57483, 6.77882, 5.15632, 8.32943):: cube) Order By: ((t_img0.sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466,0.363753, 4.07967,0.124681,3.61122,7.12746,7.88007,2.57483,6.7782,5.15632 8.32943)': cube) Buffers: shared hit=16032 Planning Time: 0.096 ms Execution Time: 148.905 ms (9 rows)

Imgsmlr

Postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_img0 order by sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070,2.574830,6.778820,5.156320,8.329430) 'limit 2 QUERY PLAN- - -Limit (cost=0.36..0.37 rows=2 width=72) (actual time=40.284..48.183 rows=2 loops=1) Output: id Sig, (sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460,7.880070,2.574830,6.778820,5.156320) 8.329430): signature)) Buffers: shared hit=2914-> Index Scan using t_img0_sig_idx on public.t_img0 (cost=0.36..7032.36 rows=754085 width=72) (actual time=40.282..48.179 rows=2 loops=1) Output: id, sig, (sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220,7.127460, 7.880070, 2.574830 6.778820, 5.156320, 8.329430):: signature) Order By: (t_img0.sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070, 2.574830, 6.778820, 5.156320, 8.329430):: signature) Buffers: shared hit=2914 Planning Time: 0.091 ms Execution Time: 48.210 ms (9 rows)

The advantage of cube over imgsmlr is that cube can calculate vector similarity in any dimension, while imgsmlr is only used to calculate vector similarity in 16 dimensions (signation type).

At this point, the study on "what are the similar search plug-ins for PostgreSQL" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report