In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly introduces "what are the similar search plug-ins of PostgreSQL". In the daily operation, I believe that many people have doubts about the similar search plug-ins of PostgreSQL. The editor consulted all kinds of materials and sorted out simple and easy-to-use methods of operation. I hope it will be helpful to answer the questions of "what are the similar search plug-ins of PostgreSQL?" Next, please follow the editor to study!
Category 1: similar element overlap
Similar to inverted, similar calculation based on the degree of element overlap. It is widely used in similar search of array, full-text retrieval, string, text eigenvalue and multi-column arbitrary combination query.
The PostgreSQL plug-ins represented are as follows
1 、 rum
Https://github.com/postgrespro/rum
2 、 pg_trgm
Https://www.postgresql.org/docs/devel/static/pgtrgm.html
3 、 smlar
Http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=summary
4. Smlar+ hamming code (vector similar)
"massive data, simhash distance efficient Retrieval (smlar)-Aliyun RDS PosgreSQL Best practices"
5 、 pg_similarity
Https://github.com/eulerto/pg_similarity
Category 2: vector similarity (similar to knn distance)
The calculation of vector similarity and element overlap is obviously different. Based on the similarity of element overlap, it can be realized by inversion, as described in the previous section. Based on element vector similarity, we need to use a custom index interface, which is typically represented by the calculation of spatial distance of GiST index and the calculation of image eigenvalue similarity of imgsmlr plug-in.
1. Imgsmlr (the image vector is similar)
Https://github.com/postgrespro/imgsmlr
The principle is as follows
Take the average of 16 regions to generate 16 floating-point numbers as image eigenvalues.
A value is similar, and the absolute value of subtraction is the smallest.
The two values are similar, which can be understood as plane coordinates and the minimum distance (GiST knn distance ordering).
The three values are similar, which can be understood as the points in 3D coordinates and the points with the smallest distance.
...
Sixteen values are similar, similar to the above. The imgsmlr plug-in uses the gist index interface to implement vector similar index sorting of 16 elements.
Examples
Postgres=#\ d t_img Table "public.t_img" Column | Type | Collation | Nullable | Default-+-id | integer | | not null | sig | signature | | | | Indexes: "t_img_pkey" PRIMARY KEY | Btree (id) "idx_t_img_1" gist (sig)
Amount of data
Postgres=# select count (*) from tcm; count-319964709 (1 row) Time: 698.075 ms
Image eigenvalue search example, speed lever. (use citus+postgres+128 shard above)
Postgres=# select * from t_img order by sig'(3.539080, 0.243861, 1.509150, 1.781380, 8.677560, 4.232060, 8.979810, 1.665030, 1.294100, 4.449800, 9.200450, 1.859860,5.440250,7.788580,0.514258,8.424920) 'limit 1 Id | sig-+- - -- 148738668 | (2.554440 0.310499, 2.322520, 0.478624, 7.816080, 4.360440, 8.287050, 1.011060, 2.114320, 3.541110, 9.166300, 1.922250,4.488640,7.897890,1.600290,7.462080) (1 row) Time: 337.301 ms2 CUBE
Https://www.postgresql.org/docs/devel/static/cube.html
Ab float8 Euclidean distance between an and b.a b float8 Taxicab (Lmur1 metric) distance between an and b.a b float8 Chebyshev (L-inf metric) distance between an and b.
Cube performs slightly worse than imgsmlr when calculating picture vector similarity, because cube uses float8 and imgsmlr uses float4.
Examples
Cube
Postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_img0 order by sig::Text::cube'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070,2.574830,6.778820,5.156320,8.329430) 'limit 1 QUERY PLAN- - -Limit (cost=0.36..0.37 rows=1 width=76) (actual time=147.432..147.434 rows=1 loops=1) Output: id Sig, (sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466, 0.363753, 4.07967, 0.124681, 3.61122, 7.12746, 7.88007, 2.57483,6.77882,5.15632 8.32943):: cube)) Buffers: shared hit=16032-> Index Scan using idx_t_img0_1 on public.t_img0 (cost=0.36..13824.28 rows=754085 width=76) (actual time=147.430..147.430 rows=1 loops=1) Output: id, sig, ((sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466, 0.363753,4.07967,0.124681,3.61122,7.12746 7.88007, 2.57483, 6.77882, 5.15632, 8.32943):: cube) Order By: ((t_img0.sig):: text):: cube'(0.435404, 6.60287, 9.05022, 9.37975, 2.48392, 1.53466,0.363753, 4.07967,0.124681,3.61122,7.12746,7.88007,2.57483,6.7782,5.15632 8.32943)': cube) Buffers: shared hit=16032 Planning Time: 0.096 ms Execution Time: 148.905 ms (9 rows)
Imgsmlr
Postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_img0 order by sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070,2.574830,6.778820,5.156320,8.329430) 'limit 2 QUERY PLAN- - -Limit (cost=0.36..0.37 rows=2 width=72) (actual time=40.284..48.183 rows=2 loops=1) Output: id Sig, (sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460,7.880070,2.574830,6.778820,5.156320) 8.329430): signature)) Buffers: shared hit=2914-> Index Scan using t_img0_sig_idx on public.t_img0 (cost=0.36..7032.36 rows=754085 width=72) (actual time=40.282..48.179 rows=2 loops=1) Output: id, sig, (sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220,7.127460, 7.880070, 2.574830 6.778820, 5.156320, 8.329430):: signature) Order By: (t_img0.sig'(0.435404, 6.602870, 9.050220, 9.379750, 2.483920, 1.534660, 0.363753, 4.079670, 0.124681, 3.611220, 7.127460, 7.880070, 2.574830, 6.778820, 5.156320, 8.329430):: signature) Buffers: shared hit=2914 Planning Time: 0.091 ms Execution Time: 48.210 ms (9 rows)
The advantage of cube over imgsmlr is that cube can calculate vector similarity in any dimension, while imgsmlr is only used to calculate vector similarity in 16 dimensions (signation type).
At this point, the study on "what are the similar search plug-ins for PostgreSQL" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
Http://wenku.it168.com/d_001595079.shtml
© 2024 shulou.com SLNews company. All rights reserved.