2025-02-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article walks through an example of hybrid queries over vector and structured data based on Milvus, for readers who want a working understanding of the approach.
Implementation of hybrid queries over vector and structured data based on Milvus
I. Overview
Deep learning neural network models can convert unstructured data such as pictures, video, voice, and text into feature vectors. Beyond the feature vectors themselves, this data often carries other attributes. For face pictures, for example, you can add tags such as gender, whether the person wears glasses, and the picture capture time; for text, you can add tags such as language, corpus category, and creation time.
In the past, feature vectors were usually stored in the same structured tables as their label attributes. However, traditional databases cannot search massive, high-dimensional feature vectors efficiently, so a dedicated feature vector database is needed to store and retrieve them.
II. Solutions
Milvus is a vector search engine that provides high-performance retrieval over massive vector collections. It is combined with a traditional relational database such as Postgres, which stores the unique ID that Milvus assigns to each vector together with that vector's attributes. A hybrid query result is obtained by running the vector search in Milvus and then querying Postgres with the returned IDs. The solution works as follows:
1. Feature vector storage
In the figure, the solid blue line represents the feature vector storage process of the Milvus hybrid query. First, the source feature vectors are stored in the Milvus feature vector database, and Milvus returns an ID for each of them. Then the unique ID of each feature vector and its label attributes are stored in a relational database such as Postgres, which completes the storage of feature vectors and label attributes.
2. Feature vector retrieval
The solid orange line in the figure represents the feature vector retrieval process of the Milvus hybrid query. When a query feature vector is passed to Milvus, Milvus returns the IDs of the results most similar to it. Those IDs are then used to query Postgres, which yields the final hybrid query result for the query vector.
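The storage flow described here can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyVectorStore` is a hypothetical in-memory stand-in for Milvus, SQLite from the Python standard library stands in for Postgres, and the table name `face_tags` is invented for the example.

```python
import sqlite3


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector engine (not the Milvus API)."""

    def __init__(self):
        self._vectors = {}
        self._next_id = 0

    def add_vectors(self, records):
        # Store each vector and hand back the ID assigned to it.
        ids = []
        for vec in records:
            self._vectors[self._next_id] = list(vec)
            ids.append(self._next_id)
            self._next_id += 1
        return ids


def store_with_attributes(store, conn, vectors, attributes):
    # Step 1: vectors go to the vector store, which returns their IDs.
    ids = store.add_vectors(vectors)
    # Step 2: each ID and its tags go to the relational table.
    rows = [
        (vec_id, attr["sex"], attr["get_time"], attr["is_glasses"])
        for vec_id, attr in zip(ids, attributes)
    ]
    conn.executemany(
        "INSERT INTO face_tags (ids, sex, get_time, is_glasses) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return ids


# SQLite is an assumption here; the article uses Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE face_tags (ids INTEGER PRIMARY KEY, sex TEXT, get_time TEXT, is_glasses TEXT)"
)
store = ToyVectorStore()
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
attributes = [
    {"sex": "female", "get_time": "2020-01-01 10:00:00", "is_glasses": "True"},
    {"sex": "male", "get_time": "2020-01-02 11:30:00", "is_glasses": "False"},
    {"sex": "female", "get_time": "2020-01-03 09:15:00", "is_glasses": "False"},
]
ids = store_with_attributes(store, conn, vectors, attributes)
```

The point of the split is that the relational table never holds the vectors themselves, only the IDs that link each row back to the vector store.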
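The retrieval flow can be sketched the same way. Again this rests on assumptions: a brute-force Euclidean search over a small dictionary stands in for the Milvus search, SQLite stands in for Postgres, and the names `search_ids`, `hybrid_query`, and `face_tags` are hypothetical.

```python
import math
import sqlite3


def search_ids(stored, query, top_k):
    """Brute-force stand-in for vector search: IDs of the top_k nearest vectors."""
    ranked = sorted(stored, key=lambda vec_id: math.dist(stored[vec_id], query))
    return ranked[:top_k]


def hybrid_query(conn, stored, query, top_k):
    # Step 1: the vector search returns the IDs of the most similar vectors.
    ids = search_ids(stored, query, top_k)
    if not ids:
        return []
    # Step 2: the IDs are looked up in the relational table.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT ids, sex, is_glasses FROM face_tags WHERE ids IN ({placeholders})",
        ids,
    ).fetchall()
    # Keep the similarity order of the vector search in the final result.
    by_id = {row[0]: row for row in rows}
    return [by_id[vec_id] for vec_id in ids if vec_id in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE face_tags (ids INTEGER PRIMARY KEY, sex TEXT, is_glasses TEXT)")
conn.executemany(
    "INSERT INTO face_tags VALUES (?, ?, ?)",
    [(0, "female", "True"), (1, "male", "False"), (2, "female", "False")],
)
stored = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0]}
result = hybrid_query(conn, stored, query=[0.9, 0.9], top_k=2)
```

Re-ordering the rows by the ID list matters because a SQL `IN` lookup does not guarantee that rows come back in similarity order.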
III. Milvus hybrid query
At this point, you may wonder: why not just store the feature vectors and their attributes in a relational database? To answer that, we test Milvus on 100 million vectors from ANN_SIFT1B; see link [1].
1. Feature vector data set
The feature vectors for the Milvus hybrid query are 100 million 128-dimensional vectors taken from the Base Set file of ANN_SIFT1B. Query vectors are taken from the Query set. We treat the ANN_SIFT1B vectors as face feature vectors and tag each one with gender, glasses, and image capture time.
    # Extract 100 million vectors from the Base Set file for import into Milvus
    vectors = load_bvecs_data(FILE_PATH, 100000000)
    # Randomly generate gender, glasses, and image capture time tags for a vector
    # (fake is assumed to be a Faker instance)
    import random
    from faker import Faker
    fake = Faker()
    sex = random.choice(['female', 'male'])
    get_time = fake.past_datetime(start_date="-120d", tzinfo=None)
    is_glasses = random.choice(['True', 'False'])
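The article does not show how `load_bvecs_data` is implemented. As a rough sketch, assuming the `.bvecs` layout used by the ANN_SIFT1B corpus (each record is a 4-byte little-endian integer giving the dimension, followed by that many unsigned bytes), a reader could look like the following. The function name `read_bvecs` is hypothetical, and a real loader for 100 million vectors would more likely read in batches with NumPy.

```python
import io
import struct


def read_bvecs(stream, max_count):
    """Read up to max_count vectors from a binary stream in .bvecs layout."""
    vectors = []
    while len(vectors) < max_count:
        header = stream.read(4)
        if len(header) < 4:
            break  # end of file
        (dim,) = struct.unpack("<i", header)  # little-endian 32-bit dimension
        payload = stream.read(dim)
        if len(payload) < dim:
            raise ValueError("truncated bvecs record")
        vectors.append(list(payload))  # one unsigned byte per component
    return vectors


# Two 3-dimensional records built in memory to exercise the reader.
sample = (
    struct.pack("<i", 3) + bytes([1, 2, 3])
    + struct.pack("<i", 3) + bytes([4, 5, 255])
)
sample_vectors = read_bvecs(io.BytesIO(sample), max_count=10)
```

With a real file, the same function would be called on `open(FILE_PATH, "rb")` instead of the in-memory sample.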
2. Feature vector storage
When the 100 million vectors are imported into Milvus, the returned ids are the unique IDs of the vectors. The ids and the tags of each vector are then stored in Postgres. The original feature vectors can optionally be stored in Postgres as well:
    # Import the 100 million raw vectors into Milvus
    status, ids = milvus.add_vectors(table_name=MILVUS_TABLE, records=vectors)
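    # Store each vector ID and its tags in Postgres
    # (PG_TABLE_NAME is assumed to be a constant holding the table name)
    sql = "INSERT INTO " + PG_TABLE_NAME + " VALUES (%s, %s, %s, %s);"
    for vec_id in ids:
        # sex, get_time, and is_glasses are generated per vector as shown earlier
        cur.execute(sql, (vec_id, sex, get_time, is_glasses))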
3. Feature vector retrieval
Pass the vector to be searched into Milvus. Set TOP_K=10 and DISTANCE_THRESHOLD=1 (both can be changed as required). TOP_K=10 returns the 10 results most similar to the query vector, and DISTANCE_THRESHOLD is the distance threshold between the query vector and a result vector.
ANN_SIFT1B uses Euclidean distance. Once the parameters are set, Milvus returns the ids of the query results; these ids are then used to query Postgres, producing the final hybrid query result.
    # Extract the vector to be searched from the Query set according to query_location
    vector = load_query_list(QUERY_PATH, query_location)
    # Pass the query vector to Milvus
    status, results = milvus.search_vectors(table_name=MILVUS_TABLE, query_records=vector, top_k=top_k)
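    # Query Postgres using the ids returned by Milvus
    # (results.ids is written as in the original; a Python list is passed as a Postgres array)
    sql = "SELECT * FROM " + PG_TABLE_NAME + " WHERE ids = ANY(%s);"
    cur.execute(sql, (list(results.ids),))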
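The search call above takes `top_k` but not `DISTANCE_THRESHOLD`, and the article does not show where the threshold is applied. One plausible reading, offered here only as an assumption, is that results are filtered on the client side after the top-k search. A minimal sketch with a brute-force Euclidean search and a hypothetical helper name:

```python
import math


def top_k_within_threshold(stored, query, top_k, distance_threshold):
    """Return (id, distance) pairs for the top_k nearest vectors, dropping
    any whose Euclidean distance exceeds distance_threshold."""
    scored = sorted((math.dist(vec, query), vec_id) for vec_id, vec in stored.items())
    return [(vec_id, dist) for dist, vec_id in scored[:top_k] if dist <= distance_threshold]


stored = {10: [0.0, 0.0], 11: [0.0, 0.5], 12: [3.0, 4.0]}
hits = top_k_within_threshold(stored, query=[0.0, 0.0], top_k=3, distance_threshold=1)
hit_ids = [vec_id for vec_id, _ in hits]
```

In this toy data, vector 12 is among the three nearest but lies at distance 5, so the threshold removes it and only the IDs of vectors 10 and 11 would be sent on to the relational lookup.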
In the hybrid query over 100 million vectors, the Milvus feature vector search takes only 70 ms, and looking up the returned ids in Postgres takes no more than 7 ms.
In general, hybrid queries over vector and structured data can be implemented quickly with the Milvus feature vector database. A traditional relational database used alone for vector queries struggles to store large-scale vector data and cannot deliver high-performance feature vector retrieval.
Milvus feature vector search time    Postgres ids search time
70 ms                                1 ms ~ 7 ms
This completes a hybrid query based on Milvus. On a data set of 100 million feature vectors, the hybrid query takes no more than 77 ms.
Because Milvus is easy to manage and use, the tools provided with the Milvus hybrid query scheme [2] make it straightforward to implement hybrid queries over vector and structured data and to better support business requirements.