This article looks at how to intervene in search ranking in real time with PostgreSQL, analyzed and described from a practical perspective.
PostgreSQL is a long-established database, originally designed by Professor Michael Stonebraker at the University of California, Berkeley. Its functionality, performance, architecture, and stability are comparable to Oracle's.
Alibaba Cloud HybridDB for PostgreSQL provides a massively parallel processing (MPP) data warehouse service and supports multi-core parallel computing, vectorized computing, graph computing, and JSON/JSONB full-text retrieval.
System architecture
PostgreSQL's efficient parallel processing, JSON-based data merging, and Notify real-time messaging gave us a concrete implementation path. Based on the business challenges described above, we laid out the implementation as follows.
A careful analysis shows the overall scheme can be summarized into four parts:
Heterogeneous data source access
Normalization service
Data consolidation strategy
Real-time score recalculation
Heterogeneous data source access
The data related to Xianyu (Idle Fish) commodities is very rich, and it comes from heterogeneous sources: full offline commodity data, real-time commodity change streams, data for various algorithm dimensions, and so on. All of it can be processed uniformly through the Alibaba Cloud big data platform and a binlog monitoring tool.
All heterogeneous data sources feed the normalization service through asynchronous messages in a unified format. The advantage of this scheme is that full and incremental data alike go through the same message service, which simplifies the access process, while the message middle layer decouples producers from consumers and improves stability.
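For illustration, here is a hypothetical example of such a unified message envelope; the field names (source, op, id, att, ts) are assumptions for this sketch, not the production schema:
{"source": "binlog", "op": "update", "id": 1, "att": {"price": [8880, "2018-01-04 10:10:12"]}, "ts": "2018-01-04 10:10:12"}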
Normalization service
The normalization service receives the upstream heterogeneous data source messages and passes correct data downstream through four modules: data verification, data completion, standard format conversion, and data monitoring.
Data verification module. Each data source is verified at field level against the metadata center: field name, data type, data range, default value, and so on. The biggest advantage of introducing a metadata center is fine-grained control over data sources, preventing dirty and unnecessary data from polluting downstream systems.
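As a minimal sketch of what metadata-driven field verification could look like in SQL (the meta_field table and its columns are hypothetical; the real metadata center is a separate service):
-- registry of expected field metadata (hypothetical)
create table meta_field (
    field_name text primary key,   -- attribute name, e.g. 'price'
    json_type  text not null,      -- expected JSON type: 'number', 'string', ...
    min_val    numeric,            -- valid range, for numeric fields only
    max_val    numeric
);
insert into meta_field values ('price', 'number', 0, 1000000);
-- list the fields of an incoming message that violate the registered metadata
select f.key as bad_field
from jsonb_each('{"price": [-5, "2018-01-04 10:10:12"]}'::jsonb) f
join meta_field m on m.field_name = f.key
where jsonb_typeof(f.value -> 0) <> m.json_type
   or case when jsonb_typeof(f.value -> 0) = 'number'
           then (f.value ->> 0)::numeric not between m.min_val and m.max_val
           else false
      end;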
Data completion module. The data source usually needs some intervention indicators completed in real time, for example when a user edits a commodity: real-time flagging of prohibited explicit images, commodity price prediction, and so on. The whole intervention process takes the form of a pipeline that exposes extension points, so new intervention capabilities can be plugged in.
Standard format conversion module. Converts each data source into the standard JSON structure, which makes unified downstream data merging convenient.
Data monitoring module. Records every record, and every abnormal record, per data source and delivers them to the monitoring system, so that abnormal data and abnormal traffic from each heterogeneous source can be watched, and problems can be found and recovered from immediately.
Data consolidation strategy
The data merge strategy consists of two sequential steps: timestamp-based data merging, then data change notification. The core problem in the merge step is how to resolve field-level conflicts quickly and effectively; we merge uniformly based on timestamps. First, the data storage structure; refer to the following table design:
create table Test (
    id  int8 primary key,  -- product ID
    att jsonb              -- commodity attributes
);
The attribute column is designed as JSON. Inside the JSON are K-V attribute pairs; see the attribute structure example below. Each V is an array containing the value of K and the last update time of that attribute. The update time drives the merge: an attribute is updated when it changes and left alone when it does not. Advantages of this design:
Field-level fine-grained merging keeps the minimum data set real-time
High scalability: the table never needs columns added or removed
Attribute Structure Example
{"count": [100, "2017-01-01 10:10:00"], "price": [8880, "2018-01-04 10:10:12"], "newatt": [120, "2017-01-01 12:22:00"]}
After defining the storage structure, the next step is to use PostgreSQL's JSON processing capabilities for the data merge. Refer to the following merge UDF:
-- merge two attribute maps field by field, keeping the value with the newer timestamp
create or replace function merge_json(jsonb, jsonb) returns jsonb as $$
select jsonb_object_agg(key, value)
from (
    select
        coalesce(a.key, b.key) as key,
        case
            -- ->> 1 extracts the timestamp string (element 1) without JSON quoting
            when coalesce((a.value ->> 1)::timestamp, '1970-01-01'::timestamp)
               > coalesce((b.value ->> 1)::timestamp, '1970-01-01'::timestamp)
            then a.value
            else b.value
        end as value
    from jsonb_each($1) a
    full outer join jsonb_each($2) b using (key)
) t;
$$ language sql strict;
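A quick sanity check of merge_json (the documents below are made up): for each key, the value carrying the newer timestamp should win, and keys present on only one side should be preserved:
select merge_json(
    '{"count": [100, "2017-01-01 10:10:00"], "price": [1000, "2018-01-01 00:00:00"]}'::jsonb,
    '{"price": [8880, "2019-01-01 00:00:00"]}'::jsonb
);
-- expected: {"count": [100, "2017-01-01 10:10:00"], "price": [8880, "2019-01-01 00:00:00"]}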
After defining the merge method, we call it directly whenever the data source reports a change:
insert into Test values (1, '{"price": [1000, "2019-01-01 10:10:12"], "newatt": ["hello", "2018-01-01"]}')
on conflict (id)
do update set att = merge_json(Test.att, excluded.att)
where Test.att <> merge_json(Test.att, excluded.att);
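A useful property of the WHERE clause, sketched below: re-running the same statement is a no-op, because the merge result equals the stored value, so neither the update nor the notification trigger described in the next section fires:
-- first run inserts the row; an identical second run changes nothing,
-- so the after-insert-or-update trigger (and its Notify) is not fired either
insert into Test values (1, '{"price": [1000, "2019-01-01 10:10:12"]}')
on conflict (id)
do update set att = merge_json(Test.att, excluded.att)
where Test.att <> merge_json(Test.att, excluded.att);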
As shown above, when a commodity ID conflict occurs, merge_json is called to merge the data, which completes the merge step. Next, the merge result has to be pushed to downstream consumers in real time, which PostgreSQL's trigger and Notify mechanisms handle.
Trigger design
-- udf
CREATE OR REPLACE FUNCTION notify1() returns trigger AS $function$
begin
    perform pg_notify(
        'a',                                                     -- asynchronous message channel name
        format('CLASS:notify, ID:%s, ATT:%s', NEW.id, NEW.att)   -- message content
    );
    return null;
end
$function$ language plpgsql strict;

-- trigger
create trigger tg1 after insert or update on Test for each row execute procedure notify1();
As can be seen, whenever data is inserted or updated, the trigger executes the notify1 function, which creates an asynchronous Notify message and sends it to the specified channel. A downstream application can listen on the corresponding channel through JDBC, receive the message, and run the subsequent real-time scoring process. Refer to the following pseudocode:
this.pgconn = conn.unwrap(org.postgresql.PGConnection.class);
Statement stmt = conn.createStatement();
stmt.execute("LISTEN a");   // subscribe to channel 'a'
stmt.close();
org.postgresql.PGNotification[] notifications = pgconn.getNotifications();
if (notifications != null) {
    for (int i = 0; i < notifications.length; i++) {
        System.out.println("Got notification: " + notifications[i].getName());
    }
}
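Before wiring up the application, the same round trip can be checked interactively in psql (the id value is arbitrary and the output line is indicative of psql's notification display, not copied from the source):
-- session 1: subscribe
LISTEN a;
-- session 2: write a row; the trigger fires and notifies channel 'a'
insert into Test values (42, '{"price": [5, "2019-02-01 00:00:00"]}');
-- session 1 then prints, on its next interaction with the server, something like:
-- Asynchronous notification "a" with payload "CLASS:notify, ID:42, ATT:..." received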
In addition, PostgreSQL's concurrent processing is very efficient. Binding the trigger does increase PostgreSQL's write latency, but stress tests against a 10-million-row dataset still met our business write-performance requirements.
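While the original benchmark numbers are not reproduced here, a comparable write-pressure test can be sketched with pgbench and a custom upsert script; the file name, client count, and duration below are assumptions:
-- upsert.sql: one randomized upsert per transaction
\set id random(1, 10000000)
insert into Test values (:id, '{"price": [100, "2019-01-01 00:00:00"]}')
on conflict (id)
do update set att = merge_json(Test.att, excluded.att)
where Test.att <> merge_json(Test.att, excluded.att);
-- run with, e.g.: pgbench -f upsert.sql -c 16 -j 4 -T 60 <dbname>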
Real-time scoring intervention in search
The service layer listens for the Notify message, parses it, recalculates the score for each indicator weight through a rule engine, and computes a composite score, which it writes to the search tag table. The search engine watches the tag table in real time, dumps the composite score into the index, and thereby intervenes in the ranking results in real time.
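A minimal sketch of the tag-table handoff (the table name, columns, and score value are hypothetical; the rule engine itself lives in the service layer):
create table search_tag (
    id    int8 primary key,                  -- product ID
    score numeric not null,                  -- recomputed composite score
    mtime timestamp not null default now()   -- lets the engine pick up changes incrementally
);
-- upsert the recomputed composite score for a product
insert into search_tag values (1, 0.87, now())
on conflict (id) do update set score = excluded.score, mtime = now();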
The above is how to intervene in search ranking in real time with PostgreSQL. If you have similar questions, the analysis above may serve as a reference.