Shulou (Shulou.com) 05/31 Report — SLTechnology News & Howtos > Database (updated 2025-03-26)
Many newcomers are unclear about how PostgreSQL supports rich NoSQL features. PostgreSQL is not only a relational database; it also offers a rich set of NoSQL features, which this article introduces in detail.
I. JSON and JSONB data types
PostgreSQL supports the non-relational data type json (JavaScript Object Notation). This section covers the json type, the differences between json and jsonb, the json and jsonb operators and functions, and appending, deleting, and updating jsonb key/value pairs.
1. Brief introduction of JSON types
PostgreSQL has provided the json type since version 9.2, and support for json has improved with each major release, for example by adding more json functions and operators to ease application development. A simple example of the json type:
mydb=> SELECT '{"a": 1, "b": 2}'::json;
       json
------------------
 {"a": 1, "b": 2}
(1 row)
To better demonstrate the json type, next create a table as follows:
mydb=> CREATE TABLE test_json1 (id serial primary key, name json);
CREATE TABLE
In the above example, the name field is defined as json. Insert some rows as follows:
mydb=> INSERT INTO test_json1 (name) VALUES ('{"col1": 1, "col2": "francs", "col3": "male"}');
INSERT 0 1
mydb=> INSERT INTO test_json1 (name) VALUES ('{"col1": 2, "col2": "fp", "col3": "female"}');
INSERT 0 1
Query table test_json1 data:
mydb=> SELECT * FROM test_json1;
 id |                     name
----+-----------------------------------------------
  1 | {"col1": 1, "col2": "francs", "col3": "male"}
  2 | {"col1": 2, "col2": "fp", "col3": "female"}
(2 rows)
2. Query JSON data
You can query the key values of json data with the -> operator, as shown below:
mydb=> SELECT name->'col2' FROM test_json1 WHERE id=1;
 ?column?
----------
 "francs"
(1 row)
If you want to return a json key value in text format, use the ->> operator, as shown below:
mydb=> SELECT name->>'col2' FROM test_json1 WHERE id=1;
 ?column?
----------
 francs
(1 row)
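Nested values can also be extracted by path. As an extra sketch (not from the original text, using made-up data), the #> and #>> operators take a path array; #>> returns the result as text:

```sql
-- #> extracts the json value at the given path
SELECT '{"a": {"b": {"c": 1}}}'::json #> '{a,b}';    -- {"c": 1}
-- #>> extracts the value at the path as text
SELECT '{"a": {"b": {"c": 1}}}'::json #>> '{a,b,c}'; -- 1
```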
3. The difference between JSONB and JSON
PostgreSQL supports two JSON data types, json and jsonb, which are almost identical in use, with the main differences as follows:
The json type stores text, while jsonb uses a binary storage format, so the two types differ in processing efficiency. json stores an exact copy of the input text and must be reparsed on every retrieval, whereas jsonb stores the data in a parsed, binary form. As a result, json writes are faster than jsonb, but retrieval is slower; a later test will verify this difference in read and write performance.
In addition to the differences described above, there are also differences in the use of json and jsonb. For example, the order of the keys output by jsonb is different from the input, as shown below:
mydb=> SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::jsonb;
                      jsonb
--------------------------------------------------
 {"bar": "baz", "active": false, "balance": 7.77}
(1 row)
The order of json's output keys is exactly the same as the input, as follows:
mydb=> SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
                       json
--------------------------------------------------
 {"bar": "baz", "balance": 7.77, "active": false}
(1 row)
In addition, the jsonb type removes insignificant whitespace from the input data, as follows:
mydb=> SELECT '{"id":1,    "name":"francs"}'::jsonb;
            jsonb
-----------------------------
 {"id": 1, "name": "francs"}
(1 row)
In the above example, the id and name keys were entered with extra spaces, and the output shows those spaces removed. The json output, by contrast, is identical to the input and the spaces are kept:
mydb=> SELECT '{"id":1,    "name":"francs"}'::json;
             json
------------------------------
 {"id":1,    "name":"francs"}
(1 row)
In addition, jsonb removes duplicate keys, keeping only the last one, as shown below:
mydb=> SELECT '{"id": 1, "name": "francs", "remark": "a good guy!", "name": "test"}'::jsonb;
                       jsonb
----------------------------------------------------
 {"id": 1, "name": "test", "remark": "a good guy!"}
(1 row)
The name key appears twice above, and only the last name key's value is kept, while the json data type retains duplicate keys.
For most json application scenarios, jsonb is recommended, unless there are special requirements, such as preserving the key order of the input.
4. JSONB and JSON operators
PostgreSQL supports rich JSONB and JSON operators, such as:
The ->> operator, shown earlier, returns a json key value in text format. The ? operator tests whether a string exists as a top-level key, as follows:
mydb=> SELECT '{"a": 1, "b": 2}'::jsonb ? 'a';
 ?column?
----------
 t
(1 row)
The - operator deletes a key/value pair from jsonb data, as follows:
mydb=> SELECT '{"a": 1, "b": 2}'::jsonb - 'a';
 ?column?
----------
 {"b": 2}
(1 row)
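jsonb also supports containment and existence operators. The following extra examples (not in the original text, but standard jsonb behavior) illustrate @>, ?| and ?&:

```sql
-- @> tests whether the left jsonb contains the right jsonb
SELECT '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}'::jsonb;  -- t
-- ?| tests whether any of the listed strings is a top-level key
SELECT '{"a": 1, "b": 2}'::jsonb ?| array['b', 'c'];    -- t
-- ?& tests whether all of the listed strings are top-level keys
SELECT '{"a": 1, "b": 2}'::jsonb ?& array['a', 'b'];    -- t
```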
5. JSONB and JSON functions
PostgreSQL also provides a wealth of json and jsonb functions, for example:
json_each expands the outermost json object into a set of key/value rows, as follows:
mydb=> SELECT * FROM json_each('{"a": "foo", "b": "bar"}');
 key | value
-----+-------
 a   | "foo"
 b   | "bar"
(2 rows)
json_each_text returns the result as text, as follows:
mydb=> SELECT * FROM json_each_text('{"a": "foo", "b": "bar"}');
 key | value
-----+-------
 a   | foo
 b   | bar
(2 rows)
A particularly important function is row_to_json, which returns a table row as a json object and is often used to generate json test data, for example converting a regular table into a json-typed table:
mydb=> SELECT * FROM test_copy WHERE id=1;
 id | name
----+------
  1 | a
(1 row)
mydb=> SELECT row_to_json(test_copy) FROM test_copy WHERE id=1;
      row_to_json
-----------------------
 {"id": 1, "name": "a"}
(1 row)
json_object_keys returns the set of keys in the outermost json object, as follows:
mydb=> SELECT * FROM json_object_keys('{"a": "foo", "b": "bar"}');
 json_object_keys
------------------
 a
 b
(2 rows)
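json arrays can be expanded as well. As an extra illustration (not in the original text), the json_array_elements function returns each element of the outermost array as a row:

```sql
SELECT * FROM json_array_elements('[1, true, "foo"]');
--  value
-- -------
--  1
--  true
--  "foo"
```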
6. Append, delete and update jsonb key / value
Key/value pairs can be appended to jsonb data with the || operator. Add a sex key/value as follows:
mydb=> SELECT '{"name": "francs", "age": "31"}'::jsonb || '{"sex": "male"}'::jsonb;
                    ?column?
------------------------------------------------
 {"age": "31", "sex": "male", "name": "francs"}
(1 row)
There are two ways to delete a jsonb key/value pair: the - operator, and the #- operator, which deletes the key/value at a specified path. Deleting with the - operator looks like this:
mydb=> SELECT '{"name": "James", "email": "james@localhost"}'::jsonb - 'email';
     ?column?
-------------------
 {"name": "James"}
(1 row)
mydb=> SELECT '["red", "green", "blue"]'::jsonb - 0;
 ["green", "blue"]
The second method deletes the key/value at a specified path with the #- operator, typically used with nested json data, for example deleting the fax key/value nested inside contact:
mydb=> SELECT '{"name": "James", "contact": {"phone": "01234 567890", "fax": "01987 543210"}}'::jsonb #- '{contact,fax}'::text[];
                        ?column?
---------------------------------------------------------
 {"name": "James", "contact": {"phone": "01234 567890"}}
(1 row)
Delete the element at position 1 of the nested aliases array, as follows:
mydb=> SELECT '{"name": "James", "aliases": ["Jamie", "The Jamester", "J Man"]}'::jsonb #- '{aliases,1}'::text[];
 {"name": "James", "aliases": ["Jamie", "J Man"]}
(1 row)
There are likewise two ways to update a key/value. The first is the || operator, which concatenates jsonb values and overwrites duplicate keys. Update the age key as follows:
mydb=> SELECT '{"name": "francs", "age": "31"}'::jsonb || '{"age": "32"}'::jsonb;
            ?column?
---------------------------------
 {"age": "32", "name": "francs"}
(1 row)
The second way is through the jsonb_set function, and the syntax is as follows:
jsonb_set(target jsonb, path text[], new_value jsonb [, create_missing boolean])
target is the source jsonb data, path is the path, and new_value is the new key value. When create_missing is true, the key is added if it does not exist; when false, a missing key is not added. An example:
mydb=> SELECT jsonb_set('{"name": "francs", "age": "31"}'::jsonb, '{age}', '"32"'::jsonb, false);
            jsonb_set
---------------------------------
 {"age": "32", "name": "francs"}
(1 row)
mydb=> SELECT jsonb_set('{"name": "francs", "age": "31"}'::jsonb, '{sex}', '"male"'::jsonb, true);
 {"age": "31", "name": "francs", "sex": "male"}
(1 row)
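In practice jsonb_set is usually applied inside an UPDATE. A minimal sketch follows (the table tbl_user and its info jsonb column are hypothetical, not from the original text):

```sql
-- Overwrite the age key for one row; with create_missing=false,
-- rows whose info lacks the age key are left unchanged.
UPDATE tbl_user
SET info = jsonb_set(info, '{age}', '"32"'::jsonb, false)
WHERE id = 1;
```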
7. Create an index for JSONB type
This section describes index creation for the jsonb data type, which supports GIN indexes. For illustration, suppose a jsonb field contains the following content:
{"id": 1, "user_id": 1440933, "user_name": "1_francs", "create_time": "2017-08-03 16:22:05.528432+08"}
If the field that stores the above jsonb data is named user_info and the table is named tbl_user_jsonb, the syntax for creating a GIN index on the user_info field is as follows:
CREATE INDEX idx_gin ON tbl_user_jsonb USING gin (user_info);
A GIN index on jsonb supports the @>, ?, ?& and ?| operators. For example, the following query will use the index:
SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "1_francs"}';
However, the following query on a jsonb key value does not use the idx_gin index:
SELECT * FROM tbl_user_jsonb WHERE user_info->>'user_name' = '1_francs';
To make key-value retrieval on a jsonb column efficient, you can create a btree index on the corresponding key-value expression, as follows:
CREATE INDEX idx_gin_user_infob_user_name ON tbl_user_jsonb USING btree ((user_info->>'user_name'));
Once this index is created, queries filtering on the user_info->>'user_name' key value will use it.
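A related option worth knowing: GIN indexes on jsonb also support the jsonb_path_ops operator class, which indexes only the @> containment operator but is typically smaller and faster for such queries. A sketch (extra example, not from the original text):

```sql
-- jsonb_path_ops supports only @>, not ?, ?| or ?&
CREATE INDEX idx_gin_path ON tbl_user_jsonb USING gin (user_info jsonb_path_ops);
```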
II. Read and write performance test of JSON and JSONB
Having covered index creation for the jsonb data type, this section briefly compares the read and write performance of json and jsonb. json writes are faster than jsonb, but retrieval is slower, mainly because:
The json storage format is text, while the jsonb storage format is binary. The difference in storage format makes the processing efficiency of the two json data types different. The content stored in the json type is the same as the input data, which must be reparsed when retrieving json data, while jsonb stores the parsed data in binary form, and there is no need to reparse when retrieving jsonb data.
1. Build JSON and JSONB test tables.
The following simple test compares the read and write performance of json and jsonb using three tables:
- user_ini: base table, loaded with 2 million rows of test data
- tbl_user_json: json-typed table, 2 million rows
- tbl_user_jsonb: jsonb-typed table, 2 million rows
First, create the user_ini table and insert 2 million test data, as follows:
mydb=> CREATE TABLE user_ini (id int4, user_id int8, user_name character varying(64), create_time timestamp(6) with time zone default clock_timestamp());
CREATE TABLE
mydb=> INSERT INTO user_ini (id, user_id, user_name) SELECT r, round(random()*2000000), r || '_francs' FROM generate_series(1, 2000000) as r;
INSERT 0 2000000
The user_ini table data will be used to generate the json and jsonb data. Create the tbl_user_json and tbl_user_jsonb tables as shown below:
mydb=> CREATE TABLE tbl_user_json (id serial, user_info json);
CREATE TABLE
mydb=> CREATE TABLE tbl_user_jsonb (id serial, user_info jsonb);
CREATE TABLE
2. JSON and JSONB table writing performance test
Based on the user_ini data, insert 2 million rows of json data into tbl_user_json via the row_to_json function, as follows:
mydb=> \timing
Timing is on.
mydb=> INSERT INTO tbl_user_json (user_info) SELECT row_to_json(user_ini) FROM user_ini;
INSERT 0 2000000
Time: 13825.974 ms
The above shows that inserting 2 million rows into tbl_user_json took about 13 seconds. Next, generate 2 million rows of jsonb data from user_ini and insert them into tbl_user_jsonb, as follows:
mydb=> INSERT INTO tbl_user_jsonb (user_info) SELECT row_to_json(user_ini)::jsonb FROM user_ini;
INSERT 0 2000000
Time: 20756.993 ms
Inserting 2 million rows of jsonb data into tbl_user_jsonb took about 20 seconds, which confirms that json writes are faster than jsonb. Comparing the space occupied by the two tables:
mydb=> \dt+ tbl_user_json
                        List of relations
 Schema |     Name      | Type  | Owner  |  Size  | Description
--------+---------------+-------+--------+--------+-------------
 pguser | tbl_user_json | table | pguser | 281 MB |
(1 row)
mydb=> \dt+ tbl_user_jsonb
 pguser | tbl_user_jsonb | table | pguser | 333 MB |
(1 row)
In terms of footprint, for the same data the jsonb type takes up somewhat more space than json.
Query one of the test data in the tbl_user_json table as follows:
mydb=> SELECT * FROM tbl_user_json LIMIT 1;
   id    |                                            user_info
---------+--------------------------------------------------------------------------------------------------
 2000001 | {"id": 1, "user_id": 1182883, "user_name": "1_francs", "create_time": "2017-08-03T20:59:27.42741+08:00"}
(1 row)
3. Table reading performance test of JSON and JSONB
For the json and jsonb read performance test, we use a query on a json/jsonb key value, for example filtering on the user_name key of the user_info field, as follows:
mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_jsonb WHERE user_info->>'user_name' = '1_francs';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on tbl_user_jsonb  (cost=0.00..72859.90 rows=10042 width=143) (actual time=0.023..524.843 rows=1 loops=1)
   Filter: ((user_info ->> 'user_name'::text) = '1_francs'::text)
   Rows Removed by Filter: 1999999
 Planning time: 0.091 ms
 Execution time: 524.876 ms
(5 rows)
This SQL takes about 524 ms. Create a btree index on the user_name key value of the user_info field, as follows:
mydb=> CREATE INDEX idx_jsonb ON tbl_user_jsonb USING btree ((user_info->>'user_name'));
Execute the above query again, as follows:
 Bitmap Heap Scan on tbl_user_jsonb  (cost=155.93..14113.93 rows=10000 width=143) (actual time=0.027..0.027 rows=1 loops=1)
   Recheck Cond: ((user_info ->> 'user_name'::text) = '1_francs'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_jsonb  (cost=0.00..153.43 rows=10000 width=0) (actual time=0.021..0.021 rows=1 loops=1)
         Index Cond: ((user_info ->> 'user_name'::text) = '1_francs'::text)
 Planning time: 0.091 ms
 Execution time: 0.060 ms
(7 rows)
The plan shows the index is now used, and the SQL time drops to 0.060 ms. To better compare key-based query efficiency in tbl_user_json and tbl_user_jsonb, we next compare performance with a range scan on the id key of the user_info field. Create the indexes as follows:
mydb=> CREATE INDEX idx_gin_user_info_id ON tbl_user_json USING btree (((user_info->>'id')::integer));
CREATE INDEX
mydb=> CREATE INDEX idx_gin_user_infob_id ON tbl_user_jsonb USING btree (((user_info->>'id')::integer));
CREATE INDEX
After the indexes are created, query the tbl_user_json table as follows:
mydb=> EXPLAIN ANALYZE SELECT id, user_info->'id', user_info->'user_name' FROM tbl_user_json WHERE (user_info->>'id')::int4 > 1 AND (user_info->>'id')::int4 < 10000;
   Recheck Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
   Heap Blocks: exact=338
   ->  Bitmap Index Scan on idx_gin_user_info_id  (cost=0.00..163.72 rows=10329 width=0) (actual time=1.110..1.110 rows=19996 loops=1)
         Index Cond: ((((user_info ->> 'id'::text))::integer > 1) AND (((user_info ->> 'id'::text))::integer < 10000))
 Planning time: 0.094 ms
 Execution time: 27.092 ms
(7 rows)

The above shows that querying tbl_user_json for records whose user_info id key value is between 1 and 10000 used the index, with an execution time of 27.092 ms. Next, test the same SQL on tbl_user_jsonb:

mydb=> EXPLAIN ANALYZE SELECT id, user_info->'id', user_info->'user_name' FROM tbl_user_jsonb WHERE (user_info->>'id')::int4 > 1 AND (user_info->>'id')::int4 < 10000;
 Bitmap Heap Scan on tbl_user_jsonb  (cost=158.93..14316.93 rows=10000 width=68) (actual time=1.140..8.116 rows=9998 loops=1)
   Heap Blocks: exact=393
   ->  Bitmap Index Scan on idx_gin_user_infob_id  (cost=0.00..156.43 rows=10000 width=0) (actual time=1.058..1.058 rows=18992 loops=1)
 Planning time: 0.104 ms
 Execution time: 8.656 ms
(7 rows)
The query on tbl_user_jsonb for id key values in the range 1 to 10000 also used its index, with an execution time of 8.656 ms. This test shows that jsonb retrieval is more efficient than json.
These two tests confirm the view that json writes are faster than jsonb, but reads are slower. It is worth mentioning the case of retrieval by key/value containment, for example:
SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "2_francs"}';
At this point, the execution plan is a full table scan, as follows:
mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_jsonb WHERE user_info @> '{"user_name": "2_francs"}';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on tbl_user_jsonb  (cost=0.00..67733.00 rows=2000 width=143) (actual time=0.018..582.207 rows=1 loops=1)
   Filter: (user_info @> '{"user_name": "2_francs"}'::jsonb)
   Rows Removed by Filter: 1999999
 Planning time: 0.065 ms
 Execution time: 582.232 ms
(5 rows)
The execution time is about 582 ms. Create a GIN index on the user_info field of tbl_user_jsonb, as shown below:
mydb=> CREATE INDEX idx_tbl_user_jsonb_user_info ON tbl_user_jsonb USING gin (user_info);
CREATE INDEX
After the index is created, execute the following again, as follows:
 Bitmap Heap Scan on tbl_user_jsonb  (cost=37.50..3554.34 rows=2000 width=143) (actual time=0.079..0.080 rows=1 loops=1)
   Recheck Cond: (user_info @> '{"user_name": "2_francs"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_tbl_user_jsonb_user_info  (cost=0.00..37.00 rows=2000 width=0) (actual time=0.069..0.069 rows=1 loops=1)
         Index Cond: (user_info @> '{"user_name": "2_francs"}'::jsonb)
 Planning time: 0.094 ms
 Execution time: 0.114 ms
(7 rows)
The index is now used, and the execution time drops to 0.114 milliseconds.
This part tests the difference of read and write performance between json and jsonb data types, and verifies the view that json is faster than jsonb in writing, but slower than jsonb in retrieval.
III. Full-text search support for JSON and JSONB
Next, let's introduce a new feature of PostgreSQL 10: full-text search support for the json and jsonb data types. This part has two sections: the first briefly introduces PostgreSQL full-text search, and the second demonstrates full-text search on json and jsonb data.
1. A brief introduction to PostgreSQL full-text search
Most applications rarely implement full-text retrieval in the database, generally relying on a separate search engine, such as the SQL-based engine Sphinx. PostgreSQL, however, has built-in full-text search; for small-scale applications that do not want to run a dedicated search engine, it can meet the need.
Without a dedicated search engine, most searches fall back to the database's LIKE operator, which has two main drawbacks:
- It cannot make good use of indexes; retrieval usually requires a full table scan, so performance is very poor on large data sets.
- It provides no relevance ranking of the results, which becomes more of a problem as the result set grows.
PostgreSQL full-text retrieval can effectively solve this problem. PostgreSQL full-text retrieval is realized by the following two data types.
Tsvector
The tsvector full-text search data type represents a document optimized for searching. A string can be cast to tsvector, as follows:
mydb=> SELECT 'Hello,cat,how are u? cat is recording'::tsvector;
                     tsvector
---------------------------------------------------
 'Hello,cat,how' 'are' 'cat' 'is' 'recording' 'u?'
(1 row)
As you can see, the string is split into tokens, but the ::tsvector cast only performs a type conversion without normalizing the data. For English full-text retrieval, data can be normalized with the to_tsvector function, as shown below:
mydb=> SELECT to_tsvector('english', 'Hello cat,');
    to_tsvector
-------------------
 'cat':2 'hello':1
(1 row)
Tsquery
tsquery represents a text query, storing the words to search for, and supports the Boolean operators &, | and !. Cast a string to tsquery as shown below:
mydb=> SELECT 'hello&cat'::tsquery;
     tsquery
-----------------
 'hello' & 'cat'
(1 row)
The above is only converted to tsquery type and is not standardized. Standardization can be performed using the to_tsquery function, as shown below:
mydb=> SELECT to_tsquery('hello&cat');
   to_tsquery
-----------------
 'hello' & 'cat'
(1 row)
A full-text search example: test whether the string contains the words hello and cat. In this case true is returned.
mydb=> SELECT to_tsvector('english', 'Hello cat,how are u') @@ to_tsquery('hello&cat');
 ?column?
----------
 t
(1 row)
Retrieves whether the string contains the characters hello and dog, which in this case returns false.
mydb=> SELECT to_tsvector('english', 'Hello cat,how are u') @@ to_tsquery('hello&dog');
 ?column?
----------
 f
(1 row)
Interested readers can test the other tsquery operators, such as | and !.
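For instance, the | (OR) and ! (NOT) operators work as follows (extra examples, standard tsquery behavior):

```sql
-- matches: 'hello' is present, so hello | dog is true
SELECT to_tsvector('english', 'Hello cat,how are u') @@ to_tsquery('hello | dog');  -- t
-- matches: 'hello' is present and 'dog' is absent
SELECT to_tsvector('english', 'Hello cat,how are u') @@ to_tsquery('hello & !dog'); -- t
```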
Note: the two-parameter form of to_tsvector is used here, with the signature to_tsvector([config regconfig,] document text). In this section the config parameter is english; if config is omitted, the value of the default_text_search_config parameter is used.
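As an extra illustration (not in the original text), the default configuration can be inspected and set per session:

```sql
SHOW default_text_search_config;                       -- e.g. pg_catalog.english
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('Hello cat');                       -- one-argument form uses the default
```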
Examples of English full-text retrieval
Let's demonstrate English full-text retrieval: create a test table and insert 2 million rows of test data, as shown below:
mydb=> CREATE TABLE test_search (id int4, name text);
CREATE TABLE
mydb=> INSERT INTO test_search (id, name) SELECT n, n || '_francs' FROM generate_series(1, 2000000) n;
INSERT 0 2000000
Execute the following SQL to find rows whose name field contains the string 1_francs:
mydb=> SELECT * FROM test_search WHERE name LIKE '1_francs';
 id |   name
----+----------
  1 | 1_francs
(1 row)
The execution plan is as follows:
mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE name LIKE '1_francs';
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on test_search  (cost=0.00..38465.04 rows=204 width=18) (actual time=0.022..261.766 rows=1 loops=1)
   Filter: (name ~~ '1_francs'::text)
   Rows Removed by Filter: 1999999
 Planning time: 0.101 ms
 Execution time: 261.796 ms
(5 rows)
This plan performs a full table scan and takes about 261 milliseconds, which is very slow. Next, create a GIN index:
mydb=> CREATE INDEX idx_gin_search ON test_search USING gin (to_tsvector('english', name));
mydb=> SELECT * FROM test_search WHERE to_tsvector('english', name) @@ to_tsquery('english', '1_francs');
Review the execution plan and execution time again, as follows:
mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE to_tsvector('english', name) @@ to_tsquery('english', '1_francs');
 Bitmap Heap Scan on test_search  (cost=18.39..128.38 rows=50 width=36) (actual time=0.071..0.071 rows=1 loops=1)
   Recheck Cond: (to_tsvector('english'::regconfig, name) @@ '''1'' & ''franc'''::tsquery)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_gin_search  (cost=0.00..18.38 rows=50 width=0) (actual time=0.064..0.064 rows=1 loops=1)
         Index Cond: (to_tsvector('english'::regconfig, name) @@ '''1'' & ''franc'''::tsquery)
 Planning time: 0.122 ms
 Execution time: 0.104 ms
(7 rows)
After creating the index, the query uses it and the execution time drops to 0.104 milliseconds, an improvement of three orders of magnitude. It is worth mentioning that if the SQL is changed as follows, the index is not used:
mydb=> EXPLAIN ANALYZE SELECT * FROM test_search WHERE to_tsvector(name) @@ to_tsquery('1_francs');
 Seq Scan on test_search  (cost=0.00..1037730.00 rows=50 width=18) (actual time=0.036..10297.764 rows=1 loops=1)
   Filter: (to_tsvector(name) @@ to_tsquery('1_francs'::text))
   Rows Removed by Filter: 1999999
 Planning time: 0.098 ms
 Execution time: 10297.787 ms
(5 rows)
Because the index was created on the two-parameter expression to_tsvector('english', name), only WHERE clauses using the same two-parameter form match the indexed expression and can use the index; to_tsvector(name) does not.
2. Practice of JSON and JSONB full-text retrieval.
Before PostgreSQL 10, full-text search did not support json and jsonb data types, and an important feature of version 10 is that full-text retrieval supports json and jsonb data types.
The difference in the to_tsvector function between versions 10 and 9.6
Let's first take a look at the 9.6 version of the to_tsvector function, as follows:
[postgres@pghost1 ~]$ psql francs francs
psql (9.6.3)
Type "help" for help.
mydb=> \df *to_tsvector*
                               List of functions
   Schema   |       Name        | Result data type | Argument data types |  Type
------------+-------------------+------------------+---------------------+--------
 pg_catalog | array_to_tsvector | tsvector         | text[]              | normal
 pg_catalog | to_tsvector       | tsvector         | regconfig, text     | normal
 pg_catalog | to_tsvector       | tsvector         | text                | normal
(3 rows)
As you can see, in version 9.6 the to_tsvector function only accepts (regconfig, text) or (text) arguments. Now look at the version 10 to_tsvector function, as shown below:
[postgres@pghost1 ~]$ psql mydb pguser
psql (10.0)
 pg_catalog | to_tsvector | tsvector | json              | normal
 pg_catalog | to_tsvector | tsvector | jsonb             | normal
 pg_catalog | to_tsvector | tsvector | regconfig, json   | normal
 pg_catalog | to_tsvector | tsvector | regconfig, jsonb  | normal
As you can see from the above, json and jsonb have been added to the data types supported by the version 10 to_tsvector function.
Create a data generation function
To make test data easier to generate, create the following two functions, which return random strings of a specified length. The random_range(int4, int4) function is created as follows:
CREATE OR REPLACE FUNCTION random_range(int4, int4)
RETURNS int4
LANGUAGE SQL
AS $$
    SELECT ($1 + FLOOR(($2 - $1 + 1) * random()))::int4;
$$;
Then create the random_text_simple (length int4) function, which calls the random_range (int4, int4) function.
CREATE OR REPLACE FUNCTION random_text_simple(length int4)
RETURNS text
LANGUAGE PLPGSQL
AS $$
DECLARE
    possible_chars text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    output text := '';
    i int4;
    pos int4;
BEGIN
    FOR i IN 1..length LOOP
        pos := random_range(1, length(possible_chars));
        output := output || substr(possible_chars, pos, 1);
    END LOOP;
    RETURN output;
END;
$$;
The random_text_simple (length int4) function can randomly generate a string of specified length, as follows:
mydb=> SELECT random_text_simple(3);
 random_text_simple
--------------------
 LL9
(1 row)
Randomly generate a string of six characters, as follows:
mydb=> SELECT random_text_simple(6);
 B81BPW
(1 row)
This function will be used later to generate test data.
Create a JSON test table
Create the user_ini test table and insert 1 million rows whose user_name is a random six-character string generated with the random_text_simple(length int4) function, as follows:
mydb=> CREATE TABLE user_ini (id int4, user_id int8, user_name character varying(64), create_time timestamp(6) with time zone default clock_timestamp());
CREATE TABLE
mydb=> INSERT INTO user_ini (id, user_id, user_name) SELECT r, round(random()*1000000), random_text_simple(6) FROM generate_series(1, 1000000) as r;
INSERT 0 1000000
Create the tbl_user_search_json table and convert the user_ini rows to json data with the row_to_json function, as follows:
mydb=> CREATE TABLE tbl_user_search_json (id serial, user_info json);
CREATE TABLE
mydb=> INSERT INTO tbl_user_search_json (user_info) SELECT row_to_json(user_ini) FROM user_ini;
INSERT 0 1000000
The generated data is as follows:
mydb=> SELECT * FROM tbl_user_search_json LIMIT 1;
 id |                                            user_info
----+---------------------------------------------------------------------------------------------------
  1 | {"id": 1, "user_id": 186536, "user_name": "KTU89H", "create_time": "2017-08-05T15:59:25.359148+08:00"}
(1 row)
JSON data full text Retrieval Test
Use full-text search to retrieve records that contain KTU89H characters in the user_info field of the query table tbl_user_search_json, as follows:
mydb=> SELECT * FROM tbl_user_search_json WHERE to_tsvector('english', user_info) @@ to_tsquery('english', 'KTU89H');
 id | user_info
----+-----------
The SQL executes normally: full-text search supports the json data type. But performance is poor because of the full table scan; the execution time is 8061 milliseconds, as shown below:
mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_search_json WHERE to_tsvector('english', user_info) @@ to_tsquery('english', 'KTU89H');
 Seq Scan on tbl_user_search_json  (cost=0.00..279513.00 rows=5000 width=104) (actual time=0.046..8061.858 rows=1 loops=1)
   Filter: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)
   Rows Removed by Filter: 999999
 Planning time: 0.091 ms
 Execution time: 8061.880 ms
(5 rows)
Create the following index:
mydb=> CREATE INDEX idx_gin_search_json ON tbl_user_search_json USING gin (to_tsvector('english', user_info));
CREATE INDEX
After the index is created, execute the following SQL again, as shown below:
mydb=> EXPLAIN ANALYZE SELECT * FROM tbl_user_search_json WHERE to_tsvector('english', user_info) @@ to_tsquery('english', 'KTU89H');
 Bitmap Heap Scan on tbl_user_search_json  (cost=50.75..7876.06 rows=5000 width=104) (actual time=0.024..0.024 rows=1 loops=1)
   Recheck Cond: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_gin_search_json  (cost=0.00..49.50 rows=5000 width=0) (actual time=0.018..0.018 rows=1 loops=1)
         Index Cond: (to_tsvector('english'::regconfig, user_info) @@ '''ktu89h'''::tsquery)
 Planning time: 0.113 ms
 Execution time: 0.057 ms
(7 rows)
The execution plan shows the index is used, and the execution time drops to 0.057 milliseconds; performance is very good.
The first part of this section briefly introduced PostgreSQL full-text retrieval with an English retrieval example; the second part demonstrated, through examples, the PostgreSQL 10 feature whereby full-text retrieval supports the json and jsonb types.