In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly introduces how to realize the inheritance and partition of tables in PostgreSQL. It is very detailed and has a certain reference value. Friends who are interested must read it!
1. Inheritance of tables:
This concept will be somewhat unfamiliar to many developers who are already familiar with other database programming, but its implementation and design principles are easy to understand, so let's start with a simple example.
1. First inheritance table:
CREATE TABLE cities (--parent table
Name text
Population float
Altitude int
);
CREATE TABLE capitals (--child table
State char (2)
) INHERITS (cities)
The capitals table inherits all the properties of the cities table. In PostgreSQL, a table can inherit properties from zero or more other tables, and a query can reference either all rows in the parent table or all rows in the parent table plus the rows of all its child tables, where the latter is the default behavior.
MyTest=# INSERT INTO cities values ('Las Vegas', 1.53,2174);-- insert parent table
INSERT 0 1
MyTest=# INSERT INTO cities values ('Mariposa',3.30,1953);-- insert parent table
INSERT 0 1
MyTest=# INSERT INTO capitals values ('Madison',4.34,845,'WI');-- insert child table
INSERT 0 1
MyTest=# SELECT name, altitude FROM cities WHERE altitude > 500;-- data from both parent and child tables are fetched.
Name | altitude
-+-
Las Vegas | 2174
Mariposa | 1953
Madison | 845
(3 rows)
MyTest=# SELECT name, altitude FROM capitals WHERE altitude > 500;-- only the data from the child table is fetched.
Name | altitude
-+-
Madison | 845
(1 row)
If you want to extract data only from the parent table, you need to add the ONLY keyword to the SQL, such as:
MyTest=# SELECT name,altitude FROM ONLY cities WHERE altitude > 500
Name | altitude
-+-
Las Vegas | 2174
Mariposa | 1953
(2 rows)
The "ONLY" keyword in front of cities in the above example indicates that the query should only look for cities and not include tables whose inheritance level is lower than cities. Many of the commands we have discussed-- SELECT,UPDATE and DELETE-- support this "ONLY" symbol.
When performing the whole table data deletion, if you directly truncate the parent table, the data of the parent table and all its child tables will be deleted. If it is only the truncate child table, then the data of the parent table will not change, but the data in the child table will be emptied.
MyTest=# TRUNCATE TABLE cities;-data from both parent and child tables are deleted.
TRUNCATE TABLE
MyTest=# SELECT * FROM capitals
Name | population | altitude | state
-+-
(0 rows)
two。 Determine the source of the data:
Sometimes you may want to know which table a record comes from. In each table we have a system implicit field tableoid, which tells you the source of the table:
MyTest=# SELECT tableoid, name, altitude FROM cities WHERE altitude > 500
Tableoid | name | altitude
-+-
16532 | Las Vegas | 2174
16532 | Mariposa | 1953
16538 | Madison | 845
(3 rows)
The above results only give the tableoid, and just by this value, we still can't see the actual table name. To do this, we need to associate with the system table pg_class to extract the actual table name from that table through the tableoid field, as shown in the following query:
MyTest=# SELECT p.relname, c.name, c.altitude FROM cities c pgfight class p WHERE c.altitude > 500and c.tableoid = p.oid
Relname | name | altitude
-+-
Cities | Las Vegas | 2174
Cities | Mariposa | 1953
Capitals | Madison | 845
(3 rows)
3. Considerations for data insertion:
Inheritance does not automatically populate data from INSERT or COPY to other tables in the inheritance level. In our example, the following INSERT statement will not succeed:
INSERT INTO cities (name, population, altitude, state) VALUES ('New York', NULL, NULL,' NY')
We might want the data to be passed into the capitals table, but that's not going to happen: INSERT always inserts the explicitly declared table.
4. Multi-table inheritance:
A table can inherit from multiple parent tables, in which case it has the sum of the fields of the parent tables. Any fields defined in the child table are also added. If the same field name appears in multiple parent tables, or in both parent and child table definitions, these fields are "merged" so that there is only one such field in the child table. To merge, the field must be of the same data type, or an error will be thrown. The merged field will have all the constraints of the field it inherits.
CREATE TABLE parent1 (FirstCol integer)
CREATE TABLE parent2 (FirstCol integer, SecondCol varchar (20)
CREATE TABLE parent3 (FirstCol varchar)
-- the child table child1 inherits from both the parent1 and parent2 tables, and both parent tables contain FirstCol fields of type integer, so the child1 can be created successfully.
CREATE TABLE child1 (MyCol timestamp) INHERITS (parent1,parent2)
The child table child2 will not be created successfully because both of its parent tables contain FirstCol fields, but they are of different types.
CREATE TABLE child2 (MyCol timestamp) INHERITS (parent1,parent3)
The child table child3 will also not be created successfully because both it and its parent table contain FirstCol fields, but they are of different types.
CREATE TABLE child3 (FirstCol varchar (20)) INHERITS (parent1)
5. Inheritance and permissions:
Table access is not automatically inherited. Therefore, a user trying to access the parent table must also have access to all of its child tables, or use the ONLY keyword to extract data only from the parent table. When you add a new child table to an existing inheritance hierarchy, be careful to give it all permissions.
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints are applied only to individual tables, not their inherited child tables. This is true for both the referenced table and the referenced table, so in the above example, if we declare cities.name as UNIQUE or a PRIMARY KEY, it does not prevent the capitals table from having cities data rows with duplicate names. And these duplicate rows are displayed by default when querying the cities table. In fact, by default, capitals will have no unique constraints at all, so it may contain multiple rows with the same name. You should add unique constraints to capitals, but doing so will not avoid repetition with cities. Similarly, if we declare some other table in cities.name REFERENCES, this constraint is not automatically broadcast to capitals. In this case, you can do this by manually adding the same REFERENCES constraint to capitals.
2. Partition table:
1. An overview of the partition table:
Partitioning means dividing a large logical table into physical chunks, and partitioning can provide several benefits:
1)。 The performance of some types of queries can be greatly improved.
2)。 The performance of updates can also be improved because the index of each block of the table is smaller than the index on the entire dataset. If the index cannot be all in memory, then both reading and writing on the index will result in more disk access.
3)。 Bulk deletions can be achieved by simply deleting a partition.
4)。 Rarely used data can be moved to cheaper, slower storage media.
Assuming that the current database does not support partitioned tables, and the amount of data that our application needs to process is also very large, for this application scenario, we have to manually split the large table into multiple small tables according to certain rules, so that each small table contains different intervals of data. In this way, we must calculate the small tables that the instruction needs to operate before inserting, updating, deleting, and querying the data. For some queries, because the query interval may span multiple small tables, we have to union the query results of multiple small tables to merge the data from multiple tables and finally form a result set to return to the client. It can be seen that if the database we are using does not support partitioned tables, then in a scenario suitable for its application, we need to do a lot of additional programming work to make up for this deficiency. It is important to note, however, that although the functionality is barely manageable, the performance is not comparable to that of the partitioned table.
Currently, PostgreSQL supports the following two main types of partitions:
1)。 Range partitioning: the table is partitioned into "ranges" by one or more key fields, and no overlapping values between these ranges are distributed to different partitions. For example, we can partition specific business objects by data range, or by identifier range.
2)。 List partitions: tables are achieved by explicitly listing which key values should appear in each partition.
two。 Implement partitioning:
1)。 Create the Master Table from which all partitions inherit.
CREATE TABLE measurement (--main table
City_id int NOT NULL
Logdate date NOT NULL
Peaktemp int
);
2)。 Create several "child" tables, each inheriting from the main table. Typically, these "child" tables will not add any more fields. We will call child tables partitions, even though they are normal PostgreSQL tables.
CREATE TABLE measurement_yy04mm02 () INHERITS (measurement)
CREATE TABLE measurement_yy04mm03 () INHERITS (measurement)
...
CREATE TABLE measurement_yy05mm11 () INHERITS (measurement)
CREATE TABLE measurement_yy05mm12 () INHERITS (measurement)
CREATE TABLE measurement_yy06mm01 () INHERITS (measurement)
The sub-tables created above are divided into scope in the form of years and months, and the data of different years and months will be assigned to different sub-tables. This implementation will be extremely convenient and efficient for emptying the partitioned data, that is, directly executing the DROP TABLE statement to delete the corresponding child table, and then considering whether to rebuild the child table (partition) according to the actual application. PostgreSQL also provides a more convenient way to manage child tables than direct DROP child tables:
ALTER TABLE measurement_yy06mm01 NO INHERIT measurement
Compared with direct DROP, this method only makes the child table separate from the original main table, and the data stored in the child table can still be accessed, because the table has been restored to a normal data table. In this way, for the DBA of the database, you can perform the necessary maintenance operations on the table at this time, such as data cleaning, archiving, and so on. After completing many routine operations, you can consider whether to delete the table directly (DROP TABLE) or empty the table's data (TRUNCATE TABLE) first, and then let the table inherit the main table again, such as:
ALTER TABLE measurement_yy06mm01 INHERIT measurement
3)。 Add constraints to the partition table to define the health values allowed for each partition. At the same time, it is important to note that the defined constraints ensure that there are not the same key values in different partitions. Therefore, we need to change the definition of the "child" table above to the following form:
CREATE TABLE measurement_yy04mm02 (
CHECK (logdate > = DATE '2004-02-01' AND logdate
< DATE '2004-03-01') ) INHERITS (measurement); CREATE TABLE measurement_yy04mm03 ( CHECK (logdate >= DATE '2004-03-01' AND logdate
< DATE '2004-04-01') ) INHERITS (measurement); ... CREATE TABLE measurement_yy05mm11 ( CHECK (logdate >= DATE '2005-11-01' AND logdate
< DATE '2005-12-01') ) INHERITS (measurement); CREATE TABLE measurement_yy05mm12 ( CHECK (logdate >= DATE '2005-12-01' AND logdate
< DATE '2006-01-01') ) INHERITS (measurement); CREATE TABLE measurement_yy06mm01 ( CHECK (logdate >= DATE '2006-01-01' AND logdate
< DATE '2006-02-01') ) INHERITS (measurement); 4). 尽可能基于键值创建索引。如果需要,我们也同样可以为子表中的其它字段创建索引。 CREATE INDEX measurement_yy04mm02_logdate ON measurement_yy04mm02 (logdate); CREATE INDEX measurement_yy04mm03_logdate ON measurement_yy04mm03 (logdate); ... CREATE INDEX measurement_yy05mm11_logdate ON measurement_yy05mm11 (logdate); CREATE INDEX measurement_yy05mm12_logdate ON measurement_yy05mm12 (logdate); CREATE INDEX measurement_yy06mm01_logdate ON measurement_yy06mm01 (logdate); 5). 定义一个规则或者触发器,把对主表的修改重定向到适当的分区表。 如果数据只进入最新的分区,我们可以设置一个非常简单的规则来插入数据。我们必须每个月都重新定义这个规则,即修改重定向插入的子表名,这样它总是指向当前分区。 CREATE OR REPLACE RULE measurement_current_partition AS ON INSERT TO measurement DO INSTEAD INSERT INTO measurement_yy06mm01 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp); 其中NEW是关键字,表示新数据字段的集合。这里可以通过点(.)操作符来获取集合中的每一个字段。 我们可能想插入数据并且想让服务器自动定位应该向哪个分区插入数据。我们可以用像下面这样的更复杂的规则集来实现这个目标。 CREATE RULE measurement_insert_yy04mm02 AS ON INSERT TO measurement WHERE (logdate >= DATE '2004-02-01' AND logdate
< DATE '2004-03-01') DO INSTEAD INSERT INTO measurement_yy04mm02 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp); ... CREATE RULE measurement_insert_yy05mm12 AS ON INSERT TO measurement WHERE (logdate >= DATE '2005-12-01' AND logdate
< DATE '2006-01-01') DO INSTEAD INSERT INTO measurement_yy05mm12 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp); CREATE RULE measurement_insert_yy06mm01 AS ON INSERT TO measurement WHERE (logdate >= DATE '2006-01-01' AND logdate
< DATE '2006-02-01') DO INSTEAD INSERT INTO measurement_yy06mm01 VALUES (NEW.city_id, NEW.logdate, NEW.peaktemp); 请注意每个规则里面的WHERE子句正好匹配其分区的CHECK约束。 可以看出,一个复杂的分区方案可能要求相当多的DDL。在上面的例子里我们需要每个月创建一次新分区,因此写一个脚本自动生成需要的DDL是明智的。除此之外,我们还不难推断出,分区表对于新数据的批量插入操作有一定的抑制,这一点在Oracle中也同样如此。 除了上面介绍的通过Rule的方式重定向主表的数据到各个子表,我们还可以通过触发器的方式来完成此操作,相比于基于Rule的重定向方法,基于触发器的方式可能会带来更好的插入效率,特别是针对非批量插入的情况。然而对于批量插入而言,由于Rule的额外开销是基于表的,而不是基于行的,因此效果会好于触发器方式。另一个需要注意的是,copy操作将会忽略Rules,如果我们想要通过COPY方法来插入数据,你只能将数据直接copy到正确的子表,而不是主表。这种限制对于触发器来说是不会造成任何问题的。基于Rule的重定向方式还存在另外一个问题,就是当插入的数据不在任何子表的约束中时,PostgreSQL也不会报错,而是将数据直接保留在主表中。 6). 添加新分区: 这里将介绍两种添加新分区的方式,第一种方法简单且直观,我们只是创建新的子表,同时为其定义新的检查约束,如: CREATE TABLE measurement_y2008m02 ( CHECK ( logdate >= DATE '2008-02-01' AND logdate
< DATE '2008-03-01' ) ) INHERITS (measurement); 第二种方法的创建步骤相对繁琐,但更为灵活和实用。见以下四步: /* 创建一个独立的数据表(measurement_y2008m02),该表在创建时以将来的主表(measurement)为模板,包含模板表的缺省值(DEFAULTS)和一致性约束(CONSTRAINTS)。*/ CREATE TABLE measurement_y2008m02 (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS); /* 为该表创建未来作为子表时需要使用的检查约束。*/ ALTER TABLE measurement_y2008m02 ADD CONSTRAINT y2008m02 CHECK (logdate >= DATE '2008-02-01' AND logdate
< DATE '2008-03-01'); /* 导入数据到该表。下面只是给出一种导入数据的方式作为例子。在导入数据之后,如有可能,还可以做进一步的数据处理,如数据转换、过滤等。*/ \copy measurement_y2008m02 from 'measurement_y2008m02' /* 在适当的时候,或者说在需要的时候,让该表继承主表。*/ ALTER TABLE measurement_y2008m02 INHERIT measurement; 7). 确保postgresql.conf里的配置参数constraint_exclusion是打开的。没有这个参数,查询不会按照需要进行优化。这里我们需要做的是确保该选项在配置文件中没有被注释掉。 />Pwd
/ opt/PostgreSQL/9.1/data
/ > cat postgresql.conf | grep "constraint_exclusion"
Constraint_exclusion = partition # on, off, or partition
3. Partition and constraint exclusion:
Constraint exclusion (Constraint exclusion) is a query optimization technique that improves the performance of table partitions defined by the above method. For example:
SET constraint_exclusion = on
SELECT count (*) FROM measurement WHERE logdate > = DATE '2006-01-01'
Without constraint exclusion, the above query scans every partition in the measurement table. When constraint exclusion is turned on, the planner checks the constraints of each partition and then the view proves that the partition does not need to be scanned because it cannot contain any rows of data that meet the criteria of the WHERE clause. If the planner can prove this, it excludes the partition from the query plan.
You can use the EXPLAIN command to show the difference between a plan when constraint_exclusion is on and off. The typical default plan for the table set up with the above method is:
SET constraint_exclusion = off
EXPLAIN SELECT count (*) FROM measurement WHERE logdate > = DATE '2006-01-01'
QUERY PLAN
-
Aggregate (cost=158.66..158.68 rows=1 width=0)
-> Append (cost=0.00..151.88 rows=2715 width=0)
-> Seq Scan on measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
-> Seq Scan on measurement_yy04mm02 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
-> Seq Scan on measurement_yy04mm03 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
...
-> Seq Scan on measurement_yy05mm12 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
-> Seq Scan on measurement_yy06mm01 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
As you can see from the query plan above, PostgreSQL scanned all partitions. Let's take another look at the query plan after turning on constraint exclusion:
SET constraint_exclusion = on
EXPLAIN SELECT count (*) FROM measurement WHERE logdate > = DATE '2006-01-01'
QUERY PLAN
-
Aggregate (cost=63.47..63.48 rows=1 width=0)
-> Append (cost=0.00..60.75 rows=1086 width=0)
-> Seq Scan on measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
-> Seq Scan on measurement_yy06mm01 measurement (cost=0.00..30.38 rows=543 width=0)
Filter: (logdate > = '2006-01-01'::date)
Note that constraint exclusion is only driven by CHECK constraints, not by indexes.
The default value for this configuration in the current version of PostgreSQL is partition, which is a behavior between on and off, that is, the planner will only apply constraint exclusion to queries based on partitioned tables, while the on setting will exclude constraints for all queries, so ordinary data tables will have to bear the additional overhead generated by this mechanism.
When using constraint exclusion, there are the following considerations:
1)。 Constraint exclusion takes effect only if the WHERE clause of the query contains constraints. A parameterized query is not optimized because the planner does not know which partition the parameter will select at run time. So functions like CURRENT_DATE must be avoided. Joining the partition key value to the field of another table will not be optimized.
2)。 Cross-type comparisons should be avoided in CHECK constraints, because the current planner will not be able to prove that such conditions are false. For example, the following constraint is available when x is an entire field, but not when x is a bigint:
CHECK (x = 1)
For the bigint field, we must use a constraint like this:
CHECK (x = 1::bigint)
This problem is not limited to the bigint data type; it can occur in situations where the default data type of any constraint does not match the data type of the field it compares. Cross-type comparisons in submitted queries are usually OK, but not in CHECK conditions.
3)。 The UPDATE and DELETE commands on the master table do not perform constraint exclusion.
4)。 When the planner does constraint exclusion, all constraints for all partitions on the main table will be checked, so a large number of partitions will significantly increase the time for query planning.
5)。 When you execute the ANALYZE statement, execute the command for each partition, not just the main table.
The above is all the contents of the article "how to realize the inheritance and partitioning of tables in PostgreSQL". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.