Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use Table inheritance to realize Partition Table in PostgreSQL

2025-02-14 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article will explain in detail how to use table inheritance to achieve partition tables in PostgreSQL. The content of the article is of high quality, so the editor shares it for you as a reference. I hope you will have some understanding of the relevant knowledge after reading this article.

Overview

Partitioning refers to dividing a logically large table into multiple small physical slices (subtables). Partitioning provides the following benefits:

. Query performance can be significantly improved in some cases, especially when rows with high access pressure are in one partition or a few partitions. Partitioning can replace the dominant column of the index, reduce the size of the index, and make the parts of the index with high access pressure more likely to be placed in memory.

. When querying or updating to access most of the rows of a partition, indexes and random access scattered across the entire table can be replaced by a sequential scan on that partition, which can improve performance.

. If the requirements plan uses a partition design, batch loading and deletion can be done by adding or removing partitions. Both ALTER TABLE NO INHERIT and DROP TABLE are much faster than a batch operation. These commands also completely avoid the VACUUM load caused by bulk DELETE.

. Rarely used data can be migrated to cheaper and slower storage media.

When a table is very large, the benefits of partitioning are well worth it. How a table benefits from partitioning depends on the application. A rule of thumb is that partitioning benefits tables when the size of the table exceeds the physical memory of the database server.

Currently, PostgreSQL supports partitioning through table inheritance. Each partition must be created as a child of a single parent table. The parent table itself is normally empty; it exists only to represent the entire database. You should be familiar with table inheritance before attempting to set up partitions.

The following forms of partitioning can be implemented in PostgreSQL:

Range partition

The table is divided into "range" partitions based on a key column or set of columns, and there is no overlap between the ranges of different partitions. For example, we can partition by date range, or by the identifier of a specific business object.

List partition

Partition the table by explicitly listing the key values that appear in each partition.

Implement partitioning

To create a partition table, do this:

1. Create a "master" table that all partitions will inherit.

This table will not contain data. Do not define any check constraints on this table unless you plan to apply them to all partitions. There is also no need to define any indexes or unique constraints.

two。 Create some "child" tables that inherit from the main table. Typically, these tables do not add any columns to the set of columns inherited from the main table.

We think of these child tables as partitions, although they look at normal PostgreSQL tables (or perhaps external tables) in every way.

3. Add table constraints to the partition table to define the key values allowed in each partition.

Typical examples are:

CHECK (x = 1) CHECK (county IN ('Oxfordshire',' Buckinghamshire', 'Warwickshire')) CHECK (outletID > = 100 AND outletID

< 200 ) 要确保这些约束能够保证在不同分区所允许的键值之间不存在重叠。设置范围约束时一种常见的错误是: CHECK ( outletID BETWEEN 100 AND 200 )CHECK ( outletID BETWEEN 200 AND 300 ) 这是错误的,因为键值200并没有被清楚地分配到某一个分区。注意在语法上范围划分和列表划分没有区别,这些术语只是为了描述方便而存在。 4.对于每一个分区,在关键列上创建一个索引,并创建其他我们所需要的索引(关键索引并不是严格必要的,但是在大部分情况下它都是有用的。如果我们希望键值是唯一的,则我们还要为每一个分区创建一个唯一或者主键约束。) 5.还可以有选择地定义一个触发器或者规则将插入到主表上的数据重定向到合适的分区上。 6.确保在postgresql.conf中constraint_exclusion配置参数没有被禁用。如果它被禁用,查询将不会被按照期望的方式优化。 例如,假设我们正在为一个大型的冰淇淋公司构建一个数据库。该公司测量每天在每一个区域的最高气温以及冰淇淋销售。在概念上,我们想要一个这样的表: CREATE TABLE measurement (city_id int not null,logdate date not null,peaktemp int,unitsales int);insert into measurement values(1,date '2008-02-01',1,1); 由于该表的主要用途是为管理层提供在线报告,我们知道大部分查询将只会访问上周、上月或者上季度的数据。为了减少需要保存的旧数据的量,我们决定只保留最近3年的数据。在每一个月的开始,我们将删除最老的一个月的数据。 在这种情况下,我们可以使用分区来帮助我们满足对于测量表的所有不同需求。按照上面所勾勒的步骤,分区可以这样来建立: 1.主表是measurement表,完全按照以上的方式声明。 jydb=# CREATE TABLE measurement (jydb(# city_id int not null,jydb(# logdate date not null,jydb(# peaktemp int,jydb(# unitsales intjydb(# );CREATE TABLE 2.下一步我们为每一个活动月创建一个分区: CREATE TABLE measurement_y2006m02 ( ) INHERITS (measurement);CREATE TABLE measurement_y2006m03 ( ) INHERITS (measurement);...CREATE TABLE measurement_y2007m11 ( ) INHERITS (measurement);CREATE TABLE measurement_y2007m12 ( ) INHERITS (measurement);CREATE TABLE measurement_y2008m01 ( ) INHERITS (measurement); 每一个分区自身都是完整的表,但是它们的定义都是从measurement表继承而来。 这解决了我们的一个问题:删除旧数据。每个月,我们所需要做的是在最旧的子表上执行一个DROP TABLE命令并为新一个月的数据创建一个新的子表。 3.我们必须提供不重叠的表约束。和前面简单地创建分区表不同,实际的表创建脚本应该是: jydb=# CREATE TABLE measurement_y2006m02 (jydb(# CHECK ( logdate >

= DATE '2006-02-01' AND logdate

< DATE '2006-03-01' ) jydb(# ) INHERITS (measurement); CREATE TABLE jydb=# CREATE TABLE measurement_y2006m03 ( jydb(# CHECK ( logdate >

= DATE '2006-03-01' AND logdate

< DATE '2006-04-01' ) jydb(# ) INHERITS (measurement); CREATE TABLE jydb=# CREATE TABLE measurement_y2008m02 ( jydb(# CHECK ( logdate >

= DATE '2008-02-01' AND logdate

< DATE '2008-03-01' )jydb(# ) INHERITS (measurement);CREATE TABLEjydb=# CREATE TABLE measurement_y2008m03jydb-# (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);CREATE TABLE 4.我们可能在关键列上也需要索引: CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);...CREATE INDEX measurement_y2007m11_logdate ON measurement_y2007m11 (logdate);CREATE INDEX measurement_y2007m12_logdate ON measurement_y2007m12 (logdate);CREATE INDEX measurement_y2008m01_logdate ON measurement_y2008m01 (logdate); 在这里我们选择不增加更多的索引。 5.我们希望我们的应用能够使用INSERT INTO measurement ...并且数据将被重定向到合适的分区表。我们可以通过为主表附加一个合适的触发器函数来实现这一点。如果数据将只被增加到最后一个分区,我们可以使用一个非常简单的触发器函数: CREATE OR REPLACE FUNCTION measurement_insert_trigger()RETURNS TRIGGER AS $$BEGIN INSERT INTO measurement_y2008m01 VALUES (NEW.*); RETURN NULL;END;$$LANGUAGE plpgsql; 完成函数创建后,我们创建一个调用该触发器函数的触发器: CREATE TRIGGER insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger(); 我们必须在每个月重新定义触发器函数,这样它才会总是指向当前分区。而触发器的定义则不需要被更新。 我们也可能希望插入数据时服务器会自动地定位应该加入数据的分区。我们可以通过一个更复杂的触发器函数来实现之,例如: jydb=# CREATE OR REPLACE FUNCTION measurement_insert_trigger()jydb-# RETURNS TRIGGER AS $$jydb$# BEGINjydb$#jydb$# IF ( NEW.logdate >

= DATE '2006-03-01' ANDjydb$# NEW.logdate

< DATE '2006-04-01' ) THEN jydb$# INSERT INTO measurement_y2006m03 VALUES (NEW.*); jydb$# ELSIF ( NEW.logdate >

= DATE '2008-02-01' ANDjydb$# NEW.logdate

< DATE '2008-03-01' ) THENjydb$# INSERT INTO measurement_y2008m02 VALUES (NEW.*);jydb$# ELSEjydb$# RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';jydb$# END IF;jydb$# RETURN NULL;jydb$# END;jydb$# $$jydb-# LANGUAGE plpgsql;CREATE FUNCTIONjydb=# CREATE TRIGGER insert_measurement_triggerjydb-# BEFORE INSERT ON measurementjydb-# FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();CREATE TRIGGERjydb=# insert into measurement values(1,date '2006-03-03',1,1);INSERT 0 0jydb=# insert into measurement values(1,date '2008-02-03',1,1);INSERT 0 0jydb=# select * from measurement_y2006m03; city_id | logdate | peaktemp | unitsales---------+------------+----------+----------- 1 | 2006-03-02 | 1 | 1 1 | 2006-03-03 | 1 | 1(2 rows)jydb=# select * from measurement_y2008m02; city_id | logdate | peaktemp | unitsales---------+------------+----------+----------- 1 | 2008-02-01 | 1 | 1 1 | 2008-02-01 | 1 | 1 1 | 2008-02-01 | 1 | 1 1 | 2008-02-02 | 1 | 1 1 | 2008-02-03 | 1 | 1(5 rows)jydb=# select * from measurement; city_id | logdate | peaktemp | unitsales---------+------------+----------+----------- 1 | 2006-03-02 | 1 | 1 1 | 2006-03-03 | 1 | 1 1 | 2008-02-01 | 1 | 1 1 | 2008-02-01 | 1 | 1 1 | 2008-02-01 | 1 | 1 1 | 2008-02-02 | 1 | 1 1 | 2008-02-03 | 1 | 1(7 rows) 触发器的定义和以前一样。注意每一个IF测试必须准确地匹配它的分区的CHECK约束。当该函数比单月形式更加复杂时,并不需要频繁地更新它,因为可以在需要的时候提前加入分支。 注意: 在实践中,如果大部分插入都会进入最新的分区,最好先检查它。为了简洁,我们为触发器的检查采用了和本例中其他部分一致的顺序。 如我们所见,一个复杂的分区模式可能需要大量的DDL。在上面的例子中,我们需要每月创建一个新分区,所以最好能够编写一个脚本自动地生成所需的DDL。 管理分区 通常当初始定义的表倾向于动态变化时,一组分区会被创建。删除旧的分区并周期性地为新数据增加新分区是很常见的。划分的一个最重要的优点是可以通过操纵分区结构来使得这种痛苦的任务几乎是自发地完成,而不需要去物理地移除大量的数据。 移除旧数据的最简单的选项是直接删除不再需要的分区: jydb=# DROP TABLE measurement_y2006m02;DROP TABLE 这可以非常快地删除百万级别的记录,因为它不需要逐一地删除记录。 另一个经常使用的选项是将分区从被划分的表中移除,但是把它作为一个独立的表保留下来: ALTER TABLE measurement_y2006m02 NO INHERIT measurement; 这允许在数据被删除前执行更进一步的操作。例如,这是一个很有用的时机通过COPY、pg_dump或类似的工具来备份数据。这也是进行数据聚集、执行其他数据操作或运行报表的好时机。 相似地我们也可以增加新分区来处理新数据。我们可以在被划分的表中创建一个新的空分区: jydb=# CREATE TABLE measurement_y2008m02 (jydb(# CHECK ( logdate >

= DATE '2008-02-01' AND logdate

< DATE '2008-03-01' )jydb(# ) INHERITS (measurement);CREATE TABLE 作为一种选择方案,有时创建一个在分区结构之外的新表更方便,并且在以后才将它作为一个合适的分区。这使得数据可以在出现于分区表中之前被载入、检查和转换: jydb=# CREATE TABLE measurement_y2008m03jydb-# (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);CREATE TABLEjydb=# ALTER TABLE measurement_y2008m03 ADD CONSTRAINT y2008m03jydb-# CHECK ( logdate >

= DATE '2008-03-01' AND logdate

< DATE '2008-04-01' );ALTER TABLEjydb=# ALTER TABLE measurement_y2008m03 INHERIT measurement;ALTER TABLE 分区与约束排除 约束排除是一种查询优化技术,它可以为按照以上方式定义的分区表提高性能。例如: jydb=# SET constraint_exclusion = on;SETjydb=# SELECT count(*) FROM measurement WHERE logdate >

= DATE '2008-01-01-01; count- 3 (1 row)

Without constraint exclusion, the above query scans each partition of the measurement table. When constraint exclusion is enabled, the planner checks the constraints of each partition to determine whether the partition needs to be scanned, because the partition may not contain rows that satisfy the query WHERE clause. If the planner can confirm this, it will exclude the partition from the query plan.

You can use the EXPLAIN command to show the difference between plans with constraint_exclusion turned on and those without this option. A typical unoptimized plan is:

Jydb=# SET constraint_exclusion = off;SETjydb=# EXPLAIN SELECT count (*) FROM measurement WHERE logdate > = DATE '2008-01-01' QUERY PLAN---- Aggregate (cost=107.47..107.48 rows=1 width=8)- > Append (cost=0.00..102.69 rows=1913 width=0)-> Seq Scan on measurement (cost=0.00..3.31 rows=62 width=0) Filter: (logdate > = '2008-01-01'::date)-> Seq Scan on measurement_y2006m03 (cost=0.00..33.12 rows=617 width=0) Filter: (logdate > =' 2008-01-01'::date)-> Seq Scan on measurement_y2008m02 (cost=0.00..33.12 rows=617 width=0) Filter: (logdate > = '2008-01-01'::date)-> Seq Scan on measurement_y2008m03 (cost=0.00..33.12 rows=617 width=0) Filter: (logdate > =' 2008-01-01'::date) 10 rows)

Some or all of these partitions will use index scans instead of full table sequential scans, but the key point is that there is no need to scan the old partitions to answer this query at all. When we turn on constraint exclusion, we get a cheaper plan for the same query:

Jydb=# SET constraint_exclusion = on;SETjydb=# EXPLAIN SELECT count (*) FROM measurement WHERE logdate > = DATE '2008-01-01' QUERY PLAN---- Aggregate (cost=72.80..72.81 rows=1 width=8)- > Append (cost=0.00..69.56 rows=1296 width=0)-> Seq Scan on measurement (cost=0.00..3.31 rows=62 width=0) Filter: (logdate > = '2008-01-01'::date)-> Seq Scan on measurement_y2008m02 (cost=0.00..33.12 rows=617 width=0) Filter: (logdate > =' 2008-01-01'::date)-> Seq Scan on measurement_y2008m03 (cost=0.00..33.12 rows=617 width=0) Filter: (logdate > = '2008-01-01'::date) (8 rows)

Note that constraint exclusion is driven only by CHECK constraints, not by the existence of the index. Therefore, it is not necessary to define indexes on key columns. Whether or not to define an index on a given partition depends on whether we want the query to scan most or a small portion of the table frequently. In the latter case, the index will work.

The default (and recommended) setting for constraint_exclusion is actually neither on nor off, but an intermediate setting called partition, which allows the technique to be applied only to queries that will work on partitioned tables. Setting up on will allow the planner to check CHECK constraints in all queries, even if simple queries do not benefit.

Alternative zoning method

Another way to redirect inserted data to the appropriate partition is to establish rules on the main table rather than triggers, such as:

Jydb=# CREATE RULE measurement_insert_y2006m03 ASjydb-# ON INSERT TO measurement WHEREjydb-# (logdate > = DATE '2006-03-01' AND logdate

< DATE '2006-04-01' ) jydb-# DO INSTEAD jydb-# INSERT INTO measurement_y2006m03 VALUES (NEW.*); CREATE RULE jydb=# CREATE RULE measurement_insert_y2008m02 AS jydb-# ON INSERT TO measurement WHERE jydb-# ( logdate >

= DATE '2008-02-01' AND logdate < DATE' 2008-03-01') jydb-# DO INSTEADjydb-# INSERT INTO measurement_y2008m02 VALUES (NEW.*); CREATE RULEjydb=# insert into measurement values (1 Magi date '2006-03-02); INSERT 0 0jydb=# insert into measurement values (1 Magi date' 2008-02-02); INSERT 0 0jydb=# select * from measurement City_id | logdate | peaktemp | unitsales-+- 1 | 2006-03-02 | 1 | 1 | 2008-02-01 | 1 | 1 | 2008-02-01 | 1 | 1 1 | 2008-02-01 | 1 | 1 | 2008-02-02 | 1 | 1 (5 rows) jydb=# select * from measurement_y2006m03 City_id | logdate | peaktemp | unitsales-+- 1 | 2006-03-02 | 1 | 1 (1 row) jydb=# select * from measurement_y2008m02 City_id | logdate | peaktemp | unitsales-+- 1 | 2008-02-01 | 1 | 1 | 2008-02-01 | 1 | 1 | 2008-02-01 | 1 | 1 1 | 2008-02-02 | 1 | 1 (4 rows)

A rule has a significantly higher load than a trigger, but the load is borne by each query rather than every row, so this approach may be useful in the case of bulk inserts. However, in most cases, trigger methods can provide better performance.

Note that COPY ignores the rules. If we want to use COPY to insert data, we will want to copy the data to the correct partitioned table instead of the main table. COPY raises a trigger, so you can use it normally if you use the trigger method.

Another disadvantage of the rule approach is that if a set of rules does not overwrite the inserted data, the data will be inserted into the main table without issuing any errors.

Partitions can also be organized using a UNION ALL view. For example:

CREATE VIEW measurement AS SELECT * FROM measurement_y2006m02UNION ALL SELECT * FROM measurement_y2006m03...UNION ALL SELECT * FROM measurement_y2007m11UNION ALL SELECT * FROM measurement_y2007m12UNION ALL SELECT * FROM measurement_y2008m01

However, if you want to add or delete individual partitions, you need to recreate the view. In practice, this approach is rarely recommended compared to the use of inheritance.

Warning

The following warning applies to partitioned tables:

. There is no automatic way to verify that all CHECK constraints are mutually exclusive. Creating code to generate partitions and creating or modifying related objects is safer than writing commands by hand.

. The patterns shown here assume that the key column of a partition never changes, or that its change is not sufficient to cause it to be moved to another partition. A UPDATE that attempts to move a row to another partition fails because of the existence of the CHECK constraint. If we need to deal with this kind of situation, we can place the appropriate update trigger on the partition table, but it will make the management of the structure more complex.

. If we are using manual VACUUM or ANALYZE commands, don't forget to run it once on each partition. The following command:

ANALYZE measurement

Only the master table will be processed.

. An INSERT statement with an ON CONFLICT clause is unlikely to work as expected, because the ON CONFLICT action is adopted only if there is a unique violation on the specified target relationship, not its child relationship.

The following warning applies to constraint exclusion:

. Constraint exclusion takes effect only if the WHERE clause of the query contains constants (or externally supplied parameters). For example, a comparison with a non-invariant function, such as CURRENT_TIMESTAMP, cannot be optimized because the planner does not know which partition the function's value will fall into at run time.

. Keep partition constraints simple, otherwise the planner may not be able to validate partitions that do not need to be accessed. Use simple equality conditions for list partitions or simple range tests for range partitions, as shown in the previous example. A good rule of thumb is that partitioning constraints should include only comparisons using the B-tree index operator, with partitioned columns and constants on both sides.

. During constraint exclusion, all constraints on all partitions of the main table are checked, so a large number of partitions will significantly increase query planning time. Partitions that use these techniques work well with up to 100 partitions, but don't try to use thousands of partitions.

On how to use table inheritance in PostgreSQL to achieve partition table sharing here, I hope that the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report