Analysis of DB2 Partition Database 07/11 Update SLTechnology News&Howtos


2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

1. Concept description

DB2 database partitioning is provided by the DPF (Database Partitioning Feature) option of DB2 Enterprise Edition. It is mainly used to distribute a large database across multiple partitions (logical or physical) to provide the necessary scalability, and it takes advantage of a shared-nothing architecture. The database is broken down into separate partitions that share nothing: each partition has its own resources, such as memory, CPU, and disk, as well as its own data, indexes, configuration files, and transaction logs. Database partitions are sometimes called nodes or database nodes. Through DPF's "divide and conquer" processing, scalability can be improved within a single server (scale-up) or across a cluster of servers (scale-out).

One of the most obvious reasons for using DPF is to improve the performance of query workloads and INSERT/UPDATE/DELETE operations. DPF can also overcome some architectural limitations of DB2. For example, in a regular table space the maximum table size is 64 GB with a 4 KB page size, 128 GB with 8 KB pages, 256 GB with 16 KB pages, and 512 GB with 32 KB pages. In DB2, these size limits for tables and table spaces apply per partition, so partitioning a database multiplies the maximum table size by the number of partitions in the environment.
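Because the limits apply per partition, the theoretical ceiling is simple multiplication. The sketch below is a minimal illustration, assuming rows are spread evenly so that every partition can grow to the per-partition limit quoted above; the name max_table_size_gb is hypothetical.

```python
# Per-partition table size limits quoted in the text (GB), keyed by page size (KB).
PER_PARTITION_LIMIT_GB = {4: 64, 8: 128, 16: 256, 32: 512}


def max_table_size_gb(page_size_kb: int, num_partitions: int) -> int:
    """Theoretical maximum table size when the limit applies to each partition.

    Assumes rows are spread evenly, so every partition can reach its own limit.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return PER_PARTITION_LIMIT_GB[page_size_kb] * num_partitions


if __name__ == "__main__":
    for page_kb, limit in sorted(PER_PARTITION_LIMIT_GB.items()):
        print(f"{page_kb} KB pages: {limit} GB per partition, "
              f"{max_table_size_gb(page_kb, 8)} GB with 8 partitions")
```

With 4 KB pages and 8 partitions, for example, the bound rises from 64 GB to 512 GB. Uneven data distribution lowers the practical ceiling, because the fullest partition reaches its limit first.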

[Figure: example of a DB2 partitioned database]

2. The influence of DPF on database performance

Data is distributed across the partitions by a hash algorithm, and each partition processes only its own data. When a user issues an SQL statement, the partition the user is connected to acts as the coordinator node (coordinator partition) and handles the request. Based on the partitioning key (called the distribution key in DB2 9), the request is split into sub-tasks that are handed to the relevant partitions for parallel processing. Finally, the coordinator collects the results from the partitions and returns them to the user. Partitioning is transparent to the application.
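The hash-then-coordinate flow can be simulated in a few lines. This is a minimal sketch, not DB2's implementation: zlib.crc32 stands in for DB2's internal hash function, the 4096-slot map size is an illustrative value, threads stand in for partitions, and the names build_distribution_map, partition_for, distribute, and coordinator_sum are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAP_SIZE = 4096  # illustrative number of hash slots (assumption)


def build_distribution_map(num_partitions):
    """Assign hash slots to partitions round-robin, giving each an equal share."""
    return [slot % num_partitions for slot in range(MAP_SIZE)]


def partition_for(key, dist_map):
    """Hash the distribution key to a slot, then look up the partition that owns it."""
    slot = zlib.crc32(str(key).encode("utf-8")) % len(dist_map)
    return dist_map[slot]


def distribute(rows, key_of, dist_map):
    """Place every row on exactly one partition according to its key."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_for(key_of(row), dist_map)].append(row)
    return parts


def coordinator_sum(parts, value_of):
    """Each partition sums only its own rows; the coordinator adds up the partial sums."""
    def local_sum(rows):
        return sum(value_of(r) for r in rows)

    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        partials = list(pool.map(local_sum, parts.values()))
    return sum(partials)


if __name__ == "__main__":
    rows = [(f"cust{i}", i) for i in range(1000)]  # (distribution key, amount)
    dist_map = build_distribution_map(4)
    parts = distribute(rows, lambda r: r[0], dist_map)
    print({p: len(rs) for p, rs in sorted(parts.items())})
    print(coordinator_sum(parts, lambda r: r[1]))
```

The same key always hashes to the same partition, so a query that filters on the distribution key can be routed to a single partition, while an aggregate over the whole table runs on every partition and is merged by the coordinator.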

In DB2, database partitions can be deployed on a cluster or MPP system (multiple single-CPU machines, with one DB2 instance that has multiple partitions and one partition on each machine), so that the partitions are spread across different machines. Database partitions can also be deployed on a single SMP machine (one DB2 instance with multiple partitions on a multi-CPU machine, where the number of partitions does not exceed the number of CPUs). Partitions on the same machine are called logical partitions. The two approaches can also be combined: a cluster or MPP environment can run multiple logical partitions on each of its nodes.
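These layouts correspond to how partitions are listed in DB2's db2nodes.cfg file, where each line gives a partition number, a host name, and a logical port that distinguishes the logical partitions on the same host. The helper below is a hypothetical sketch that generates such lines; the host names are made up, and it assumes consecutive partition numbers for simplicity.

```python
def db2nodes_lines(hosts, partitions_per_host):
    """Return db2nodes.cfg-style lines: '<partition number> <host name> <logical port>'.

    Partition numbers run across the whole instance; the logical port restarts
    at 0 on every host and tells the logical partitions on that host apart.
    """
    lines = []
    number = 0
    for host in hosts:
        for port in range(partitions_per_host):
            lines.append(f"{number} {host} {port}")
            number += 1
    return lines


if __name__ == "__main__":
    # MPP: one partition on each of three single-CPU machines.
    print("\n".join(db2nodes_lines(["node1", "node2", "node3"], 1)))
    # SMP: four logical partitions on one multi-CPU machine.
    print("\n".join(db2nodes_lines(["bigsmp"], 4)))
    # Cluster of SMP machines: two logical partitions on each of two hosts.
    print("\n".join(db2nodes_lines(["hostA", "hostB"], 2)))
```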

There are several benefits to adopting database partitioning, which are briefly described below:

Query scalability

This is one of the main reasons for adopting database partitioning. Dividing a large database into multiple smaller partitions improves query performance because each database partition works only on its own part of the data. Suppose 100,000 records must be scanned. In a single-partition database, the database manager scans all 100,000 records by itself. If the database is divided into 10 partitions and the 100,000 records are evenly distributed among them, the database manager on each partition scans only 10,000 records, and the 10 scans run in parallel.
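The arithmetic in this example is easy to check. A minimal sketch, assuming rows are spread as evenly as possible; the helper name rows_scanned_per_partition is hypothetical.

```python
def rows_scanned_per_partition(total_rows, num_partitions):
    """Rows each partition scans when total_rows are spread as evenly as possible."""
    base, extra = divmod(total_rows, num_partitions)
    # When the division is not exact, the first `extra` partitions get one extra row.
    return [base + (1 if p < extra else 0) for p in range(num_partitions)]


if __name__ == "__main__":
    print(rows_scanned_per_partition(100_000, 1))   # a single partition scans all rows
    print(rows_scanned_per_partition(100_000, 10))  # each of 10 partitions scans 10,000
```

Because the partitions scan concurrently, the elapsed scan time is governed by the largest entry in the list, which is why an even distribution matters.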

Architectural limitation

The largest table in a non-partitioned database depends on the page size, ranging from 64 GB with 4 KB pages to 512 GB with 32 KB pages. Table and table space size limits apply per partition, so dividing the database into N partitions raises the maximum table size to N times the single-partition limit. Memory can also be a limitation, especially on 32-bit operating systems; because each database partition manages its own resources, database partitioning can overcome this limit as well.

Data loading performance

Database partitioning allows data to be loaded into all database partitions in parallel, greatly reducing the time needed to load a single table. This is especially important for systems with tight data-loading windows, such as real-time business intelligence systems.
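A back-of-the-envelope model shows why parallel loading helps and why skew hurts. This sketch assumes every partition loads at the same hypothetical rate and ignores the cost of splitting and shipping rows to the partitions; the function names are illustrative.

```python
def serial_load_seconds(rows_per_partition, rows_per_second):
    """A single loader processes all rows one after another."""
    return sum(rows_per_partition) / rows_per_second


def parallel_load_seconds(rows_per_partition, rows_per_second):
    """All partitions load at once, so the largest partition sets the elapsed time."""
    return max(rows_per_partition) / rows_per_second


if __name__ == "__main__":
    rate = 50_000  # rows per second per loader (hypothetical)
    even = [250_000] * 4
    skewed = [400_000, 200_000, 200_000, 200_000]
    print(serial_load_seconds(even, rate), parallel_load_seconds(even, rate))
    print(parallel_load_seconds(skewed, rate))
```

The skewed case loads the same one million rows but takes 8 seconds instead of 5, which is why a distribution key with many distinct, evenly spread values is usually preferred.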

Database maintenance performance

Spreading the database across multiple database partition servers speeds up system maintenance because each operation runs only on the subset of data that a partition manages. Creating indexes and reorganizing tables (REORG) therefore take less time. Collecting statistics is faster as well, because RUNSTATS runs on only one database partition and extrapolates the results to the whole table.
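The single-partition statistics idea can be illustrated with row counts alone. This is a simplified sketch, assuming statistics gathered on one partition are scaled by the number of partitions under an even-distribution assumption; real RUNSTATS collects far more than a row count, and the function names here are hypothetical.

```python
def estimate_from_one_partition(partition_row_counts, sampled=0):
    """Scale one partition's row count by the partition count (assumes even distribution)."""
    return partition_row_counts[sampled] * len(partition_row_counts)


def relative_error(partition_row_counts, sampled=0):
    """How far the extrapolated estimate is from the true total row count."""
    actual = sum(partition_row_counts)
    estimate = estimate_from_one_partition(partition_row_counts, sampled)
    return abs(estimate - actual) / actual


if __name__ == "__main__":
    print(estimate_from_one_partition([10_000] * 10), relative_error([10_000] * 10))
    print(estimate_from_one_partition([12_000, 9_000, 9_000]),
          relative_error([12_000, 9_000, 9_000]))
```

With evenly distributed data the estimate is exact; with one oversized partition it is 20% off in this example, another reason to choose a distribution key that spreads rows evenly.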

Backup / restore performance

Spreading the database across different database servers can greatly reduce the time needed to back it up, which is an important factor when deciding whether to use database partitioning. DB2 parallelizes backup and restore operations by assigning separate processes or threads to table spaces. In a partitioned database environment, each partition is backed up independently, so backing up the partitions in parallel can greatly reduce the time needed to back up the whole database.

Logging

In highly active systems, database logging performance can limit the overall throughput of the system. In a partitioned database environment, each partition has its own set of logs. When there are many insert, update, and delete operations, multiple database partitions improve performance because logs are written in parallel on each partition and each partition has fewer log records to write.

DB2 can provide near-linear scalability as data volume, processors, and partitions increase, but whether database partitioning brings the most benefit depends on the processing workload, the size of the largest table, and other factors. The data warehouse in our project also uses database partitioning because of its large data volume and heavy CPU demand. However, the machines are old and cannot take more CPUs, and multiple partitions had to be implemented within one instance, so we adopted an SMP cluster: one DB2 instance with multiple partitions spanning several multi-CPU machines, with several partitions created on each machine.

3. Comparison between DB2 partitioning and Oracle partitioning

Oracle partitioning | DB2 partitioning | Oracle 10g syntax | DB2 V9 syntax
Range partitioning | Table partitioning | PARTITION BY RANGE | PARTITION BY RANGE
Hash partitioning | Database partitioning | PARTITION BY HASH | DISTRIBUTE BY HASH
List partitioning | Table partitioning with a generated column | PARTITION BY LIST | PARTITION BY RANGE
Not supported | Multidimensional clustering (MDC) | None | ORGANIZE BY DIMENSIONS

Taking database partitioning as an example, the following is a comparison between DB2's database partitioning and Oracle hash partitioning features:

Partition architecture
DB2: Shared-nothing
Oracle: Shared-disk

Partitioning characteristics
DB2: Each partition (CPU) has its own private memory and private disk space; no two partitions can access the same disk space, and partitions communicate with each other over the network.
Oracle: Each CPU uses its own private memory but accesses all disk systems directly through an internal communication mechanism.

Difference between the two
DB2: Scalable; computing resources (that is, database partitions) can be added physically as the database grows.
Oracle: Database capacity cannot be expanded by adding physical partitions.

Statement example
DB2: The partition_tablename table uses the partition_id column as its distribution key.

CREATE TABLE partition_tablename
  (partition_id   INTEGER     NOT NULL,
   partition_name VARCHAR(20) NOT NULL)  -- illustrative column name (assumed)
  IN tbsp_parts
  DISTRIBUTE BY HASH (partition_id);

Oracle: The hash_tablename table is hash-partitioned on the hash_part column; partition p1 is stored in table space tbsp1 and partition p2 in table space tbsp2.

CREATE TABLE hash_tablename
  (hash_part NUMBER,
   hash_id   VARCHAR2(20))
  PARTITION BY HASH (hash_part)
  (PARTITION p1 TABLESPACE tbsp1,
   PARTITION p2 TABLESPACE tbsp2);

4. Summary

This article briefly introduces the concept of DB2 database partitioning and compares it with Oracle partitioning. A partitioned database improves the performance of query workloads and DML operations. If the data volume is small, the improvement will not be noticeable, so partitioned databases are generally used for databases with large data volumes and frequent queries. In fact, there is no absolute choice between Oracle and DB2. For example, Oracle follows a completely open strategy that lets customers choose the most suitable solution and fully supports developers, while DB2 is best suited to massive data volumes, is widely used at the enterprise level, and offers strong scalability and parallelism. It is like Guangdong's slow-cooked "old fire" soup: Chinese people find it delicious and nourishing, but foreigners, used to calling thick borscht soup, would at best call it water! What suits your needs is what is best.

