Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to query in parallel with DM

2025-02-14 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

DM how to carry out parallel query, many novices are not very clear about this, in order to help you solve this problem, the following editor will explain for you in detail, people with this need can come to learn, I hope you can gain something.

Without parallel query technology, a serial query can only make use of CPU or one of the disk devices, but not the processing power of the whole computer. With the emergence of parallel query technology, a single SQL statement can make use of the processing power of multiple CPU and disk devices. Its advantage is that it can process the query task through multiple threads, so as to improve the efficiency of the query.

Dameng database provides parallel query function for database servers with multiple CPU to optimize the performance of query tasks. Only when the database server has multiple CPU, can the query operation be executed in parallel to improve the speed of the query task.

The Dameng database completes the parallel query through three steps: first, determine the number of parallel tasks; secondly, determine the number of parallel working threads; finally, execute the query. The relevant parameters of parallel query are shown in the following table:

Parameter name default value property description

MAX_PARALLEL_DEGREE 1 is dynamic, and session level is used to set the default number of parallel tasks. Value range: 1: 128. The default value of 1 indicates that there are no parallel tasks. This parameter value is valid when the PARALLEL_ policy value is 1.

Parallel_policy 0 is used statically to set the parallel policy. Values range: 0, 1, and 2. The default is 0. Where 0 indicates that parallelism is not supported, 1 indicates automatic parallel mode, and 2 indicates manual parallel mode.

PARALLEL_THRD_NUM 10 static is used to set the number of parallel worker threads. Value range: 1-1024.

When automatic parallelism (PARALLEL_POLICY=1) is turned on, the parameter MAX_PARALLEL_DEGREE takes effect and controls the number of threads most used by parallel queries. The default value of MAX_PARALLEL_DEGREE is 1, which means that it is not parallel. If you specify the HINT "PARALLEL" for the parameter, the Hint value is used.

When manual parallelism (PARALLEL_POLICY=2) is enabled, the parameter MAX_PARALLEL_DEGREE is invalid. Users need to use the HINT "PARALLEL" corresponding to this parameter in the statement to specify the degree of parallelism of the statement, otherwise it is not parallel.

1. Set the default value in the INI parameter

The INI parameter MAX_PARALLEL_DEGREE sets the maximum number of parallel tasks. Value range: 1: 128. The default value of 1 indicates that there are no parallel tasks, and this parameter is valid only if the PARALLEL_ policy value is 1.

For example, the format of setting MAX_PARALLEL_DEGREE to 3 in the INI parameter is as follows:

MAX_PARALLEL_DEGREE 3

Check the default value of max_parallel_degree first

SQL > select sf_get_para_value, LINEID SF_GET_PARA_VALUE: 1 1used time: 150.207 (ms). Execute id is 197.

The following query looks at the max_parallel_degree parameter values set in the dm.ini file

SQL > select * from v$dm_ini where para_name='MAX_PARALLEL_DEGREE' LINEID PARA_NAME PARA_VALUE MIN_VALUE MAX_VALUE MPP_CHK SESS_VALUE FILE_VALUE DESCRIPTION PARA_TYPE -1 MAX_PARALLEL_DEGREE 1 1128 N 1 1 Maximum degree of parallel query SESSIONused time: 50.228 (ms). Execute id is 198.

The following query looks at the max_parallel_degree parameter values in memory

SQL > select * from v$parameter where name='MAX_PARALLEL_DEGREE' LINEID ID NAME TYPE VALUE SYS_VALUE FILE_VALUE DESCRIPTION-- -1 274 MAX_PARALLEL_DEGREE SESSION 11 Maximum degree of parallel queryused time: 7.440 (ms). Execute id is 199.

Now execute the following command to change the max_parallel_degree parameter in both the memory and the dm.ini file to 3

SQL > call sp_set_para_value, DMSQL executed successfullyused time: 7.183 (ms). Execute id is 200.

Execute the following query to see that the max_parallel_degree parameter has been changed to 3

SQL > select sf_get_para_value, LINEID SF_GET_PARA_VALUE: 1 3used time: 5.544 (ms). Execute id is 201.

The sess_value and file_value returned from the query are both 3 to determine that the max_parallel_degree has been modified in the dm.ini file

SQL > select * from v$dm_ini where para_name='MAX_PARALLEL_DEGREE' LINEID PARA_NAME PARA_VALUE MIN_VALUE MAX_VALUE MPP_CHK SESS_VALUE FILE_VALUE DESCRIPTION PARA_TYPE -1 MAX_PARALLEL_DEGREE 3 1 128 N 3 3 Maximum degree of parallel query SESSIONused time: 6.910 (ms). Execute id is 202.

The sys_value returned from the query is 3 and the max_parallel_degree in memory has also been modified.

SQL > select * from v$parameter where name='MAX_PARALLEL_DEGREE' LINEID ID NAME TYPE VALUE SYS_VALUE FILE_VALUE DESCRIPTION-- -1 274 MAX_PARALLEL_DEGREE SESSION 33 3 Maximum degree of parallel queryused time: 6.335 (ms). Execute id is 203.

Then, the parallel query can be executed using a normal SQL statement query, without the need for HINT. Such as:

SQL > explain SELECT * FROM SYSOBJECTS;1 # NSET2: [0,396] 2 # LOCAL COLLECT: [0,396]; op_id (1) n_grp_by (0) n_cols (0) n_keys (0) for_sync (FALSE) 3 # PRJT2: [0,1694]; exp_num (17), is_atom (FALSE) 4 # CSCN2: [0,1694]; SYSINDEXSYSOBJECTS (SYSOBJECTS as SYSOBJECTS) used time: 0.951 (ms). Execute id is 0.

LOCAL COLLECT in the execution plan represents: data collection and processing in local parallelism instead of LOCAL GATHER.

2. Use the "PARALLEL" keyword in the SQL statement to specify

When PARALLEL_POLICY=2, you need to specify the degree of parallelism through the "PARALLEL" HINT in the SQL statement, otherwise there is no parallelism. If PARALLEL_POLICY=1, the "PARALLEL" HINT used in the SQL statement always takes precedence over the MAX_PARALLEL_DEGREE parameter setting. The use of the "PARALLEL" keyword is achieved by adding a HINT clause after the SELECT keyword of the data query statement.

The format of the HINT syntax is as follows:

/ * + PARALLEL ([

< 表名>

]

< 并行任务个数>

) * /

For example, in the following example, even though the MAX_PARALLEL_DEGREE default value of 3 has been set, the actual number of tasks specified for PARALLEL is 4:

SQL > explain SELECT / * + PARALLEL (4) * / * FROM SYSOBJECTS;1 # NSET2: [0, 1694] 2 # LOCAL COLLECT: [0, 1694]; op_id (1) n_grp_by (0) n_cols (0) n_keys (0) for_sync (FALSE) 3 # PRJT2: [0, 1694]; exp_num (17), is_atom (FALSE) 4 # CSCN2: [0, 1694] SYSINDEXSYSOBJECTS (SYSOBJECTS as SYSOBJECTS) used time: 0.967 (ms). Execute id is 0.

In addition, the number of parallel tasks can only be set once in each statement, and if it is set multiple times, the last setting will prevail, and the number of tasks will take effect in the whole statement.

For example, in the following example, the number of parallel tasks used is 2.

SQL > call sp_set_para_value, DMSQL executed successfullyused time: 6.554 (ms). Execute id is 211.SQL > select sf_get_para_value, LINEID SF_GET_PARA_VALUE: 1 1used time: 5.569 (ms). Execute id is 212.SQL > explain SELECT / * + PARALLEL (1) * / * + PARALLEL (2) * / * FROM SYSOBJECTS;1 # NSET2: [0,396] 2 # LOCAL COLLECT: [0,1694]; op_id (1) n_grp_by (0) n_cols (0) n_keys (0) for_sync (FALSE) 3 # PRJT2: [0,1694] Exp_num (17), is_atom (FALSE) 4 # CSCN2: [0,396]; SYSINDEXSYSOBJECTS (SYSOBJECTS as SYSOBJECTS) used time: 1.067 (ms). Execute id is 0.

This method can set the additional number of parallel tasks for a single query statement, so as to improve the performance of some special query tasks.

Before performing a parallel query task, you need to specify the number of parallel worker threads that complete the task. It is worth noting that the number of threads actually used is not always equal to the number of parallel worker threads. The number of parallel worker threads is set in the INI parameter, and the actual number of parallel worker threads is determined according to the actual situation of the system.

1. Number of parallel worker threads, which is set in the INI parameter

First, use the PARALLEL_POLICY parameter to set the parallel policy. Value range: 0, 1, and 2. The default value is 0. Where 0 indicates that parallelism is not supported, 1 indicates automatic parallel mode, and 2 indicates manual parallel mode.

When local parallelism (PARALLEL_POLICY > 0) is turned on, use PARALLEL_THRD_NUM to specify the number of threads used by the local parallel query, with a range of 1: 1024 and a default value of 10. It is important to note that if PARALLEL_POLICY=1, if PARALLEL_THRD_NUM=1, parallel threads are created according to the number of CPU.

For example, set the parallel policy PARALLEL_POLICY to 2, that is, manually set the number of parallel worker threads, and set the number of parallel worker threads PARALLEL_THRD_NUM to 4.

SQL > call sp_set_para_value, DMSQL executed successfullyused time: 6.942 (ms). Execute id is 223.SQL > call sp_set_para_value, DMSQL executed successfullyused time: 6.871 (ms). Execute id is 224.

Of course, not all queries are suitable for parallel queries. Queries that take up a lot of CPU cycles are most suitable for parallel queries. For example, join queries for large tables, aggregation of large amounts of data, and sorting of large result sets are all suitable for parallel queries. For simple queries, which are often used in transactional applications, the additional coordination work required to execute parallel queries outweighs the potential performance improvement. Therefore, database administrators need to be cautious when determining whether a parallel strategy is needed.

2. The number of threads actually used will be automatically detected by Dameng database according to each parallel query operation.

The actual number of threads used is determined when the database is initialized when the query plan is executed. In other words, this does not require user intervention, but is determined by the system based on the number of parallel tasks and the actual number of idle parallel worker threads. This operation is based on the following conditions: first, detect whether the Dameng database is running on a computer with multiple CPU. Only computers with multiple CPU can use parallel queries. This is a rigid constraint. Second, check whether there are enough free worker threads available. The number of threads used in parallel query is not only related to the complexity of the operation, but also related to the state of the server at that time, such as whether there are enough free worker threads available. Each parallel query operation requires a certain number of worker threads to execute; and executing parallel plans requires more threads than serial plans, and the number of threads required increases with the number of tasks. When the thread requirements for the execution of a specific parallel query cannot be met, the database engine will automatically reduce the number of tasks, or even abandon parallel queries and change to serial planning. So, even if the same operation may take different threads at different times.

For example, even if the number of parallel worker threads is set to 4. The number of threads actually used may be only three, or less.

When using manual parallel mode, you only need to set the following two parameters in the INI parameters, and then when executing parallel SQL query statements, you need to manually specify the number of current parallel tasks. If not specified, parallelism will not be used. The two parameters are set as follows:

PARALLEL_POLICY 2

PARALLEL_THRD_NUM 4

When using automatic parallel mode, the following three parameters are generally specified:

MAX_PARALLEL_DEGREE 3

PARALLEL_POLICY 1

PARALLEL_THRD_NUM 10

In addition, when PARALLEL_POLICY is 0, parallelism is not supported even if there are parallel tasks.

Then, execute a parallel SQL statement with a syntax format similar to "SELECT * FROM SYSOBJECTS;", which uses the default number of parallel tasks 3.

Of course, if a single query statement does not want to use the default number of parallel tasks, you can specify it specifically by adding HINT to the SQL statement and using the "PARALLEL" keyword. At this point, the parallel SQL statement executed is in the format "SELECT / * + PARALLEL (SYSOBJECTS 4) * / * FROM SYSOBJECTS;", and the number of parallel tasks used in this statement is 4.

Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report