Using E-HPC to Implement an Efficient Batch-Processing Scheme for "Perfectly Parallel" Workloads
In high-performance computing scenarios, a user's workload can often be divided into a large number of tasks that share the same processing logic but differ in input files, parameter settings, and output files. Because the tasks are similar and have no dependencies on each other during execution, they fall under the "embarrassingly parallel" model (also called perfectly parallel): the problem requires little or no effort to split into parallel tasks, and those tasks have little or no need to communicate with each other. This class of problem is also known as "batch processing" and is one of the most "perfect" scenarios in high-performance computing. This article presents an array-job solution based on Alibaba Cloud Elastic High Performance Computing (E-HPC): using the job scheduler integrated with E-HPC, a user's batch tasks are automatically mapped to array jobs for highly concurrent execution on a cloud supercomputing cluster. At the same time, the elasticity of the cloud allows the cluster's compute resources to be scaled dynamically to control the completion time of the batch.
Background
We first introduce the batch-processing scenario through an example, and then discuss high-performance computing clusters and array jobs.
Batch processing
In high-performance computing there are workloads made up of large batches of computations that can be processed at the same time, such as the following freebayes use case: every task runs the same freebayes application, but each one processes a different input file (--bam-list), uses different parameters (-r), and writes a different result file (--vcf). Because the volume of work is huge, the tasks need to run concurrently to shorten the overall processing time.
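As a rough illustration of such a batch (the reference file ref.fa and the concrete input values are assumptions made for this sketch; freebayes does accept a BAM list via --bam-list, a region via -r, and an output file via --vcf), the individual task command lines might look like this:
# hypothetical per-task commands; only the inputs, regions, and outputs change
freebayes -f ref.fa --bam-list bam1_100 -r chr01:1-1000 --vcf bam1_100.vcf
freebayes -f ref.fa --bam-list bam101_200 -r chr01:1001-2000 --vcf bam101_200.vcf
freebayes -f ref.fa --bam-list bam201_300 -r chr03:100-200 --vcf bam201_300.vcf
# ... hundreds or thousands more tasks of the same form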
Introduction to high-performance computing clusters and array jobs
A high-performance computing cluster connects a large number of compute nodes through a network to provide a unified management and scheduling environment for large-scale applications, including account management, scheduling management, the file system, cluster monitoring, and other modules.
Because a cluster contains a large number of compute nodes and is usually shared by multiple users, each user can submit multiple jobs, and each job may require one or more compute nodes. Cluster resources are allocated by the scheduler to avoid usage conflicts. Commonly used schedulers include PBS, Slurm, SGE, LSF, and so on.
An array job is a collection of jobs: a single submission command creates all the jobs in the set, and each job is distinguished by its own index value.
For example, to submit an array job with the PBS scheduler, create a file named qjob.sh with the following contents:
#!/bin/bash
#PBS -N arrjob          # job name
#PBS -l nodes=1:ppn=1   # each sub-job requires 1 compute node with 1 core
#PBS -J 1-3             # array job indices range from 1 to 3
echo $PBS_ARRAY_ID      # each sub-job's index is available in the PBS_ARRAY_ID environment variable
The qjob.sh script defines an array job containing three jobs. The range of job indices is specified by -J, here 1-3. When a specific job executes, it obtains its own index through the environment variable $PBS_ARRAY_ID. The qjob.sh array job is submitted with the following command:
qsub ./qjob.sh
At this point three jobs are created. Whether a job can execute immediately is decided by the scheduler according to the cluster's free resources and the job's resource requirements; if resources are plentiful, all three jobs can run at the same time.
Using array jobs to handle batch tasks
From the introductions to batch processing and array jobs, array jobs are well suited to batch computing, but the following problems remain before they are convenient to use:
What is the mapping between batch tasks and jobs? When there are a large number of tasks, should each task be its own job, or should one job contain multiple tasks?
How should $PBS_ARRAY_ID be mapped to the different tasks, and how can it conveniently correspond to the different parameters of each task?
How can the execution of a task be tracked? How can task logs be viewed conveniently? When individual tasks fail, how can they be quickly filtered out and re-executed after adjustment?
To address this, we present a solution for batch processing with array jobs, covering batch-task-to-job assignment, batch task definition, and task running and tracking.
Batch task to job assignment
When the number of batch tasks is large, assigning each task to its own job increases the load on the scheduler; and although the scheduler can display the running status of every job, a huge number of jobs is also inconvenient to view. In addition, adjacent tasks run on the same node, so if they use the same files the node's local cache can be reused.
For this reason, if the number of tasks is Nt and the number of jobs is Nj, each job processes Nt/Nj tasks; when the division is not exact, the jobs whose (zero-based) index is less than Nt%Nj process one extra task. In the batch example above, if Nt/Nj = 2 with a remainder, the jobs with smaller indices handle 3 tasks while the jobs with larger indices handle 2.
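A minimal shell sketch of this assignment rule, assuming zero-based task numbering and sub-job indices starting at 1 (the actual logic lives inside ehpcarrayjob.py and is not shown in this article; NT and NJ are example values):
#!/bin/bash
# Hypothetical sketch: compute which tasks the sub-job with index PBS_ARRAY_ID handles.
NT=11                              # total number of tasks (example value)
NJ=5                               # total number of jobs (example value)
IDX=$(( ${PBS_ARRAY_ID:-1} - 1 ))  # zero-based job index (defaults to job 1 outside PBS)

BASE=$((NT / NJ))                  # every job gets at least this many tasks
REM=$((NT % NJ))                   # the first REM jobs get one extra task

if [ "$IDX" -lt "$REM" ]; then
    COUNT=$((BASE + 1))
    START=$((IDX * COUNT))
else
    COUNT=$BASE
    START=$((REM * (BASE + 1) + (IDX - REM) * BASE))
fi

echo "job $((IDX + 1)) handles tasks $START .. $((START + COUNT - 1))"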
Batch task definition
From the batch task example we can see that some parameters differ from task to task. If these varying parts are replaced by variables, the processing script of a batch task becomes (stored in the file task.sh):
$ cat task.sh
freebayes --bam-list $bamlist -r $chrvar --vcf ${bamlist}.vcf   # same logic for every task; only the variable values differ (output name illustrative)
ret=$?
exit $ret   # task execution status: 0 means success, non-zero means failure
Here, $bamlist represents the changing value of the --bam-list option and the --vcf output, and $chrvar represents the changing value of the -r option.
Store the specific values of each variable in a file with the same name as the variable, one value per line. In this example there are two variables, so two files are needed: bamlist and chrvar.
$ cat bamlist
bam1_100
bam101_200
bam201_300
bam301_400
bam401_500
bam501_600
bam601_700
bam701_800
bam801_900
... ...
bam901_1000
bam1001_1100

$ cat chrvar
chr01:1-1000
chr01:1001-2000
chr03:100-200
chr12:1000-2000
chr02:100-1100
chr03:1000-2000
chr05:1000-2000
chr08:1000-2000
chr08:3000-6000
... ...
chr01:8000-9000
chr06:1000-2000

Task running and tracking
Once the batch tasks are defined, the task-to-job mapping, variable-file parsing, and variable assignment still need to be implemented. For these general functions, E-HPC provides the Python script ehpcarrayjob.py. If the array job script is named qjob.sh, its contents are as follows:
$ cat qjob.sh
#!/bin/bash
#PBS -N arrjob          # job name
#PBS -l nodes=1:ppn=1   # resource requirement of each sub-job
#PBS -J 1-Nj            # total number of jobs (replace Nj with a concrete number)
python ehpcarrayjob.py -n Nj -e ./task.sh bamlist chrvar
The script is submitted to the cluster with the qsub command, and PBS schedules the batch for execution (here Nj is the number of jobs, which can be replaced as needed).
$ python ehpcarrayjob.py -h
usage: ehpcarrayjob.py [-h] -n NJOBS -e EXECFILE argfiles [argfiles ...]

positional arguments:
  argfiles

optional arguments:
  -h, --help            show this help message and exit
  -n NJOBS, --njobs NJOBS
                        number of jobs
  -e EXECFILE, --execfile EXECFILE
                        job command file
Where:
-n indicates how many jobs there are.
-e indicates the processing script for each task (the path is required).
argfiles: one or more files that provide the parameter values.
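For example, a sketch of submitting the array job and then checking its status; qstat -t (which also lists the individual sub-jobs of an array job under PBS Pro) is an assumption about the scheduler tools available on the cluster:
qsub ./qjob.sh    # prints the array job id, for example 1[].manager
qstat -t          # lists the array job together with its sub-jobs and their states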
After the job is submitted, the array job is assigned a job id, such as "1[].manager", and each sub-job has its own sub-job number, ranging from 1 to Nj.
ehpcarrayjob.py generates a directory named after the job id (such as "1[].manager"), in which each sub-job has a log file named "log.<sub-job number>" that records its execution.
When a task returns a non-zero status (failure), the values of the task's variables are recorded under the job-id directory in files named "fails.<variable name>.<sub-job number>". This makes it easy to determine the cause of the failure, modify the processing script, and resubmit the failed tasks, as sketched below.
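As a sketch of how these files could be used (the file names follow the convention just described, but the exact directory layout is otherwise an assumption), the logs can be viewed and the failed parameter values collected for a rerun:
# hypothetical inspection of the output directory named after the array job id
cat "1[].manager/log.3"                            # view the execution log of sub-job 3
cat "1[].manager"/fails.bamlist.* > bamlist.retry  # gather failed bamlist values
cat "1[].manager"/fails.chrvar.* > chrvar.retry    # gather failed chrvar values
After the processing script is fixed, bamlist.retry and chrvar.retry could then serve as the argfiles of a new, smaller array job.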
Summary
From the user's point of view, every time a batch of numerical computing work arrives, in addition to splitting it into tasks, the per-task processing script has to be rewritten even when legacy scripts exist.
In addition, the runtime environment raises the following questions:
How many resources are required for this calculation?
Where can I find these resources?
Can the tasks actually run? If something goes wrong, how do I find the cause?
Will any task be recomputed or missed?
Can machine usage be kept continuous, or will machines sit idle for long periods?
The batch-processing solution based on Alibaba Cloud Elastic High Performance Computing (E-HPC) solves the problems above and lets users focus on their own work.
As you can see, with the E-HPC solution, users only need to go through the following steps:
Extract the changing values of the batch tasks into separate files, with file names that follow shell variable naming rules, such as bamlist and chrvar.
Write the task processing as a script, replacing the changing values with variables named after those files, for example task.sh.
Write an array job script that specifies the resource requirements of each job and the total number of jobs, and that calls ehpcarrayjob.py to start batch task execution, for example qjob.sh.
Submit the job with qsub, then check the job-id directory to view task progress and the list of failed tasks. How the jobs run depends on the cluster's resources: if there are enough nodes, all jobs can run at once; if resources are insufficient, a smaller number of jobs run first.
At the same time, the E-HPC cloud supercomputing solution has the following advantages:
It retains the native characteristics of an HPC cluster: users can log in to the cluster to compile and debug the processing logic of a single task, and can monitor, analyze, and optimize application runtime behavior through E-HPC's built-in application-level monitoring module.
With E-HPC, a configured environment can be extended directly to newly added compute nodes; meanwhile, low-specification login and control nodes can be kept to retain the configured environment over the long term.
According to the current task-processing efficiency, the compute instance type can be changed dynamically on the cloud and compute resources expanded, adjusting the batch completion time to handle urgent workloads.