
Overview and implementation of Microsoft HPC solution


The HPC solution is a small pebble in the long history of Microsoft products, one that is rarely mentioned in China, so Lao Wang is going to polish this pebble and show it to everyone.

Before we begin, let's review the concept of HPC. We spent nearly 50 articles on Microsoft high-availability clustering, but besides high-availability clustering Microsoft also offers load-balancing clustering and high-performance clustering. The load-balancing options are the familiar ARR, NLB, and DNS round robin. Microsoft's high-performance clustering technology, on the other hand, is little known even among people who specialize in Microsoft IT pro work. This article therefore starts from an entry-level point of view, so that Microsoft IT pros who do not know HPC, and HPC users who do not know Microsoft products, can both follow along.

First, let's look at how high-performance clusters differ from other kinds of clusters.

High-availability cluster: all nodes work to keep a single application continuously available; if the node currently running the application fails, the application automatically fails over to another node.

Load-balancing cluster: all nodes share the access requests for an application, using the capacity of every node to improve application performance while also supporting failover.

Distributed cluster: all nodes work together on one task; the task is divided into many small pieces that are handled by different nodes, and the results are finally merged.

High-performance cluster: all nodes work together on one task, usually compute-centric, combining the performance of the whole cluster to finish the computation as quickly as possible.

Further discussion: distributed clusters versus high-performance clusters

The two cluster models are similar in some respects and quite different in others.

A node in a distributed cluster, for example, can be a server, a PC, or an HPC machine; it can sit in another computer room, another region, or another country; it can run Windows, Linux, or UNIX; and a distributed computation can span multiple clusters and many nodes. The biggest characteristic of distributed clusters is that they place few requirements on the nodes that provide computation: as long as a node can contribute computing power it can participate, the coupling is loose, and nothing is tied to a particular OS or particular hardware.

High-performance clusters differ here. Because they emphasize performance, nodes scattered across different countries could never reach that goal, so the nodes of a high-performance cluster must be a group of servers connected by a high-speed network rather than spread across regions. High-performance clusters are also generally not built from PCs or ordinary servers, but at least from relatively high-performance servers, or from servers designed specifically for high-performance computing; the nodes are usually standardized or custom-built hardware.

Distributed computing usually splits one job into many small jobs and hands them to individual nodes; high-performance computing usually runs jobs in parallel.

Distributed computing is better suited to pattern finding, analytical workloads, and computations whose results can simply be accumulated.

High-performance computing clusters are more often used in mathematics, industry, and scientific research, where multi-dimensional computations must finish in a short time.

Is high-performance computing the same as cloud computing?

In terms of how it operates, Lao Wang looks at cloud computing from two perspectives.

For end users, individual users, or business departments that consume cloud computing, cloud computing is a new IT consumption model: self-service, automated, pay-as-you-go, quick to deploy, flexibly billed, with no other upfront cost or lead time.

For cloud providers, or for the departments that deliver cloud computing, it is a new IT management model: the originally scattered architecture is centralized, software-defined infrastructure and resource pooling take over most of the storage and network work, and once the cloud is in place maintenance can be automated to cut IT maintenance time. Cloud computing also introduces cost management and SLAs, helping the IT department demonstrate its own value.

So, in terms of how it operates, cloud computing is more of a choice in enterprise informatization: enterprises adopt it to complement their existing data centers and to make requesting resources more efficient.

High-performance computing and cloud computing conflict in important ways. Cloud computing is mainly about elasticity, rapid deployment, and rapid response to user requests, even scaling the deployment with the application's load, so the back end brings in virtualization, multi-tenancy, and similar technologies. High-performance computing is usually not virtualized; the work runs on physical machines so that the best possible performance is achieved.

In recent years some companies, mainly public cloud vendors, have proposed integrating high-performance computing with cloud computing: connecting the local high-performance cluster to the cloud at the IaaS or PaaS layer and using the public cloud as part of the HPC capacity. Many HPC users remain skeptical, and the main computation still runs locally. So which scenarios suit the public cloud? For example, you want to run a simulation but the local cluster's queue is full; you can temporarily submit the job to the public cloud. Ideally such a job produces little data and only its result matters, and that kind of computation is a good fit for the public cloud.

High-performance cluster terms and common functions

High-performance computing (HPC): originally, high-performance computing usually meant a single high-performance host, with jobs submitted to that host and computed there using techniques such as partitioning. As information technology has developed, high-performance computing has gradually moved toward ordinary enterprises, and today it usually refers to a computing system and environment made up of several computers organized as a cluster and operated as a single computing resource; high performance is now more often achieved with high-performance cluster software plus servers.

Architecture choices for high-performance computing servers

SMP (Symmetric Multi-Processing): a group of processors (multiple CPUs) assembled in one computer, sharing the memory subsystem and the bus. The defining feature of an SMP server is sharing: all resources in the system (CPU, memory, I/O, and so on) are shared, and this is exactly what causes SMP's main problem, very limited scalability. Every shared path can become a bottleneck as an SMP server grows, and memory is the most restrictive: because every CPU must reach the same memory over the same memory bus, memory access conflicts grow rapidly as CPUs are added, CPU resources are wasted, and the gain from each additional CPU falls off sharply.

NUMA (Non-Uniform Memory Access): because of SMP's limited scalability, people looked for ways to scale effectively and build large systems, and NUMA is one result of that effort. With NUMA, dozens or even hundreds of CPUs can be combined into one server. The basic feature of a NUMA server is that it has multiple CPU modules, each consisting of several CPUs (four, for example) with its own local memory, I/O slots, and so on. Because the nodes can connect and exchange information through an interconnect module (such as a crossbar switch), every CPU can access the memory of the entire system (an important difference between NUMA and MPP systems). Obviously, accessing local memory is much faster than accessing remote memory (the memory of other nodes), and this is also NUMA's weakness: because remote memory access latency is much higher than local latency, system performance does not scale linearly as CPUs are added. For example, when HP released the Superdome server it published relative performance figures against other HP UNIX servers: the 64-way Superdome (NUMA) scored 20, while the 8-way N4000 (shared-bus SMP) scored 6.3. In other words, 8 times the number of CPUs yielded only about 3 times the performance.

MPP (Massive Parallel Processing): multiple SMP servers connected through a node interconnect network, working together on the same task; from the user's point of view it is one server system. Its basic feature is that multiple SMP servers (each called a node) are connected through the node interconnect, and each node accesses only its own local resources (memory, storage, and so on). It is therefore a completely share-nothing structure, which gives it the best scalability; in theory its expansion is unlimited, and current technology can interconnect 512 nodes and thousands of CPUs. There is no industry standard for the node interconnect (NCR's BYNET and IBM's SP Switch, for instance, use different internal mechanisms), but the interconnect is used only inside the MPP server and is transparent to users.

In an MPP system each SMP node can also run its own operating system, database, and so on. Unlike NUMA, however, there is no remote memory access: the CPUs inside one node cannot access the memory of another node. Information exchange between nodes happens over the node interconnect and is generally called data redistribution.

An MPP server does need a complex mechanism to schedule and balance the load of its nodes and to coordinate parallel processing. Servers based on MPP technology often hide this complexity behind system-level software (such as a database). NCR's Teradata, for example, is a relational database built on MPP: when developing applications against it, developers face a single database system regardless of how many nodes the back-end servers consist of, and never have to worry about dispatching load to individual nodes.

In general, SMP and NUMA were the usual ways to achieve high-performance computing in the past, and a key requirement of high-performance computing is letting job programs execute in parallel across multiple nodes so that results arrive as fast as possible. Today people more often choose an MPP architecture or a cluster of ordinary servers. Parallelism has also extended from the hardware level to the software level: combining parallel programs with a high-performance cluster system is what delivers high-performance computing.

Common metrics of high-performance computing clusters

FLOPS measures floating-point operations per second; GFLOPS, for example, stands for giga floating-point operations per second, one billion floating-point operations per second.

One MFLOPS (megaFLOPS) equals one million (10^6) floating-point operations per second

One GFLOPS (gigaFLOPS) equals one billion (10^9) floating-point operations per second

One TFLOPS (teraFLOPS) equals one trillion (10^12) floating-point operations per second

One PFLOPS (petaFLOPS) equals one quadrillion (10^15) floating-point operations per second

Linpack theoretical peak performance = CPU clock frequency × floating-point operations per clock cycle × number of CPU cores in the system
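
As a quick worked example of this formula (all hardware numbers below are hypothetical, and the per-cycle figure depends on the CPU's vector units):

# Hypothetical cluster: 4 nodes, each with 2 CPUs x 20 cores at 2.4 GHz,
# each core performing 16 double-precision floating-point operations per cycle.
$ghz           = 2.4        # CPU clock frequency in GHz
$flopsPerCycle = 16         # floating-point operations per core per clock cycle
$coresPerNode  = 2 * 20     # sockets x cores per socket
$nodeCount     = 4

$peakGflops = $ghz * $flopsPerCycle * $coresPerNode * $nodeCount
"Theoretical peak: {0} GFLOPS ({1:N2} TFLOPS)" -f $peakGflops, ($peakGflops / 1000)

The measured Linpack result is always below this theoretical peak; the ratio between the two is a common measure of cluster efficiency.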

GPU support is another consideration: for floating-point and parallel workloads, a GPU can deliver tens or even hundreds of times the performance of a CPU.

Common roles in high-performance computing

Management node: responsible for monitoring the whole high-performance cluster, resource grouping, resource scheduling, job scheduling, authorization control, and system deployment (PXE deployment components such as DHCP and WDS are usually installed on the head node; for large-scale deployments the remaining compute nodes are PXE-booted and deployed directly from it, possibly combined with BMC wake-up).

Compute nodes: run jobs; this type of node usually cannot be converted into a different node type (that is, it cannot change roles).

User access node: the external entry point to the high-performance cluster; users log in to this node to submit jobs, usually through web, console, API, or script interfaces.

Storage node: stores the data produced by, or needed by, the high-performance cluster, and usually exposes NFS or SMB shares that are mapped on the compute nodes.

Database node: a high-performance computing cluster usually has a database for job scheduling data, reports, diagnostics, and so on; this often lives directly on the head node, or is deployed as a remote database node.

The most important roles in the whole architecture are the management node and the database node, so in real environments high performance is usually combined with high availability: the management node and the database node are made highly available.

If a compute node breaks, the management node normally marks it as failed and excludes it the next time computing tasks are distributed. Ideally compute nodes are stateless and the data needed for computation comes from the storage nodes.

If portal access is implemented on the user access node, several such nodes can be built and combined with a load-balancing cluster for high availability.

Common terms in high-performance computing

Job: the high-performance cluster ultimately gives users a portal, console, or API through which they send requests to the cluster. A job is the computing work you want the cluster to perform; a job usually contains multiple tasks, and you can design priorities and dependencies between the tasks within a job.

Queue: successfully submitted jobs arrive at the HPC cluster head node, which schedules them onto an appropriate resource group according to the scheduler rules. If all of the cluster's resources are in use, or a higher-priority computation is running, newly submitted jobs wait in the queue until computing resources are released.

Resource grouping: through the management node, the compute nodes of the cluster can be organized into groups, turning servers into units of computing power that are delivered to users group by group. When submitting a job, users choose which resource group's capacity it should run in; if that group is busy, the job likewise enters the queue.

What is MPI?

MPI is not a new programming language; it is a library of functions that can be called from C, C++, and Fortran programs. These functions mainly handle communication between processes. MPI uses the distributed-memory model and a message-passing programming model: processes residing on different nodes exchange information by communicating over the network.

Simply put, MPI is a programming model for passing messages between processes on different nodes; through MPI we get process-level, cross-node communication, and from that, parallel computation.

The execution of an MPI application looks like this:

MPI initialization

The work is distributed by the management node to the compute nodes and executed in parallel.

When the compute nodes finish executing, the results are returned to the application through communication.
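
As a minimal illustration of launching several communicating processes (assuming MS-MPI's mpiexec is available on the node; hostname is used only as a stand-in for a real MPI program):

# Start 4 ranks of the same executable. A real MPI application would call
# MPI_Init, exchange messages between the ranks, and combine the results;
# 'hostname' is used here only to show that separate processes are launched.
mpiexec -n 4 hostname

Inside an HPC Pack job, mpiexec becomes the task command line and the scheduler decides how many cores and which nodes the ranks get, so the explicit -n is usually omitted there.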

Besides MPI there are other ways to implement parallel computing, for example OpenMP. OpenMP achieves parallelism by sharing variables between threads: it uses threads as the unit of parallelism and the shared-memory model.

When we submit a job to a high-performance cluster, the cluster may be capable of high-performance computing, but if the job itself is not parallel we still will not get high performance, so we also have to consider how to make jobs execute in parallel.

With the introduction above, Lao Wang has briefly covered the terminology of high-performance computing and how it operates in practice, not by blindly listing concepts but through comparisons and examples, so that everyone can more easily build a mental model.

Next we will focus on Microsoft's HPC solution. Microsoft entered the HPC field around 2003, when it launched the Windows Compute Cluster Server solution and stepped into high-performance computing. That solution had two parts, Windows Server 2003 Compute Cluster Edition and the Microsoft Compute Cluster Pack: an operating system edition and a component pack. The core high-performance computing functions, such as resource grouping and job scheduling, all lived in the Compute Cluster Pack; the Compute Cluster Edition was merely an operating system optimized for the purpose. The split was the same in the 2008 era, renamed HPC Server and HPC Pack. Since 2012 Microsoft no longer ships an HPC edition of the operating system: HPC Pack runs directly on the Standard and Datacenter editions. The compute cluster edition had existed in the first place because the OS of that time was not well aware of hardware features such as RDMA and ODX, and since 2012 these have been built into the Server editions.

The latest version of HPC Pack is HPC Pack 2016. HPC Pack helps us configure the roles required by the HPC solution and provides the tools for managing HPC clusters.

The roles in Microsoft's HPC solution are as follows, taking the latest HPC Pack 2016 as an example.

Head node: responsible for monitoring the whole high-performance cluster, resource grouping, resource scheduling, job scheduling, and system deployment (HPC Pack automatically installs DHCP and WDS on the head node).

Compute nodes: run jobs; this type of node usually cannot be converted into a different node type (that is, it cannot change roles).

Broker node: the Windows Communication Foundation (WCF) broker node routes WCF calls from service-oriented architecture (SOA) clients to the SOA services running on the cluster's nodes; this type of node can be changed into a compute node without redeployment.

Workstation nodes and unmanaged server nodes: computers in the organization that can also run jobs but are not dedicated cluster resources; they can be scheduled to take work at specific times or be made available on demand. This type of node cannot change roles.

Microsoft Azure PaaS nodes: if you have a Microsoft Azure subscription, you can add Azure nodes on demand to increase cluster capacity. Like compute nodes, workstation nodes, and unmanaged server nodes, Azure nodes can run jobs. When you add Azure nodes you also configure a fixed or variable number of proxy nodes in the Azure deployment to enable communication between the on-premises head node and the Azure nodes.

Microsoft Azure IaaS nodes: if you have a Microsoft Azure subscription, you can add Microsoft Azure IaaS nodes on demand to increase cluster capacity.

Key feature updates from HPC Pack 2012 to 2016

1. HPC Pack 2016 supports workgroup deployment of compute nodes

2. HPC Pack 2012 R2 Update 2 supports deployment of Azure Linux compute nodes

3. HPC Pack 2012 R2 Update 3 supports deployment of on-premises Linux compute nodes

4. HPC Pack 2012 R2 Update 3 supports GPUs on Windows compute nodes

5. In HPC Pack 2012 R2 Update 3, the management and scheduling experience for Linux nodes is similar to that of the already supported Windows nodes: you can see a heat map of the Linux nodes, create jobs with Linux-node-specific cmdlets, monitor jobs, tasks, and node status, and Linux nodes also support the handy clusrun tool

HPC Pack head node deployment requirements

Operating system: Windows Server 2016 or Windows Server 2012 R2; the head node must be joined to a domain

SQL Server: by default HPC Pack installs a local SQL Express on the head node and configures the databases automatically; it can also use a remote database instance, which makes a clustered or AlwaysOn database possible.

.NET Framework: version 4.6.1 or later is required. Windows Server 2016 includes it; Windows Server 2012 R2 needs it installed separately (an installer is in the DotNetFramework directory of the HPC Pack installation files, and on 2012 R2 the KB2919442 and KB2919355 patches must be installed first, in that order).

The other prerequisite components are installed automatically by HPC Pack.

HPC Pack database list

HPCManagement: cluster management information

HPCScheduler: job scheduling

HPCReporting: cluster reports

HPCDiagnostics: cluster diagnostics

HPCMonitoring: cluster monitoring data

For database optimization for HPC Pack 2012, 2012 R2, and 2016, refer to the link below.

https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/hpc-server-2012-R2-and-2012/hh507109(v=ws.11)

The network roles in a Microsoft HPC cluster are as follows

Enterprise network: the organization's network, connected to internal infrastructure servers such as AD. Users submit jobs to the head node over this network and, in some cases, connect to other nodes in the cluster. Unless a private network (and possibly an application network) also connects the cluster nodes, all intra-cluster management and deployment traffic is carried on the enterprise network.

Private network: a dedicated network that carries intra-cluster communication between nodes, including management and deployment traffic; if there is no application network, it carries application communication as well.

Application network: preferably high bandwidth and low latency; this network is typically used only for parallel Message Passing Interface (MPI) application communication between cluster nodes.

For detailed HPC network topology planning, please refer to https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/hpc-server-2012-R2-and-2012/ff919486(v=ws.11)

Introduction to the implementation environment

08dc: 10.0.0.2 255.0.0.0

HPC-HeadNode

Enterprise network: 10.0.0.9 255.0.0.0

Private network: 18.0.0.1 255.0.0.0

HPC Pack 2016 download: https://www.microsoft.com/en-us/download/details.aspx?id=56360

After the download completes, open the package on the head node and click New installation or add new features to the existing installation.

Select the installation type; for this first installation we choose to create a head node and create a new HPC cluster.

If the head node is to be deployed highly available, you can choose high availability here; we only have one node, so we choose a single head node.

Evaluate the installation rules.

Note that you should not try to be clever and install WDS in advance, or an error will be reported at this step.

Select the database deployment model. By default HPC Pack 2016 installs SQL Server 2016 Express; if the databases are deployed remotely, at least SQL Server 2008 R2 Standard or Enterprise edition is required.

For deploying a remote SQL Server, refer to https://docs.microsoft.com/en-us/previous-versions/orphan-topics/ws.10/ee783551(v=ws.10)

For the certificate, you can either import an existing certificate or create a new self-signed one.

You can also request a computer certificate with an exportable key directly from AD and browse to it so that HPC Pack recognizes it. This certificate secures communication between the HPC Pack nodes, so export it as a PFX with the private key; you will import the same certificate later when adding compute nodes.

Finally, HPC Pack installs the necessary components for the head node from its own offline installation packages; at this point you can make a cup of green tea and wait for every item to turn green.

When the prerequisite components are installed, you are prompted for the HPC cluster connection string. This is one difference between an HPC cluster and a high-availability cluster: a high-availability cluster is given a logical name, while external connections to an HPC cluster always go to the head node.

Click Finish and HPC Cluster Manager opens automatically; the deployment tasks must be completed before you can start managing the cluster.

Configure the network topology. Lao Wang selects the topology in which all nodes are on both the enterprise and private networks: since some nodes may be domain-joined, each node gets two networks, an enterprise network and a private HPC network, but only the management node's enterprise network address is published externally.

The configuration completes as follows. If a network card is detected to support RDMA it is shown here, and enabling RDMA on the HPC network is recommended. This step also creates a DHCP scope on the management node, covering the HPC network segment, to be used when deploying compute nodes.

Set the node installation account, which must have local administrator privileges.

Configure the new-node naming rule, which mainly applies to compute node names when nodes are batch-deployed through HPC.

After the three prerequisite tasks are finished, you can see the full HPC Cluster Manager interface, which looks much like the UI of System Center 2007 / 2012.

Next, there are some optional configurations.

To grant administrator rights on the HPC cluster, it is usually recommended to create a security group in AD; members of that group get permissions on the HPC cluster such as resource management, cluster configuration, report viewing, and job scheduling.

Add user rights: users who are added can submit jobs to the HPC cluster through the client application, CMD, API, PowerShell, or the web portal. Here too it is usually recommended to manage membership through groups.
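
A sketch of granting these rights from HPC PowerShell, assuming the standard Add-HpcMember cmdlet and pre-created AD groups (the group names are examples only):

# Give one AD security group administrative rights on the cluster and
# another group the right to submit jobs.
Add-HpcMember -Name "CONTOSO\HPC-Admins" -Role Administrator
Add-HpcMember -Name "CONTOSO\HPC-Users"  -Role User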

Configure Job Scheduler Policy

Configure job email notification to notify the job owner when the job starts and ends.

Create different levels of node groups

Create a job template; it is shown to users when they submit a job reservation.

Limit the job run time: if a certain type of job has an execution time limit, configure it here.

Set the priority for the job template: the default priority of jobs created from this template, and the maximum range it can be adjusted within.

Restrict node groups: jobs created from this template can be configured to run only on particular node groups.

You can also restrict which users are allowed to use a job template.

To open the web portal for the HPC cluster, go to the bin directory of the installation and run the portal setup script in that directory; it automatically installs IIS and configures everything the HPC portal needs.

If the configuration is correct, you can open the HPC portal by browsing to the head node name followed by the /hpcportal path. Authentication is required before the page opens; authenticated HPC users or administrators can then submit jobs online through the portal, which is available in English only.

Add a Windows compute node

Windows compute node system requirements

Operating system: Windows Server 2016, Windows Server 2012 R2, Windows Server 2012, or Windows Server 2008 R2 SP1; both domain-joined and workgroup nodes are supported

.NET Framework: version 4.6.1 or later is required. Windows Server 2016 includes it; Windows Server 2012 R2 needs it installed separately (an installer is in the DotNetFramework directory of the HPC Pack installation files, and on 2012 R2 the KB2919442 and KB2919355 patches must be installed first, in that order).

Copy the PFX export (with private key) of the head node certificate to the compute node.

Open the HPC Pack installation wizard on the Windows compute node and choose to create a new compute node and join it to an existing HPC cluster.

Enter the name of the cluster's head node; if there are multiple head nodes, enter them separated by commas.

Add a certificate and enter the certificate password

The wizard automatically installs prerequisite components for compute nodes

After the installation finishes, the node appears in HPC Cluster Manager on the head node, currently in the unapproved state. For security, the HPC cluster must keep maliciously added compute nodes from getting in and copying cluster data, so every added compute node has to be approved by an administrator before it joins the cluster.

Approval can be done through the add-node wizard or by assigning a node template. A node template is a set of specifications: every added node must be assigned a node template, so that it is brought into line with the specification, before it can join the HPC cluster normally.

After the node template is selected, HPC automatically applies the specification to the compute node according to our settings.

When all the specifications and settings have been applied and the node comes online, the compute node can run jobs normally.

By modifying the node template we can configure which actions are performed when a new node is added.

Add a compute node to a node group

Microsoft HPC cluster scheduling model

The Microsoft HPC cluster scheduling model has two concepts, job and task. A job is mainly a reservation of high-performance cluster resources and usually includes the number of processors requested, the node groups the job should run on, and the planned run time.

Job scheduling algorithm

In practice we can submit jobs to the HPC cluster through the API, CMD, PowerShell, the portal, or the console; the jobs are recorded in the HPCScheduler database and scheduled by the head node. Submitting a job is equivalent to reserving a resource window within which the required tasks are completed. The HPC cluster head node maintains a job queue: if the scheduler finds that the currently available resources can satisfy the minimum requirements of the first job in the queue, it takes that job out and starts it running; otherwise the scheduler does nothing.

Exception to the job scheduling algorithm: when every running job has declared a planned run time and the currently free resources are not enough for the first job in the queue, a job further down the queue can be picked out and scheduled ahead of it (backfill), provided all of the conditions below are satisfied (a small sketch of the check follows the list).

The backfill job has also declared a planned run time

The backfill job fits in the currently free resources

From the currently running jobs you can compute when enough resources will free up for the first job in the queue to run, and that point in time must be further away than the backfill job's planned run time
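
A minimal sketch of that backfill check, written only to restate the three conditions; it is not the actual HPC Job Scheduler logic, and the numbers in the usage example are hypothetical:

# A queued job may be backfilled only if it fits into the idle cores right
# now AND its declared run time ends before the head-of-queue job is
# expected to have enough resources to start.
function Test-CanBackfill {
    param(
        [int]      $IdleCores,            # cores free right now
        [int]      $BackfillMinCores,     # minimum cores the candidate job needs
        [timespan] $BackfillRunTime,      # declared run time of the candidate job
        [timespan] $TimeUntilHeadJobFits  # when the head-of-queue job can start
    )
    ($BackfillMinCores -le $IdleCores) -and
    ($BackfillRunTime -lt $TimeUntilHeadJobFits)
}

# Example: 16 idle cores, the head-of-queue job can start in 2 hours, and the
# candidate needs 8 cores for 30 minutes -> eligible for backfill.
Test-CanBackfill -IdleCores 16 -BackfillMinCores 8 `
    -BackfillRunTime (New-TimeSpan -Minutes 30) `
    -TimeUntilHeadJobFits (New-TimeSpan -Hours 2)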

A task describes the specific work that needs to be done.

The HPC cluster supports submitting an empty job and attaching tasks to it afterwards, but a task must belong to a job in order to run

Each task can only use the resources allocated to its job to do its work

Although a task corresponds to a single program, some programs need to run in parallel across multiple nodes, so minimum and maximum resource requirements must be specified in the task definition

A task can state explicitly which node groups it needs, and, given the right permissions, it can even claim specific nodes exclusively

A task needs the path used to start the program on the compute node, the working directory (usually a share on a storage server mapped to the same network drive on every compute node to hold the program files), and the parameters used to start the program

The task definition can redirect the program's standard input, standard output, and standard error streams while it runs

Dependencies can be defined between tasks in the same job; a short submission sketch follows this list
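
As an illustration of the job-and-task model, here is a sketch of creating a job and attaching tasks from HPC PowerShell. The cmdlet names (New-HpcJob, Add-HpcTask, Submit-HpcJob) are the standard HPC Pack cmdlets, but check Get-Help on your cluster for the exact parameters; the core range, share path, and program names are examples only.

# Reserve between 8 and 16 cores for the job (the resource reservation).
$job = New-HpcJob -Name "NightlyRun" -NumCores "8-16"

# Two independent tasks that may run in parallel within the job's resources;
# the working directory is a share mapped on every compute node.
Add-HpcTask -Job $job -Name "PrepA" -CommandLine "prep.exe dataset_a" -WorkDir "\\storage01\hpcdata"
Add-HpcTask -Job $job -Name "PrepB" -CommandLine "prep.exe dataset_b" -WorkDir "\\storage01\hpcdata"

# Hand the job to the head node's queue.
Submit-HpcJob -Job $job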

Task scheduling algorithm

If all of the tasks a task depends on have finished, or it has no dependencies, the task enters the independent state (an internal scheduler state)

Each job maintains a task queue; when resources become free, the scheduler picks the topmost independent tasks that fit into those resources and schedules them

Microsoft HPC cluster feature support

Resource grouping is supported: node groups are used by default, or resource pools can be used for grouping resources.

Cluster node status, performance, and log monitoring are supported

Job scheduling, task scheduling, and manual control of task or job execution are supported

Workgroup nodes are supported as computing capacity

Jobs can be submitted through the API, CMD, PowerShell, console, portal, and so on

MPI execution is supported: MS-MPI is Microsoft's implementation of the Message Passing Interface (MPI) for Windows, and it lets MPI applications run as tasks on the HPC cluster

Job templates serve as the quality-of-service controls of the high-performance computing cluster: a template defines the priority of jobs created from it, the processors, node groups, and memory they may use, and who is permitted to submit jobs with it. By creating different job templates and combining them with node groups and user groups, administrators can separate users who need different qualities of service

Node templates serve as the baseline for adding nodes: configuring the node template brings newly added nodes up to the baseline before they join the high-performance cluster, and by packaging the operating system, drivers, and software into the node template, compute nodes can be batch PXE-booted and deployed from the head node

Integration with other Microsoft components: Exchange Server for mail notification; SCOM or OMS for cluster monitoring; SSRS, SSAS, or Power BI for analysis reports; the head node supports WSFC deployment; the database supports Always On or WSFC deployment; compute nodes support Azure PaaS or IaaS deployment; authorization can use AD or Azure AD; and the task working directory can be an NFS/SMB network share mapped on every compute node, backed by S2D, a traditional file server cluster, or a UNC path replicated with Windows Server 2016 storage replication

Job types that can be submitted to the Microsoft HPC cluster

MPI jobs: you can create a new job, or a single-task job, to run an MPI job. For a task that runs an MPI application, the task command must begin with mpiexec, so a parallel task command takes the form mpiexec [mpi_options] myapp.exe [arguments], where myapp.exe is the name of the application to run. For parallel tasks, Windows HPC Server 2008 and later include an MPI package based on the Argonne National Laboratory MPICH2 standard; Microsoft's MPI implementation (called MS-MPI) includes the launcher mpiexec, an MPI service on every node, and a software development kit (SDK) for developing applications.
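
A sketch of submitting such a job as a single-task job from HPC PowerShell, using the same cmdlets as before; the application name, core range, and path are examples only, and under HPC Pack mpiexec normally takes its process count from the cores allocated to the task, so no explicit -n is given:

# Reserve 16-32 cores and run one MPI task across them.
$job = New-HpcJob -Name "MpiSolve" -NumCores "16-32"
Add-HpcTask -Job $job -CommandLine "mpiexec myapp.exe -iterations 1000" -WorkDir "\\storage01\solver"
Submit-HpcJob -Job $job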

Parametric sweep jobs:

A parametric sweep job consists of multiple instances of the same application, usually a serial application, running in parallel, with input taken from input files and output written to output files. The inputs and outputs are typically sets of indexed files (for example input1, input2, input3... and output1, output2, output3...) placed in one shared folder or in separate shared folders. There is no communication or interdependence between the instances, and they may or may not all run at the same time, depending on the resources available on the cluster while the job runs.
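
A sketch of a parametric sweep from HPC PowerShell; HPC Pack substitutes the sweep index for the asterisk in the command line, and the index range, file names, and program are examples only:

# Run process.exe 100 times; iteration N reads input<N>.dat and writes output<N>.dat.
$job = New-HpcJob -Name "Sweep"
Add-HpcTask -Job $job -Type ParametricSweep -Start 1 -End 100 -Increment 1 `
            -CommandLine "process.exe input*.dat output*.dat"
Submit-HpcJob -Job $job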

Task flow jobs

In a task flow job, a set of different tasks runs in a specified order, usually because one task depends on the result of another. A job can contain many tasks: some parametric, some serial, some parallel. You could, for example, create a task flow job made up of MPI tasks and parametric tasks, and you determine the order in which the tasks run by defining dependencies between them.

As an example of a task flow: task 1 runs first; tasks 2 and 3 can then run in parallel, because neither depends on the other; task 4 runs after tasks 2 and 3 have completed.
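
That four-task flow could be expressed with task dependencies roughly as follows; the dependency parameter is shown here as -DependsOn, which should be confirmed against your HPC Pack version, and the program names are examples only:

$job = New-HpcJob -Name "TaskFlow"
Add-HpcTask -Job $job -Name "Task1" -CommandLine "stage1.exe"
# Task2 and Task3 both wait for Task1 and can then run in parallel.
Add-HpcTask -Job $job -Name "Task2" -CommandLine "stage2a.exe" -DependsOn "Task1"
Add-HpcTask -Job $job -Name "Task3" -CommandLine "stage2b.exe" -DependsOn "Task1"
# Task4 runs only after both Task2 and Task3 have finished.
Add-HpcTask -Job $job -Name "Task4" -CommandLine "merge.exe" -DependsOn "Task2","Task3"
Submit-HpcJob -Job $job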

SOA jobs

Service-oriented architecture (SOA) is an approach to building distributed, loosely coupled systems. In an SOA system, different computing functions are packaged into software modules called services; services can be distributed across the network and accessed by other applications. For example, if an application performs repetitive parallel calculations, the core calculation can be packaged as a service and deployed to the cluster. This lets developers solve embarrassingly parallel problems without rewriting low-level code and scale the application out quickly. By spreading the core calculation across multiple service hosts (compute nodes), the application runs faster for end users: users run the application on their own computers while the cluster nodes perform the calculations.

A client application provides the interface through which end users reach one or more services. Developers can create cluster SOA client applications that access services deployed to a Windows HPC cluster. Behind the scenes, the client application submits a job containing a service task to the cluster, starts a session with the broker node, sends service requests, and receives the responses (the results of the calculations). Based on the job scheduling policy, the job scheduler on the head node allocates resources to the service job; an instance of the service task runs and loads the SOA service on each allocated resource, and the job scheduler tries to adjust the resource allocation to the number of service requests.

If the client creates a durable session, the broker uses MSMQ to store all messages; the client can retrieve the responses stored by the broker at any time, even after disconnecting intentionally or accidentally.

(The original article includes a figure showing how SOA jobs run on a Microsoft HPC cluster.)

Microsoft Excel calculation offloading

HPC Services for Excel, included in some versions of HPC Pack, supports several models for offloading Excel calculations to an HPC Pack cluster. Workbooks suited to cluster acceleration contain independent calculations that can run in parallel. Many complex, long-running workbooks run iteratively, that is, they perform the same calculation many times over different input data sets; such workbooks may contain complex Microsoft Visual Basic for Applications (VBA) functions or computation-intensive XLL add-ins. HPC Services for Excel supports offloading either whole workbooks or just UDFs to the cluster.

Microsoft Excel 2010 extends the UDF model to the cluster by allowing Excel 2010 UDFs to run in a Windows HPC cluster. When a supported cluster is available, users point Excel 2010 at it by selecting the cluster connector and specifying the cluster name in the Advanced options of the Excel Options dialog box. On a cluster, UDFs work much like traditional UDFs, except that the calculation is performed by one or more servers; the key benefit is parallelization. If a workbook contains calls to long-running UDFs, multiple servers can evaluate the functions at the same time. To run on a cluster, a UDF must be contained in a cluster-safe XLL file.

Types of tasks that can be submitted to the Microsoft HPC cluster

Basic: runs a single instance of a serial application or of a Message Passing Interface (MPI) application; MPI applications usually run concurrently on multiple cores and can span multiple nodes

Parametric sweep: runs a command a specified number of times (expressed by start, end, and increment values), typically over indexed input and output files. The sweep steps may or may not run in parallel, depending on the resources available on the cluster while the task runs

Node preparation: runs a command or script on each compute node as it is allocated to the job, including when nodes are added dynamically (growth policy). The node preparation task runs on a node before any other task of the job

If the node preparation task cannot be run on the node, the node will not be added to the job.

Node release:

Runs a command or script on each compute node as it is released from the job, including when nodes are removed dynamically (shrink policy).

The maximum run time (in seconds) of a node release task is defined by the cluster administrator in the NodeReleaseTaskTimeout cluster parameter. You cannot override this limit; by default the timeout is 15 seconds. To view the value of this cluster parameter, run the following command (the search string is case sensitive): cluscfg listparams | find "NodeReleaseTaskTimeout"

Node release tasks run when a job is canceled; they do not run when a job is canceled forcibly.

If a job has both a maximum run time and a node release task, the job scheduler cancels the other tasks in the job before the job's run time expires (job run time minus node release task run time), so that the node release task can run within the time allocated to the job.

Service:

Runs a command or service on all resources allocated to the job. When a new resource is added to the job, or when a previously running instance exits while the resource is still allocated to the job, a new instance of the command is started. Service tasks keep starting instances until the job is canceled or stopped, the maximum run time expires, or the maximum number of instances (subtasks) is reached.

Tasks submitted through a service-oriented architecture (SOA) client run as service tasks.

You cannot add basic tasks or parametric sweep tasks to a job that contains service tasks

After defining a job or task, you can export its specification to an XML file and import the job or task directly from that XML next time

Microsoft HPC cluster job and task states

(The states are actually displayed in Chinese in the UI; the English names are more precise, so they are explained here in English.)

Configuring: the job or task is in the system but has not been submitted to the queue

Submitted: the job or task has been submitted and is waiting for validation before queuing

Validating: the HPC Job Scheduler service is validating the job or task. During validation, the HPC Job Scheduler service confirms permissions, applies default settings for any properties the job owner did not specify, and validates each property against its constraints; the defaults and constraints are defined by the job template. The HPC Job Scheduler service also confirms that the job properties accommodate all of the task properties (for example, that no task's run time is greater than the job's run time).

During validation the job may also pass through submission filter applications custom-defined by the cluster administrator.

If the job passes validation it moves to the Queued state; if it fails validation, it shows an error message and moves to the Failed state.

Queued: the job or task has been validated and is waiting to be scheduled and activated (run). When a running job, basic task, or parametric sweep subtask is preempted by the HPC Job Scheduler service, it moves back to the Queued state (unless the task cannot be rerun, in which case it is marked as Failed).

Dispatching: this state applies only to tasks. The HPC Job Scheduler service has allocated resources to the task and is contacting the allocated nodes to start it; when the task starts, it moves to the Running state.

Running: a job or task is running on one or more nodes

Finishing: the job or task has completed and cleanup is in progress

Finished: the job or task completed successfully

Failed: the job or task did not complete successfully; it stopped running or returned an exit code indicating failure (by default, any non-zero exit code). In addition, a running task is marked as Failed in the following cases:

The job owner or a cluster administrator cancels the task

The HPC Job Scheduler service cancels the task because it exceeded its maximum run time

The HPC Job Scheduler service preempts a task that is not marked as rerunnable

The HPC Job Scheduler service preempts subtasks that are started on a per-resource basis (node preparation, node release, and service subtasks)

If a job or task cannot start because of a cluster failure, it is automatically retried the specified number of times before being marked as Failed.

Canceling: the job or task has been canceled and cleanup is in progress

Canceled: the job was canceled by the job owner, a cluster administrator, or the HPC Job Scheduler service (for example, the HPC Job Scheduler service can cancel a job that exceeds its run time or is preempted), or the task was canceled by the job owner or a cluster administrator before it started running. If a running task is canceled, it is marked as Failed.

Users submit jobs through the client

HPC Pack 2016 client installation requirements

Operating system: Windows Server 2016, Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2 SP1, Windows 10, Windows 8.1, Windows 8, or Windows 7 SP1; both domain-joined and workgroup clients are supported

.NET Framework: version 4.6.1 or later is required. Windows Server 2016 and Windows 10 include it; other operating systems need it installed separately (an installer is in the DotNetFramework directory of the HPC Pack installation files). On Windows 8 and 2012 R2 the KB2919442 and KB2919355 patches must be installed first, in that order; Windows 7 SP1 and 2008 R2 SP1 do not need these patches.

Copy the installation package to the client, or place it on a network path, open the installation wizard, and choose to install only the client utilities.

After the installation completes, the client has HPC PowerShell, HPC Cluster Manager, and HPC Job Manager; click Job Manager and connect to the head node.

The job management interface opens, and from here users can submit jobs to the high-performance cluster.

Permissions are determined by the currently logged-in user, or by the runas user.
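
From HPC PowerShell on a client machine, the same thing can be done by pointing the cmdlets at the head node; a sketch, assuming the -Scheduler parameter of the HPC cmdlets and using an example head node name:

# Create, populate, and submit a job against a specific head node; the
# permissions of the account running this session apply.
$job = New-HpcJob -Scheduler "hpc-headnode" -Name "ClientSubmit"
Add-HpcTask -Scheduler "hpc-headnode" -Job $job -CommandLine "myapp.exe"
Submit-HpcJob -Scheduler "hpc-headnode" -Job $job

# List the cluster's jobs from the same client.
Get-HpcJob -Scheduler "hpc-headnode"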

In Lao Wang's testing, the HPC Pack client currently has to be domain-joined to connect to the server; if the client is not in the domain, opening Job Manager and connecting to the head node will not succeed. If you really do not want every job-submitting client to join the domain, consider using one server as a gateway node: install RDSH and the HPC Pack client on it and publish the HPC Pack console from that machine as an app through RemoteApp or a XenApp solution; alternatively, workgroup clients can submit jobs directly through the portal, CMD, PowerShell, or API.

Space here is limited, so Lao Wang only throws out a brick to attract jade and does not demonstrate creating every kind of job. Interested friends can follow the link below and create their own jobs and tasks.

https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/hpc-server-2012-R2-and-2012/ff919653(v%3dws.11)

HPC Cluster Manager utility features

Select nodes in batches, run commands on them, and view the output grouped by node

Select nodes in batches and run diagnostics, including built-in test items for SOA, MPI, Azure nodes, and so on
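
The command-line counterpart of the batch run feature is the clusrun utility; a sketch, with node and node group names as examples only:

# Run a command on two specific nodes and collect the output per node.
clusrun /nodes:node01,node02 hostname

# Run the same command on every node in a node group instead.
clusrun /nodegroup:ComputeNodes hostname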

Summary: in this article Lao Wang has introduced the concept of high-performance computing and Microsoft's solution for it: the products, the concepts, cluster creation, and common cluster settings. Lao Wang believes that although Microsoft's HPC cluster may not deliver the best performance in the industry, it has its advantages: it integrates closely with Microsoft's own products; you do not need to install many separate components to get HPC, just a single HPC Pack; its HPC feature set is fairly complete; and if you are not comfortable with Linux commands, you can still try building an HPC cluster with Microsoft's GUI. Hopefully this is useful to interested friends; if you have any questions, you are welcome to discuss them with Lao Wang.
