
Basic Analysis of Linux Cluster in HPC Architecture

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report --

In this issue, the editor brings you a basic analysis of the Linux cluster in HPC architecture. The article is rich in content and analyzes the topic from a professional point of view. I hope you get something out of it after reading.

Nowadays, Linux clusters have become very popular in many areas. With the advent of clustering technology and the growing adoption of open source software, it is now possible to build a supercomputer at a fraction of the cost of a traditional high-performance machine.

Using Linux cluster technology, this article introduces the concepts of high-performance computing (HPC) and shows how to build a cluster and write parallel programs. It discusses the types and uses of clusters, the basics of HPC, the role of Linux in HPC, and the reasons cluster technology keeps gaining ground. Part 2 covers parallel algorithms, how to write parallel programs, how to build a cluster, and how to benchmark it.

Types of HPC architecture

Most HPC systems use the concept of parallelism. There are many software platforms that are HPC-oriented, but first let's take a look at the hardware.

HPC hardware can be divided into three categories:

Symmetric multiprocessor (SMP)

Vector processor

Cluster

Symmetric multiprocessor (SMP)

SMP is one of the architectures used in HPC; in it, multiple processors share memory. (Clusters, by contrast, are known as massively parallel processors (MPP); they do not require shared memory. We will cover this in more detail later.) SMP is generally more expensive and less scalable than MPP.

Vector processor

As the name implies, in a vector processor the CPU is optimized to handle operations on vector arrays well. Vector processor systems offer very high performance and dominated HPC architecture in the 1980s and early 1990s, but in recent years clusters have become far more popular.

Cluster

The cluster is the most important type of HPC hardware in recent years: a cluster is a collection of MPP machines. A processor in a cluster is commonly called a node; it has its own CPU, memory, operating system, and I/O subsystem, and it can communicate with other nodes. In many places, ordinary workstations running Linux and other open source software serve as nodes in a cluster.

Next you will see the differences between these types of HPC hardware, but let's start with clusters.

Cluster definition

The term "cluster" may mean different things in different places. This article focuses on the following types of clusters:

Failover cluster

Load balancing cluster

High performance cluster

Failover cluster

The simplest failover cluster has two nodes: one node is active and the other is on standby, constantly monitoring the active node. If the active node fails, the standby node takes over its work, allowing critical systems to keep running.

Load balancing cluster

Load balancing clusters are usually used for very busy Web sites: several nodes host the same site, and each new request for a Web page is dynamically routed to a node with a lower load.

High performance cluster

High-performance clusters are used to run time-sensitive parallel programs and are of special interest to the scientific community. They typically run simulations and other CPU-intensive programs that would take an enormous amount of time to run on ordinary hardware.

Figure 1 illustrates a basic cluster. Part 2 of this series will show how to create such a cluster and write programs for it.

Figure 1. Basic cluster

Grid computing is a broader term, typically used to describe a service-oriented architecture (SOA) built through collaboration among loosely coupled systems. Cluster-based HPC is a special case of grid computing in which the nodes are tightly coupled. A successful and well-known grid computing project is SETI@home, the Search for Extraterrestrial Intelligence project, which used the idle CPU cycles of about one million home PCs, via a screensaver, to analyze data from radio telescopes. A similar success is the Folding@home project, which computes protein folding.

Common uses of high-performance clusters

Almost every industry needs fast processing power. With the emergence of cheaper and faster computers, more and more companies are interested in taking advantage of these technologies. There is no upper limit to the demand for computing power; even though processing capacity is increasing rapidly, demand still outstrips what can be provided.

Life science research

Protein molecules are very complex chains that can fold into an enormous number of 3D shapes. When a protein is put into a solution, it quickly "folds" into its natural state. Incorrect folding is linked to many diseases, such as Alzheimer's disease, so the study of protein folding is very important.

One way scientists try to understand protein folding is by simulating it on a computer. In nature folding happens very quickly (it may take only about a microsecond), but the process is so complex that the simulation could take 10 years to run on an ordinary computer. This is only one small niche among many in the industry, but it demands very serious computing power.

Other areas in this industry include pharmaceutical modeling, virtual surgical training, environment and diagnostic visualization, complete medical-record databases, and the Human Genome Project.

Oil and gas exploration

Seismic records contain detailed information about the internal characteristics of the continents and the ocean floor, and analyzing this data helps us locate oil and other resources. Even for a small area, many terabytes of data need to be reconstructed; this kind of analysis obviously requires enormous computing power. Demand in this field is so strong that much supercomputing capacity is devoted to this kind of work.

Other geological work requires similar computing power, such as systems used to predict earthquakes and the multispectral satellite imaging systems used for security work.

Image rendering

Manipulating high-resolution interactive images in engineering fields (such as aerospace engine design) has always been a challenge in terms of performance and scalability, because it involves huge amounts of data. Cluster-based techniques have been successful here: the task of rendering the screen is divided among the nodes of the cluster, each node uses its own graphics hardware to render its portion of the screen and transmits the pixel information to a master node, and the master node combines the pieces into a complete image.

The examples in this field are just the tip of the iceberg; many more applications, including astrophysical simulation, meteorological simulation, engineering design, financial modeling, securities simulation, and movie special effects, require rich computing resources, and the demand for computing power only keeps growing.

How Linux and clustering change HPC

Before the advent of cluster-based computing, the typical supercomputer was a vector processor that, because of its dedicated hardware and software, usually cost more than a million dollars.

With the advent of Linux and other free, open source cluster software components, and with improvements in the processing power of commodity hardware, this has changed a great deal. You can build a powerful cluster at very low cost and add nodes as needed.

The GNU/Linux operating system (Linux) has been widely adopted in clusters. Linux runs on a great deal of hardware, and high-quality compilers and other software, such as parallel file systems and MPI implementations, are freely available for Linux. With Linux, users can also customize the kernel for their workload. Linux is an excellent platform for building HPC clusters.

Understanding hardware: vector machines and clusters

To understand HPC hardware, it is useful to compare vector computing with cluster computing. The two are competing technologies (the Earth Simulator, a vector supercomputer, is still among the 10 fastest machines).

Fundamentally, both vector processors and scalar processors execute instructions on clock cycles; what sets them apart is the vector processor's ability to process vector-related computations (such as matrix multiplication) in parallel, which is very common in high-performance computing. To illustrate this, suppose you have two double arrays a and b and want to create a third array x such that x[i] = a[i] + b[i].
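
To make this concrete, here is the computation written as a plain scalar loop in C (a minimal sketch; the array size and sample values are made up for the example). The loop body is exactly what a vector machine processes in vector fashion, and the iteration range is what a cluster would divide among its nodes.

#include <stdio.h>

#define N 8

int main(void)
{
    double a[N], b[N], x[N];

    /* Fill the input arrays with some sample values. */
    for (int i = 0; i < N; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

    /* The element-wise addition discussed in the text: x[i] = a[i] + b[i].
     * Each iteration is independent, which is what makes the operation
     * easy to vectorize or to split across cluster nodes. */
    for (int i = 0; i < N; i++)
        x[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("x[%d] = %.1f\n", i, x[i]);

    return 0;
}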

Any floating-point operation, such as addition and multiplication, can be achieved in several steps:

Adjust the exponents

Add the significands

Round and check the result, and so on

The vector processor processes these steps in parallel internally by using pipelining. Suppose a floating-point addition takes six steps (as in IEEE arithmetic hardware), as shown in figure 2:

Figure 2. Six-stage pipeline in IEEE arithmetic hardware

The vector processor processes these six steps in parallel: if the i-th array element is in step 4 of its addition, the vector processor is performing step 3 for element i+1, step 2 for element i+2, and so on. As you can see, for a six-stage floating-point addition the speedup is very close to 6 (it is not exactly 6 only because not all six stages are busy at the very beginning and end), since every stage is active at any given time (the red cells in figure 2). A big advantage is that this parallel processing happens behind the scenes; you do not need to code it explicitly in your program.

In most cases, these six steps can be performed in parallel, resulting in a nearly six-fold performance improvement. In figure 2, each arrow represents the action taken on the i-th array element.
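
The gain from pipelining can be quantified with a standard pipeline timing argument (not stated explicitly in this article): with k pipeline stages and n independent additions, an unpipelined unit needs about k x n cycles, while the pipeline needs about k + n - 1 cycles (k cycles to fill, then one result per cycle), so

speedup = (k x n) / (k + n - 1)

which approaches k (here, 6) as n grows large.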

Compared with vector processing, cluster-based computing takes a completely different approach. Instead of specially optimized vector hardware it uses standard scalar processors, but it uses a large number of them so that many computations can proceed in parallel.

The characteristics of the cluster are as follows:

Clusters are built from commodity hardware at a fraction of the cost of a vector processor. In many cases the price is lower by more than an order of magnitude.

A cluster uses a message-passing system for communication, and programs must be explicitly coded to use the distributed hardware.

With clustering, you can add nodes to the cluster as needed.

Open source software components and Linux reduce the cost of software.

The maintenance cost of clusters is very low (they take up less space, consume less power, and have a lower demand for cooling conditions).

Parallel programming and Amdahl's law

When building a high-performance environment on a cluster, software and hardware need to work together. Programs must be written to explicitly take advantage of the underlying hardware, and existing non-parallel programs must be rewritten if they are to run well on a cluster.

A parallel program does many things at once; exactly how many depends on the problem being solved. Suppose a fraction 1/N of a program's running time cannot be parallelized; then the remaining fraction (1 - 1/N) can be processed in parallel (see figure 3).

Figure 3. Amdahl's law

In theory you could apply an unlimited amount of hardware to the parallel part and finish it in close to zero time, but the serial part would see no improvement. As a result, the best achievable outcome is to execute the entire program in 1/N of the original time, but no faster. In parallel programming this fact is commonly known as Amdahl's law.
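
Restating this bound as a formula in the same notation (1/N being the serial fraction of the running time):

maximum overall speedup = 1 / (1/N) = N

That is, a program whose serial fraction is 1/N can never run more than N times faster, no matter how many nodes are added.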

Amdahl's law describes the speedup obtained by solving a problem on parallel processors compared with solving it on a single serial processor. The speedup is defined as the time needed to execute the program serially (using one processor) divided by the time needed to execute it in parallel (using multiple processors):

S = T(1) / T(j)

where T(j) is the time required to execute the program using j processors.

Referring to figure 3, if enough nodes are used for the parallel part, T'par can be made very close to 0, but Tseq does not change. In the best case, the parallel program therefore cannot be more than 1 + Tpar/Tseq times faster than the original.
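
As a quick worked example (the numbers are illustrative and not from the article): if a program runs for 10 hours serially, of which Tseq = 1 hour cannot be parallelized and Tpar = 9 hours can, then even with unlimited nodes the best possible speedup is

1 + Tpar/Tseq = 1 + 9/1 = 10

so the run can never take less than about 1 hour.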

The really difficult part of writing a parallel program is making N as large as possible. But there is a twist: you usually attempt larger problems on more powerful computers, and as the problem size grows (and as you modify the program to improve its parallelism and make better use of the available resources), the fraction of time spent in the serial part tends to shrink, so N increases automatically. (See the corollary to Amdahl's law in the Resources section at the end of this article.)

Parallel programming methods

Now let's introduce two parallel programming methods: the distributed memory method and the shared memory method.

Distributed memory method

It is useful to consider a master-slave model here:

The master node divides the task among multiple slave nodes.

Each slave node processes the task it receives.

If necessary, the slave nodes communicate with each other.

Each slave node returns its result to the master node.

The master node collects the results, distributes further tasks, and so on.

The obvious problem with this approach comes from the distributed-memory organization. Because each node can access only its own memory, any data structures needed by other nodes must be copied and sent over the network, which creates a large amount of network load. To write efficient distributed-memory programs, you must keep this weakness, and the master-slave model, in mind.
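
As an illustration of this master-slave pattern, below is a minimal MPI sketch in C. It is only a sketch of the pattern, under the assumption that an MPI implementation (such as MPICH or Open MPI) is installed; it is not code from this article. Rank 0 acts as the master and combines partial results sent back by the slaves.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* Master: receive one partial result from every slave and combine them. */
        double total = 0.0;
        for (int src = 1; src < size; src++) {
            double partial;
            MPI_Recv(&partial, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += partial;
        }
        printf("total = %f\n", total);
    } else {
        /* Slave: compute a partial result for its share of the work
         * (a placeholder value here) and send it back to the master. */
        double partial = rank * 1.0;
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

With Open MPI or MPICH, such a program is typically built with mpicc and launched across the nodes with mpirun (for example, mpirun -np 4 ./master_slave).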

Shared memory method

In the shared memory approach, memory is common to all processors, as in SMP. This approach avoids the problems just described for distributed memory, and programming such a system is much easier because all the data is available to every processor, which is not very different from a serial program. The big problem with these systems is scalability: it is not easy to add more processors.
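
For comparison, shared memory parallelism is often expressed with threads. The sketch below uses OpenMP in C to parallelize the same array addition used earlier; OpenMP is my choice for illustration, since the article does not name a specific shared memory API.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N], x[N];

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* All threads see the same arrays (shared memory), so nothing has to
     * be copied over a network; the loop iterations are simply divided
     * among the processors. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        x[i] = a[i] + b[i];

    printf("x[N-1] = %f\n", x[N - 1]);
    return 0;
}

With GCC this would be compiled with the -fopenmp flag; every thread works on the same arrays in place, which is what makes the code so close to its serial form.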

Parallel programming, like all programming, is as much an art as a science and always leaves room for design and performance improvements. Parallel programming has its own special place in computing: Part 2 of this series will introduce parallel programming platforms and give several examples.

What to do when file I/O becomes a bottleneck?

Some applications need to read and write large amounts of data to and from disk, and this is usually the slowest step in the whole computation. Faster hard drives help, but sometimes they are not enough.

The problem becomes even more obvious if a single physical disk partition is shared by all nodes (for example, over NFS), as is often the case in Linux clusters. This is where parallel file systems come into play.

A parallel file system stores a file's data distributed across multiple disks attached to multiple nodes of the cluster; these nodes are called I/O nodes. When a program tries to read a file, portions of it are read from several disks in parallel. This reduces the load on any single disk controller and allows the system to handle more requests. PVFS is a good open source parallel file system; using standard IDE hard drives on Linux clusters, it has achieved more than 1 GB/s of disk performance.

PVFS can be used as a Linux kernel module or compiled into the Linux kernel. The underlying concept is very simple (see figure 4):

The metadata server stores the information about which parts of a file are stored where.

The pieces of the file are stored on multiple I/O nodes (underneath, any ordinary file system, such as ext3, can be used to hold the data).

Figure 4. How PVFS works
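
To make the striping idea concrete, here is a small C sketch of the round-robin layout that a parallel file system uses conceptually. The stripe size, node count, and helper function are invented for illustration; this is not PVFS's actual API or on-disk format.

#include <stdio.h>

/* Hypothetical illustration of round-robin file striping:
 * the file is cut into fixed-size stripes, and stripe k is
 * stored on I/O node (k mod number_of_io_nodes). */
#define STRIPE_SIZE (64 * 1024)   /* 64 KB stripes (example value)  */
#define IO_NODES    4             /* number of I/O nodes (example)  */

static int io_node_for_offset(long offset)
{
    long stripe_index = offset / STRIPE_SIZE;
    return (int)(stripe_index % IO_NODES);
}

int main(void)
{
    /* Show which I/O node would hold each of the first few stripes. */
    for (long offset = 0; offset < 8L * STRIPE_SIZE; offset += STRIPE_SIZE)
        printf("offset %8ld -> I/O node %d\n",
               offset, io_node_for_offset(offset));
    return 0;
}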

The above is the basic analysis of the Linux cluster in HPC architecture shared by the editor. If you have had similar questions, the analysis above may help you understand them. If you would like to learn more, you are welcome to follow the industry information channel.
