Updated 2025-01-31. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to install slurm on a Linux system to monitor network bandwidth, and how to set up SLURM control and compute nodes.
SLURM is an open source distributed resource manager, similar to Sun Grid Engine (SGE), designed to be highly scalable and fault-tolerant on supercomputers and large compute clusters. After Sun was acquired by Oracle, the useful SGE became Oracle Grid Engine and, from version 6.2u6 onward, commercial software (free to use for 90 days), so we had to look for open source alternatives. A stranger recommended SLURM at the last high-performance computing conference in Durban, and it sounded promising.
SLURM manages the cluster's compute nodes through a pair of redundant control nodes (the redundancy is optional), each running a management daemon called slurmctld. slurmctld monitors, allocates, and manages computing resources, and maps and dispatches the incoming job queue to the compute nodes. Each compute node runs a daemon called slurmd, which manages the node it runs on, monitors the tasks running there, accepts requests and work from the control node, and maps that work onto the node's own resources.
[Figure: SLURM architecture]
Monitoring bandwidth
The code is as follows:
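$ sudo apt-get install slurm
The slurm package is a small network load monitor that draws its traffic graphs with text characters in the terminal. Despite the name, it is a separate program from the SLURM resource manager described above.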
For example:
The code is as follows:
$ slurm -i <interface>
$ slurm -i eth2
Keys
Press L to toggle the rx/tx indicator.
Press c to switch to classic mode.
Press r to redraw the screen.
Press q to exit.
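To see where the numbers in such a monitor come from, the following is a minimal sketch that estimates throughput from two readings of the kernel's cumulative byte counters. It assumes a Linux host that exposes those counters under /sys/class/net; the function names rate_bps, read_bytes, and sample are made up for this illustration, and nothing here is taken from slurm's own implementation.

```shell
#!/bin/sh
# Sketch only: average throughput between two counter readings.
# Assumes Linux sysfs counters; function names are hypothetical.

# rate_bps OLD NEW SECONDS: print bytes per second (integer arithmetic,
# SECONDS must be a non-zero whole number)
rate_bps() {
    echo $(( ($2 - $1) / $3 ))
}

# read_bytes IFACE rx|tx: print the cumulative byte counter for IFACE
read_bytes() {
    cat "/sys/class/net/$1/statistics/${2}_bytes"
}

# sample IFACE [SECONDS]: print average rx and tx rates over the interval
sample() {
    iface=$1
    secs=${2:-1}
    rx0=$(read_bytes "$iface" rx)
    tx0=$(read_bytes "$iface" tx)
    sleep "$secs"
    rx1=$(read_bytes "$iface" rx)
    tx1=$(read_bytes "$iface" tx)
    echo "$iface rx $(rate_bps "$rx0" "$rx1" "$secs") B/s tx $(rate_bps "$tx0" "$tx1" "$secs") B/s"
}
```

After sourcing the file, sample eth2 2 prints the average receive and transmit rates for eth2 over two seconds.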
Control node
Install the slurm-llnl package on the control node and on each compute node. The package contains both slurmctld, which the control node needs, and slurmd, which the compute nodes need:
The code is as follows:
# apt-get install slurm-llnl
Communication between the control node and the compute nodes requires authentication. Slurm supports two authentication methods: Brent Chun's authd and LLNL's MUNGE. MUNGE was built specifically for high-performance cluster computing, so we choose it here. Generate a key, then start the munge authentication service:
The code is as follows:
# /usr/sbin/create-munge-key
Generating a pseudo-random key using /dev/urandom completed.
# /etc/init.d/munge start
Use the online SLURM Version 2.3 Configuration Tool to generate a configuration file, then copy it to /etc/slurm-llnl/slurm.conf on the control node and on every compute node (yes, the control node and the compute nodes use the same configuration file).
With the configuration file in place and the munge service running, start the slurmctld service on the control node:
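For orientation, here is a rough sketch of the kind of entries such a file contains, wrapped in a shell function so it can be written to a scratch path and inspected. The host names slurm00 and slurm01 and the debug partition come from this article; the function name write_sample_conf, the Procs=1 value, and the remaining settings are assumptions for illustration, using parameter names from 2.x-era releases. It is not a substitute for the file the configuration tool generates.

```shell
#!/bin/sh
# Sketch only: write an illustrative slurm.conf to the path given as $1.
# write_sample_conf is a hypothetical helper; the values are assumptions.
write_sample_conf() {
    cat > "$1" <<'EOF'
# Illustrative values only; generate the real file with the configuration tool.
ControlMachine=slurm00
AuthType=auth/munge
NodeName=slurm01 Procs=1 State=UNKNOWN
PartitionName=debug Nodes=slurm01 Default=YES MaxTime=INFINITE State=UP
EOF
}
```

Calling write_sample_conf ./slurm.conf.sample writes the file into the current directory, well away from /etc/slurm-llnl, so it can be compared against the generated configuration.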
The code is as follows:
# /etc/init.d/slurm-llnl start
 * Starting slurm central management daemon slurmctld [OK]
Copy the munge.key generated on the control node to each compute node:
The code is as follows:
# scp /etc/munge/munge.key ubuntu@slurm01:/etc/munge/
Log in to the compute node, then start the munge service and the slurmd service. Note that the owner and group of munge.key must be changed to munge first, otherwise munge will fail to start:
The code is as follows:
# ssh ubuntu@slurm01
# chown munge:munge /etc/munge/munge.key
# /etc/init.d/munge start
 * Starting MUNGE munged [OK]
# slurmd
On the control node (slurm00), test the connection to the compute node (slurm01) by running a simple program, /bin/hostname, to see how it works:
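Because munge is strict about who owns the key and who can read it, it can help to check the copied file before starting the service. The following is a small sketch assuming GNU stat (as shipped on Ubuntu); the helper names key_is_private and key_owner are made up for this example.

```shell
#!/bin/sh
# Sketch only: sanity checks for a copied key file. Assumes GNU stat.
# key_is_private and key_owner are hypothetical helper names.

# key_is_private PATH: succeed if the octal mode is owner-only,
# for example 400 or 600; fail if group or others have any access
key_is_private() {
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        [0-7]00) return 0 ;;
        *) return 1 ;;
    esac
}

# key_owner PATH: print the owning "user:group" of PATH
key_owner() {
    stat -c '%U:%G' "$1"
}
```

On the compute node, key_owner /etc/munge/munge.key should print munge:munge after the chown above, and key_is_private /etc/munge/munge.key should succeed.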
The code is as follows:
# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle slurm01
# srun -N1 /bin/hostname
slurm01
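When there are more nodes, the sinfo listing is easy to filter in a script. The sketch below assumes the default six-column layout shown above (STATE in the fifth column, NODELIST in the sixth); the function name idle_nodes is made up for this example.

```shell
#!/bin/sh
# Sketch only: read sinfo output on stdin and print the NODELIST field of
# every row whose STATE is "idle". Assumes the default column layout.
# idle_nodes is a hypothetical helper name.
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" { print $6 }'
}
```

On the control node it would be used as sinfo | idle_nodes.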