This article mainly explains how to build a Slurm cluster. The method introduced here is simple, fast, and practical, so interested readers may wish to follow along as we walk through the Slurm cluster building tutorial step by step.
1. System environment

a) CentOS 7.8
b) Two machines, with the IPs planned as 10.10.0.20 and 10.10.0.21
c) The hostnames are node1 and node2
d) The firewall and SELinux have been turned off on both machines
e) Passwordless SSH, NIS, and NFS have already been configured

2. Configure munge and Slurm

a) Install the dependencies

yum install -y epel-release
yum install -y gtk2 gtk2-devel munge munge-devel python python3

b) Configure munge

# both machines need to be configured
chown slurm:slurm /etc/munge
chown slurm:slurm /var/run/munge
chown slurm:slurm /var/lib/munge
chown slurm:slurm /var/log/munge
create-munge-key                            # this step is only run on node1
scp /etc/munge/munge.key node2:/etc/munge/
chown slurm:slurm /etc/munge/munge.key
su - slurm                                  # all nodes start munged as the slurm user
munged

c) Configure Slurm

# the configuration is the same on all nodes
tar xvf slurm-20.11.5.tar.bz2
cd slurm-20.11.5/
./configure
make -j3
make install
cp etc/{slurmctld.service,slurmdbd.service,slurmd.service} /usr/lib/systemd/system
vi /usr/local/etc/slurm.conf                # the configuration file is attached in the PS section
scp /usr/local/etc/slurm.conf node2:/usr/local/etc/
chown slurm:slurm /var/spool/
systemctl start slurmctld                   # the master node starts slurmctld and slurmd
systemctl start slurmd                      # slave nodes only need to start slurmd

3. Test

a) Check the nodes with the system commands

[root@node1 ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
control      up   infinite      1   idle node1
compute*     up   infinite      2   idle node[1-2]
[root@node1 ~]# srun -N2 -l hostname        # -N2 means use 2 nodes
0: node1
1: node2

So far the Slurm cluster has been built.

b) MPI test

vi test.c                                   # the test program is in the PS section
mpicc -o test test.c                        # compile the MPI program
vi tj.sh                                    # the job script is in the PS section
sbatch tj.sh                                # submit the job
squeue                                      # check the job status
[root@node1 ~]# cat test.out                # view the job results
node2: Hello world from process 2
number of processes: 4
node1: Hello world from process 0
node2: Hello world from process 3
node1: Hello world from process 1
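If sinfo ever reports the nodes as down or unknown instead of idle, the most common cause is a munge authentication problem. The following sanity check is not part of the original steps, just a commonly used way to confirm that the munge daemons on node1 and node2 really share the same key:

munge -n | unmunge                          # local round trip; should end with STATUS: Success (0)
munge -n | ssh node2 unmunge                # decode on node2; Success means both nodes use the same munge.key
systemctl status slurmctld slurmd           # on node1, both daemons should be active after the start commands above

If the remote unmunge fails, re-copy /etc/munge/munge.key from node1, restart munged, and then start the Slurm daemons again.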
4. PS

a) tj.sh job script

#!/bin/sh
#SBATCH -o /root/test.out                   # write the results to test.out
#SBATCH --nodes=2                           # number of nodes: 2
#SBATCH --ntasks-per-node=2
mpirun /root/test

b) slurm.conf configuration file

SlurmctldHost=node1                         # master node
MpiDefault=none
ProctrackType=proctrack/pgid                # changed from the website-generated configuration
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm                             # Slurm administrative user
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/affinity
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
AccountingStoreJobComment=YES
ClusterName=siton                           # cluster name
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmdDebug=info
NodeName=node[1-2] CPUs=4 RealMemory=2 Sockets=4 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
/* Node name and number of CPUs; Sockets, CoresPerSocket and ThreadsPerCore can be checked with lscpu or /proc/cpuinfo; RealMemory is the memory actually allocated to Slurm, and CPUs (Procs) is the actual number of CPUs; State=UNKNOWN when the cluster is first started, after which it becomes idle */
PartitionName=control Nodes=node1 Default=YES MaxTime=INFINITE State=UP
PartitionName=compute Nodes=node1,node2 Default=YES MaxTime=INFINITE State=UP
/* PartitionName divides the nodes into the control and compute partitions; Default=YES marks the partition used for computation by default */

The configuration file can also be generated through https://slurm.schedmd.com/configurator.html

c) MPI test program (test.c)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int myid, numprocs, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(processor_name, &namelen);

    if (myid == 0)
        printf("number of processes: %d\n", numprocs);
    printf("%s: Hello world from process %d\n", processor_name, myid);

    MPI_Finalize();
    return 0;
}

5. Advanced (GPU)

Modify the slurm.conf file:

GresTypes=gpu
NodeName=slave3 Sockets=2 Procs=32 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=3000 Gres=gpu:tesla:2 State=UNKNOWN NodeAddr=10.135.12.29

In addition, the GPU information needs to be configured on the slave3 machine itself. Edit the /usr/local/etc/gres.conf file:

Name=gpu Type=tesla File=/dev/nvidia0
Name=gpu Type=tesla File=/dev/nvidia1

In the Slurm job script, add a gres line to specify the GPU resources:

#SBATCH --gres=gpu:tesla:2
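The original article stops at the single --gres directive. Purely as an illustration, here is a minimal GPU batch script in the same style as tj.sh above; the output path /root/gpu_test.out, the single-node layout, and the use of nvidia-smi as the payload are assumptions for this sketch, not part of the original tutorial:

#!/bin/sh
#SBATCH -o /root/gpu_test.out               # hypothetical output file, adjust to your environment
#SBATCH --nodes=1                           # run on one node (for example slave3)
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:tesla:2                  # request the two Tesla GPUs defined in gres.conf above
nvidia-smi                                  # list the GPUs that Slurm has allocated to the job

Submitting it with sbatch and watching it with squeue works exactly as in the CPU test in section 3, and scontrol show node slave3 should list Gres=gpu:tesla:2 once the node has registered.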
At this point, I believe you have a deeper understanding of how to build a Slurm cluster. You might as well try it out in practice.