Installing the Slurm job scheduling system
Single-node installation inside WSL
My WSL distribution is Ubuntu 22.04, so Slurm can be installed directly with apt:
sudo apt install slurm-wlm slurm-wlm-doc
Check the installed version:
(rdkitenv) root@user:~# slurmd --version
slurm-wlm 21.08.5
(rdkitenv) root@user:~#
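The slurm.conf below sets AuthType=auth/munge, so slurmctld and slurmd can only authenticate each other while the MUNGE daemon is running; installing slurm-wlm with apt normally pulls in munge as a dependency. A minimal check, sketched as a hypothetical check_munge function, round-trips an empty credential through munge -n and unmunge:

```bash
#!/bin/bash
# Hypothetical helper: encode an empty-payload credential with munge and
# decode it again with unmunge. Success means the munge daemon is usable.
check_munge() {
    if munge -n | unmunge > /dev/null; then
        echo "munge OK"
    else
        echo "munge failed: is the munge service running?" >&2
        return 1
    fi
}
```

If it reports a failure, starting the daemon with sudo systemctl enable munge --now before continuing is a reasonable first step.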
Then create slurm.conf in /etc/slurm:
ClusterName=cool
ControlMachine=[user]
MailProg=/usr/bin/s-nail
SlurmUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
# SCHEDULING
SchedulerType=sched/backfill
# LOGGING
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
JobCompType=jobcomp/none
# COMPUTE NODES
PartitionName=[user] Nodes=[user] Default=NO MaxTime=INFINITE State=UP
#NodeName=[user] State=UNKNOWN
NodeName=[user] Sockets=[Sockets] CoresPerSocket=[cpus] ThreadsPerCore=[tpc] State=UNKNOWN
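Replace the placeholders as follows:
- [user]: the output of hostname
- [Sockets]: the output of cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
- [cpus]: the number (cores per socket) shown by cat /proc/cpuinfo | grep "cpu cores" | uniq
- [tpc]: the last line printed by the following script: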
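#!/bin/bash
# Number of physical CPUs (sockets): count distinct "physical id" values
cpunum=$(grep "physical id" /proc/cpuinfo | sort | uniq | wc -l)
echo "CPU sockets: $cpunum"
# Cores per socket, taken from the "cpu cores" line ($2+0 strips the padding)
cpuhx=$(grep "cpu cores" /proc/cpuinfo | uniq | awk -F":" '{print $2+0}')
echo "Cores per socket: $cpuhx"
# Total hardware threads: one "processor" entry per thread
cpuxc=$(grep -c "^processor" /proc/cpuinfo)
echo "CPU threads: $cpuxc"
# threads == sockets * cores * 2 means hyper-threading is on (ThreadsPerCore=2)
if (( cpunum * cpuhx * 2 == cpuxc )); then
    echo "2"
else
    echo "1"
fi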
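Instead of substituting the four values by hand, a sketch like the following can fill them in with sed. fill_slurm_conf is a hypothetical helper; it assumes the x86-style "physical id" and "cpu cores" fields in /proc/cpuinfo (present on WSL2) and computes ThreadsPerCore as threads / (sockets * cores), which agrees with the script above whenever hyper-threading is simply on or off.

```bash
#!/bin/bash
# Hypothetical helper: replace the [user], [Sockets], [cpus] and [tpc]
# placeholders of a slurm.conf template in place (assumes x86 /proc/cpuinfo).
fill_slurm_conf() {
    local conf="$1" host sockets cores threads tpc
    host=$(uname -n)    # same kernel node name that `hostname` prints
    # Distinct "physical id" values = number of sockets
    sockets=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
    # "cpu cores" = cores per socket; $2+0 strips the padding after the colon
    cores=$(grep "cpu cores" /proc/cpuinfo | sort -u | awk -F':' '{print $2+0}' | head -n 1)
    # Every "processor" entry is one hardware thread
    threads=$(grep -c "^processor" /proc/cpuinfo)
    # Fall back to safe values when a field is missing (e.g. non-x86 kernels)
    (( sockets > 0 )) || sockets=1
    [[ "$cores" =~ ^[0-9]+$ ]] && (( cores > 0 )) || cores=$(( threads / sockets ))
    (( cores > 0 )) || cores=1
    tpc=$(( threads / (sockets * cores) ))
    (( tpc > 0 )) || tpc=1
    # Substitute every placeholder occurrence in the file
    sed -i \
        -e "s/\[user\]/${host}/g" \
        -e "s/\[Sockets\]/${sockets}/g" \
        -e "s/\[cpus\]/${cores}/g" \
        -e "s/\[tpc\]/${tpc}/g" \
        "$conf"
}
```

After defining the function in a root shell, fill_slurm_conf /etc/slurm/slurm.conf edits the real file. Comparing the resulting NodeName line with the one printed by slurmd -C, which reports the hardware slurmd itself detects, is a useful sanity check.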
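After editing the file, enable and start both services with systemctl. (systemctl only works if systemd is enabled in WSL, i.e. /etc/wsl.conf contains systemd=true under a [boot] section; run wsl --shutdown after changing it so the setting takes effect.)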
sudo systemctl enable slurmctld --now
sudo systemctl enable slurmd --now
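If slurmctld or slurmd fails to start (systemctl status slurmctld and the log files named in slurm.conf show why), one possible cause is that the spool or log directories the configuration points to do not exist. The sketch below creates them; make_slurm_dirs is a hypothetical helper that only understands plain Key=Value lines like the ones in the configuration above.

```bash
#!/bin/bash
# Hypothetical helper: create the state, spool and log directories named in
# a slurm.conf so slurmctld and slurmd can write to them. Run as root.
make_slurm_dirs() {
    local conf="$1" key value dir
    # Split each line at the first "="; also handle a final line without newline
    while IFS='=' read -r key value || [[ -n "$key" ]]; do
        case "$key" in
            StateSaveLocation|SlurmdSpoolDir)
                dir="$value" ;;              # value is the directory itself
            SlurmctldLogFile|SlurmdLogFile)
                dir=$(dirname "$value") ;;   # value is a file: create its parent
            *)
                continue ;;                  # ignore every other setting/comment
        esac
        mkdir -p "$dir" && echo "ensured $dir"
    done < "$conf"
}
```

For the real setup this would be make_slurm_dirs /etc/slurm/slurm.conf, followed by sudo systemctl restart slurmctld slurmd.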
Check the node with sinfo:
sinfo
If your node is listed in the idle state, the installation succeeded.
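Right after the services start, the node may briefly report a different state before settling on idle. A minimal polling sketch, with wait_for_idle as a hypothetical helper name and a 60-second default timeout chosen arbitrarily, uses sinfo's node-oriented, header-less output (-N -h) with the compact-state format %t:

```bash
#!/bin/bash
# Hypothetical helper: poll sinfo until every node reports "idle",
# or give up after the timeout (seconds, default 60).
wait_for_idle() {
    local timeout="${1:-60}" waited=0 states=""
    while (( waited < timeout )); do
        # -h: no header, -N: one line per node, -o "%t": compact state only
        states=$(sinfo -h -N -o "%t")
        # Succeed when there is output and no line differs from "idle"
        if [[ -n "$states" ]] && ! grep -qv '^idle$' <<< "$states"; then
            echo "all nodes idle"
            return 0
        fi
        sleep 2
        (( waited += 2 ))
    done
    echo "nodes not idle after ${timeout}s:" >&2
    echo "$states" >&2
    return 1
}
```

Once it succeeds, a test job can be run with srun -p "$(hostname)" hostname. The -p option is needed because the partition is declared with Default=NO, so there is no default partition for jobs to fall back on.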
This post is licensed under CC BY 4.0 by the author.