Installing the Slurm job scheduler

Single-node installation inside WSL

Reference: tutorial on single-machine Slurm installation on Ubuntu 20.04

My WSL distribution is Ubuntu 22.04, so I installed Slurm directly with apt:

sudo apt install slurm-wlm slurm-wlm-doc

Check the version:

(rdkitenv) root@user:~# slurmd --version
slurm-wlm 21.08.5
(rdkitenv) root@user:~# 

Create slurm.conf in /etc/slurm:

ClusterName=cool
ControlMachine=[user]

MailProg=/usr/bin/s-nail
SlurmUser=root
SlurmctldPort=6817

SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0

# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0

# SCHEDULING
SchedulerType=sched/backfill

# LOGGING

SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
JobCompType=jobcomp/none

# COMPUTE NODES

PartitionName=[user] Nodes=[user] Default=NO MaxTime=INFINITE State=UP
#NodeName=[user] State=UNKNOWN
NodeName=[user] Sockets=[Sockets] CoresPerSocket=[cpus] ThreadsPerCore=[tpc] State=UNKNOWN
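  • [user]: the output of hostname
  • [Sockets]: the output of cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
  • [cpus]: the number after the colon in the output of cat /proc/cpuinfo| grep "cpu cores"| uniq
  • [tpc]: the last line printed by the following script (2 if hyper-threading is on, otherwise 1):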
    
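    #!/bin/bash
    # Number of physical CPU packages (sockets).
    cpunum=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
    echo "Sockets: $cpunum"
    # Cores per socket, taken from the first "cpu cores" entry.
    cpuhx=$(grep -m1 "cpu cores" /proc/cpuinfo | awk -F: '{print $2}' | tr -d ' ')
    echo "Cores per socket: $cpuhx"
    # Logical processors (hardware threads) visible to the OS.
    cpuxc=$(grep -c "^processor" /proc/cpuinfo)
    echo "Logical processors: $cpuxc"

    # Two threads per core if sockets * cores * 2 equals the logical processor count.
    if [[ $((cpunum * cpuhx * 2)) -eq $cpuxc ]]; then
        echo "2"
    else
        echo "1"
    fi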
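
The three hardware placeholders can also be read from lscpu instead of /proc/cpuinfo. The following is a minimal sketch, not part of the original setup: node_line is a hypothetical helper that parses the "Socket(s)", "Core(s) per socket" and "Thread(s) per core" rows of lscpu output on stdin and prints a NodeName line in the same shape as the one in slurm.conf. It assumes English lscpu labels, so run lscpu with LC_ALL=C.

```shell
#!/bin/bash
# node_line NAME: build a slurm.conf NodeName line from `lscpu` output on stdin.
# Hypothetical helper for illustration; it is not shipped with Slurm.
node_line() {
    local name="$1"
    awk -F: -v name="$name" '
        /^Socket\(s\)/          { gsub(/[ \t]/, "", $2); sockets = $2 }
        /^Core\(s\) per socket/ { gsub(/[ \t]/, "", $2); cores = $2 }
        /^Thread\(s\) per core/ { gsub(/[ \t]/, "", $2); threads = $2 }
        END {
            printf "NodeName=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s State=UNKNOWN\n",
                   name, sockets, cores, threads
        }'
}

# Usage:
#   LC_ALL=C lscpu | node_line "$(hostname)"
```

As a cross-check, slurmd -C prints the hardware configuration that slurmd itself detects on the node, and the values should agree with the line generated here.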
    

After editing the file, enable and start the services with systemctl:

sudo systemctl enable slurmctld --now
sudo systemctl enable slurmd --now
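
Two things are worth checking if these commands fail. systemctl only works inside WSL when systemd is enabled for the distribution (for example with systemd=true under [boot] in /etc/wsl.conf), and because slurm.conf uses AuthType=auth/munge, the munge service has to be running as well. The sketch below is a hypothetical helper, check_services, that reports the state of each service; the service name munge is an assumption based on the Ubuntu package.

```shell
#!/bin/bash
# check_services SVC...: print whether each systemd service is active.
# Returns 1 if any service is not active, 0 otherwise.
check_services() {
    local svc failed=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_services munge slurmctld slurmd
```

If a service is reported as not active, systemctl status and the log files named in slurm.conf are the first places to look.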

Check the node state with sinfo:

sinfo

If your node is listed as idle, the installation succeeded.
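
A quick way to confirm that scheduling works end to end is to submit a trivial job. This is a sketch under the assumptions of the slurm.conf above: the partition is named after the host and is declared with Default=NO, so the job has to name its partition explicitly. make_test_job is a hypothetical helper that prints a minimal batch script; the job name and output file name are arbitrary.

```shell
#!/bin/bash
# make_test_job PARTITION: print a minimal sbatch script for that partition.
# Hypothetical helper for illustration only.
make_test_job() {
    local partition="$1"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=$partition
#SBATCH --ntasks=1
#SBATCH --output=hello-%j.out
hostname
EOF
}

# Usage:
#   make_test_job "$(hostname)" > hello.sh
#   sbatch hello.sh      # submit the job
#   squeue               # the job should appear here while it is pending or running
```

Once the job finishes, the hello-<jobid>.out file in the submission directory should contain the node's hostname.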

This post is licensed under CC BY 4.0 by the author.