Monitoring

Count of nodes by SLURM state: sum should equal the number of compute nodes; anything with "DRAIN" in the name = bad (see below)

Runbook

Troubleshoot the SLURM scheduler
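
Before resuming anything, the per-state node count described under Monitoring can be approximated from the command line. This is a minimal sketch, not the query the monitoring itself uses: `count_by_state` is a hypothetical helper name, and the input is assumed to be one `node state` pair per line. With `-N`, `sinfo` prints a node once per partition it belongs to, so the helper removes duplicates before counting.

```shell
# Hypothetical helper: read "node state" pairs on stdin and print
# one "state count" line per state, sorted by state name.
count_by_state() {
  sort -u | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}

# Only query the scheduler when sinfo is actually installed on this host.
# -h drops the header, -N lists per node, %N is the node name, %T the state.
if command -v sinfo >/dev/null 2>&1; then
  sinfo -h -N -o "%N %T" | count_by_state
fi
```

The counts should add up to the number of compute nodes, and any state containing "drain" is the signal to continue with the steps below.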

Undrain all nodes

    for node in $(seq -w 10 12); do \
      scontrol update NodeName=ecpsc$node State=RESUME; \
    done
    for fastnode in $(seq 10 11); do \
      scontrol update NodeName=ecpsf$fastnode State=RESUME; \
    done
    scontrol show nodes|grep State # Should show no DRAINED state

Nodes still drained / draining by themselves?

💡 There is a dashboard for that. Take a look at /var/log/slurm/slurmctl.log to find out why. Common causes include