Slurm Resource Management and Job Priority
How Slurm decides which jobs run first — priority factors, fair-share scheduling, backfill, and monitoring commands (squeue, sinfo, sacct).
How Slurm tracks resource consumption through account hierarchies, TRES billing, and resource limits — sacctmgr, sreport, and the association model explained.
Master cgroups to limit CPU, memory, and I/O for process groups. Understand cgroups v1 vs v2, the hierarchical structure, and how containers use them.