Slurm real GRES on a real GPU (Lesson 6, Part D)
Get the code to run this lab
The commands on this page come from the repository, not the website. Clone it and enter this lesson's folder: git clone https://github.com/ld-singh/ai-factory-ops-lab && cd ai-factory-ops-lab/portfolio-lab/02-slurm-gpu-platform/slurm-realgpu
Part of Lesson 6 - Real GPU · The simulation counterpart is Lesson 2 - Slurm (fake GRES) · Course home: AI Factory Operations Lab
🚧 STATUS: PLANNED - coming in a future update (optional). The scheduling half of Slurm GRES is fully validated in Lesson 2 (fake GRES); this real-hardware enforcement half is an optional add-on planned for a later update. The guide below is already run-ready for when it lands: it's ordered notes with pointers to the official Slurm docs rather than a copy-paste script, because the directives depend on your Slurm version and host. Confirm every directive against https://slurm.schedmd.com/gres.html before running, and record your output as evidence.
The boundary (read first)
The fake-GRES Slurm lesson proved the scheduling decision:
slurmctld counts gpu GRES, places jobs, rejects impossible requests, and applies
QoS - none of which needs a real device. This page proves the part it cannot: real
--gres=gpu enforcement - that a job step is actually confined to the GPU(s) Slurm
allocated it, via the cgroup device controller, so a process cannot touch a GPU it
wasn't given.
| Claim | Where it's proven |
|---|---|
| GRES counting, placement, QOSMaxGRESPerUser, pending reasons | Fake GRES (Lesson 2) - control-plane logic |
| CUDA_VISIBLE_DEVICES set to the allocated devices, and the job confined to them | 🔥 Here - cgroup device enforcement on a real GPU |
It does not prove multi-node GRES, NVLink/topology-aware allocation, or GPU-sharding - single node by design, same as the rest of Lesson 6.
Prerequisites
The same rented host as the rest of Lesson 6, after
Phase 0 (driver + a working nvidia-smi). You do not need Kubernetes for this
phase - Slurm talks to the GPU through the driver and cgroups directly. You need root
to install Slurm and configure cgroups.
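A quick preflight can confirm the host is in the state this phase assumes. The sketch below is illustrative only: list_gpu_device_nodes is a hypothetical helper, not something shipped in the repository, and it assumes the driver exposes GPUs as nvidia0, nvidia1, ... under /dev.

```shell
# Hypothetical helper: lists GPU device nodes (nvidia0, nvidia1, ...) under a
# directory, default /dev. Control nodes such as nvidiactl do not match the glob.
list_gpu_device_nodes() {
  local dev_dir="${1:-/dev}" node
  for node in "$dev_dir"/nvidia[0-9]*; do
    # An unmatched glob stays literal, so test existence before printing.
    [ -e "$node" ] && printf '%s\n' "$node"
  done
  return 0
}

# Usage on the GPU host (run by hand, as root):
#   [ "$(id -u)" -eq 0 ] || echo "need root to install Slurm and configure cgroups"
#   nvidia-smi -L && list_gpu_device_nodes
```

The paths it prints are the ones gres.conf will later reference, so it is worth saving the output alongside your other evidence.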
What you build
A single-node Slurm install on the GPU host where gres.conf points at the real
GPU device files and cgroup.conf enables device confinement, then a one-GPU job that
proves it only sees its allocation.
The lab's Docker-based fake-GRES cluster is not the vehicle here - it deliberately uses fake char-devices and
task/none. For real enforcement you want Slurm on the host with the real driver and the cgroup device plugin. Treat the lab's config/ files as the shape to adapt, not drop-in files.
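To make the shape of those files concrete, here is a minimal sketch that writes candidate fragments into a staging directory for review. It is an assumption-laden starting point, not the lab's configuration: write_candidate_confs is a hypothetical helper, the node name and GPU count are whatever you pass in, and proctrack/cgroup with task/cgroup is a common pairing that you should confirm for your Slurm version. It never writes to your live Slurm config directory.

```shell
# Hypothetical helper: writes candidate config fragments to a staging directory.
# Arguments: staging dir, node name, GPU count (all supplied by the caller).
write_candidate_confs() {
  local dir="$1" node="$2" count="$3"
  local files="/dev/nvidia0"
  # For several GPUs, gres.conf accepts a bracketed range of device nodes.
  if [ "$count" -gt 1 ]; then
    files="/dev/nvidia[0-$((count - 1))]"
  fi
  mkdir -p "$dir"
  cat > "$dir/gres.conf" <<EOF
NodeName=$node Name=gpu File=$files
EOF
  cat > "$dir/cgroup.conf" <<EOF
ConstrainDevices=yes
EOF
  cat > "$dir/slurm.conf.fragment" <<EOF
GresTypes=gpu
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
# Merge Gres= into your existing NodeName line rather than adding a second one.
NodeName=$node Gres=gpu:$count
EOF
}

# Usage: write_candidate_confs ./staging "$(hostname -s)" 1
```

Writing to a staging directory first keeps the review step explicit: diff the fragments against the official docs, then merge them by hand.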
The steps (confirm each against the Slurm docs)
- Install Slurm on the host (your distro's package or a build). Reference: https://slurm.schedmd.com/quickstart_admin.html
- Declare the real GPU as GRES. In gres.conf, point File= at the real device node(s) (e.g. /dev/nvidia0), not the fake char-devices the sim lesson creates with mknod. In slurm.conf, set GresTypes=gpu and the node's Gres=gpu:<count>. Reference: https://slurm.schedmd.com/gres.html
- Enable cgroup device confinement. Configure cgroup.conf with the device constraint enabled (e.g. ConstrainDevices=yes) and, in slurm.conf, set the matching ProctrackType/TaskPlugin for your Slurm version (typically the cgroup variants) - this is the piece that turns the GRES count into actual isolation. Reference: https://slurm.schedmd.com/cgroup.conf.html
- Restart slurmctld/slurmd and confirm the node registers the GPU: sinfo -o "%n %G" should show gpu:<count>.
- Submit a one-GPU job that prints what it can see:

      # ILLUSTRATIVE - confirm flags against the Slurm docs for your version.
      srun --gres=gpu:1 bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L'
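The registration check in the steps above can be scripted rather than eyeballed. This is a minimal sketch: node_has_gpu_gres is a hypothetical helper that reads sinfo output on stdin, and it assumes the %G column begins with (or contains, comma-separated) an entry starting with gpu:, which may carry a type or socket suffix depending on your version.

```shell
# Hypothetical helper: reads `sinfo -h -o "%n %G"` lines on stdin and succeeds
# only if the named node reports a gpu GRES entry.
node_has_gpu_gres() {
  awk -v node="$1" '
    $1 == node && $2 ~ /(^|,)gpu:/ { found = 1 }
    END { exit !found }
  '
}

# Usage on the host (assumes the Slurm node name matches the short hostname):
#   sinfo -h -o "%n %G" | node_has_gpu_gres "$(hostname -s)" \
#     && echo "GPU GRES registered" \
#     || echo "node does not report gpu GRES yet"
```

If the check fails after a restart, the slurmd log is usually the first place to look, since a bad File= path or a count mismatch is reported there.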
What to observe, and how to state it
1. Allocation visibility. CUDA_VISIBLE_DEVICES inside the job is set to the device(s) Slurm allocated - and on a multi-GPU host, a --gres=gpu:1 job sees exactly one, not all. (On a single-GPU rental, the strong form of this is the confinement in point 2.)
2. Enforcement. With ConstrainDevices=yes, a process in the job cannot access a GPU outside its allocation - the cgroup device controller blocks it. Demonstrate the behavior; the precise error depends on driver/runtime. The claim is "the job is confined to its allocated device," not a specific errno.
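One way to demonstrate both observations without relying on a particular error message is to count what nvidia-smi -L can list inside the job. The sketch below rests on two assumptions you should verify on your host: that nvidia-smi enumerates devices through the driver rather than honoring CUDA_VISIBLE_DEVICES, so its listing reflects device access, and that a job submitted with no --gres request is denied every GPU when ConstrainDevices=yes is active. Both helper names are hypothetical.

```shell
# Hypothetical helper: counts GPUs in `nvidia-smi -L` output read from stdin.
# Expected line shape: "GPU 0: <name> (UUID: GPU-...)".
count_listed_gpus() {
  # grep -c prints 0 and exits non-zero on no match; keep the 0, drop the status.
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Hypothetical helper: succeeds when the listed GPU count equals the expected count.
job_sees_exactly() {
  local expected="$1" seen
  seen="$(count_listed_gpus)"
  [ "$seen" -eq "$expected" ]
}

# Usage on the host:
#   srun --gres=gpu:1 nvidia-smi -L | job_sees_exactly 1 && echo "job sees exactly one GPU"
#   srun nvidia-smi -L | job_sees_exactly 0 && echo "no allocation, no GPU visible"
```

On a single-GPU rental the second command is the more telling one, because the first cannot distinguish confinement from a host that only has one device.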
Record both - the in-job CUDA_VISIBLE_DEVICES and nvidia-smi -L output, and your gres.conf
and cgroup.conf - into the real-enforcement section of
slurm-gres-validation.md,
kept separate from the fake-GRES scheduling evidence (they back different claims).
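Capturing that evidence can be a single append so the job output and the configs that produced it stay together. This is a sketch under assumptions: append_real_gres_evidence is a hypothetical helper, the section heading it writes is a placeholder rather than the layout of the real slurm-gres-validation.md, and the config paths are whatever your install uses.

```shell
# Hypothetical helper: appends a timestamped evidence section to a markdown file.
# Arguments: output file, saved in-job output, gres.conf path, cgroup.conf path.
append_real_gres_evidence() {
  local out="$1" job_output="$2" gres_conf="$3" cgroup_conf="$4"
  {
    printf '\n## Real GRES enforcement evidence (%s)\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'In-job output:\n\n'
    sed 's/^/    /' "$job_output"      # indent so markdown renders it as code
    printf '\ngres.conf:\n\n'
    sed 's/^/    /' "$gres_conf"
    printf '\ncgroup.conf:\n\n'
    sed 's/^/    /' "$cgroup_conf"
  } >> "$out"
}

# Usage (paths are examples; adjust to your install):
#   srun --gres=gpu:1 bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"; nvidia-smi -L' > job.out
#   append_real_gres_evidence slurm-gres-validation.md job.out /etc/slurm/gres.conf /etc/slurm/cgroup.conf
```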
Related runbook: slurm-job-pending-reason-gres.md.
Relationship to the simulation lesson
Lesson 2 (fake GRES) proved Slurm schedules GPU jobs correctly, for free, with no GPU. This phase proves Slurm confines a placed job to its allocation, on one real GPU. Together they cover both halves of Slurm GRES; neither covers the other's, which is the whole point of keeping them apart - the same decision-vs-enforcement line HAMi draws in Lesson 1C.