Lesson 1 · Deep dive - How a fake GPU fleet works
Get the code to run this lab
The commands on this page run from a clone of the repository, not from the website. Clone it and enter this lesson's folder: git clone https://github.com/ld-singh/ai-factory-ops-lab && cd ai-factory-ops-lab/portfolio-lab/01-k8s-gpu-platform/kwok
Part of Lesson 1 - Kubernetes GPU Scheduling. Read this when Step 1 makes you ask "wait, how is this legitimate?"
🎯 Objective: understand exactly what KWOK (Kubernetes WithOut Kubelet) fakes, what it does not fake, and how the fake GPU nodes are constructed - so you can defend the simulation's validity and name its limits precisely.
KWOK lets us register fake nodes that the scheduler treats as real: they are pure API
objects with no kubelet behind them. KWOK provides the nodes; the fake-gpu-operator then
advertises nvidia.com/gpu on them (see fake-gpu-operator/README.md). We also give the
nodes a gpu-pool label for targeting and the nvidia.com/gpu.product label that GPU
Feature Discovery would apply on a real cluster, so the control plane handles them
exactly as it would a real GPU fleet. The two are complementary: KWOK supplies the
nodes, and the operator supplies the GPU layer on those nodes.
Why this is legitimate (and where it stops)
💡 The default Kubernetes scheduler never talks to a GPU. For nvidia.com/gpu it compares
the integer a pod requests against the integer in the node's status.allocatable, minus
what the pods already bound to that node have requested. A fake node with
nvidia.com/gpu: 8 therefore exercises the same scheduling code path as a DGX with 8
real GPUs. What it does NOT exercise: kubelet device allocation, the NVIDIA
container runtime, CUDA, NVLink topology, MIG, or DCGM. Those are covered in
Lesson 6.
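The comparison the scheduler makes can be sketched in a few lines of shell. This is a simplified, hypothetical model (gpu_fits is not part of Kubernetes or of this repo) of the per-resource check in the scheduler's NodeResourcesFit filter; the real plugin is Go code that also checks CPU, memory, and pod count.

```bash
# Hypothetical helper: a simplified model of the scheduler's fit check for one
# extended resource such as nvidia.com/gpu. Only integers are compared; no GPU
# or driver is ever consulted.
#   $1 = node allocatable (the node's status.allocatable["nvidia.com/gpu"])
#   $2 = GPUs already requested by pods bound to that node
#   $3 = GPUs requested by the incoming pod
gpu_fits() {
  local allocatable=$1 already_requested=$2 request=$3
  [ $((already_requested + request)) -le "$allocatable" ]
}

# Same inputs, same answer, whether the 8-GPU node is a KWOK fake or a DGX.
for req in 4 8 9; do
  if gpu_fits 8 0 "$req"; then
    echo "request=${req} on an empty 8-GPU node: fits"
  else
    echo "request=${req} on an empty 8-GPU node: does not fit"
  fi
done
```

With 6 GPUs already taken on an A100 node, gpu_fits 8 6 4 fails, so a 4-GPU pod would be placed on another node or stay Pending, just as it would on real hardware.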
Install
The setup script (../scripts/install-kwok.sh) applies the official release manifests,
as described in the KWOK in-cluster guide (https://kwok.sigs.k8s.io/docs/user/kwok-in-cluster/):
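```bash
KWOK_REPO=kubernetes-sigs/kwok
# Look up the newest release tag through the GitHub API (needs curl and jq)
KWOK_LATEST_RELEASE=$(curl -s "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')
# KWOK CRDs and the kwok-controller deployment
kubectl apply -f "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/kwok.yaml"
# Stage policies that move fake nodes and pods through their lifecycle quickly
kubectl apply -f "https://github.com/${KWOK_REPO}/releases/download/${KWOK_LATEST_RELEASE}/stage-fast.yaml"
```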
stage-fast.yaml makes pods on fake nodes transition to Running quickly, so
scheduling demos are immediate.
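Both manifests come from the same release-download URL, differing only in the tag and file name. If you would rather pin a release than follow "latest" (so reruns install the same KWOK), a minimal sketch follows; kwok_manifest_urls is a hypothetical helper, and the tag passed to it is a placeholder you would pick from the KWOK releases page.

```bash
# Hypothetical helper: print the two manifest URLs the install step applies
# for a given KWOK release tag, using the same URL pattern as above.
kwok_manifest_urls() {
  local tag=$1
  local repo=${KWOK_REPO:-kubernetes-sigs/kwok}
  local base="https://github.com/${repo}/releases/download/${tag}"
  printf '%s\n' "${base}/kwok.yaml" "${base}/stage-fast.yaml"
}

# Print the URLs for a placeholder tag (substitute a real release).
kwok_manifest_urls "vX.Y.Z"

# To apply them in order against a cluster, one URL per kubectl call:
# kwok_manifest_urls "<tag>" | xargs -n1 kubectl apply -f
```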
Fake GPU node pools
The ../scripts/create-fake-gpu-nodes.sh script generates three pools from the
template in this directory:
| Pool | Nodes | GPUs/node | Product label (GFD-style) |
|---|---|---|---|
| a100 | 2 | 8 | NVIDIA-A100-SXM4-80GB |
| h100 | 1 | 8 | NVIDIA-H100-80GB-HBM3 |
| l40s | 2 | 4 | NVIDIA-L40S |
Total simulated fleet: 5 nodes, 32 "GPUs".
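The total is plain multiplication over the table. The sketch below recomputes it from the pool rows (copied from the table; nothing here reads the cluster) and also tracks the largest per-node count, which caps any single pod's GPU request because a pod is always placed on exactly one node.

```bash
# Pool rows copied from the table above: pool, nodes, GPUs per node.
# Nothing here talks to the cluster; it only re-derives the fleet totals.
total_nodes=0
total_gpus=0
max_per_node=0
while read -r pool nodes gpus_per_node; do
  echo "${pool}: ${nodes} x ${gpus_per_node} = $((nodes * gpus_per_node)) GPUs"
  total_nodes=$((total_nodes + nodes))
  total_gpus=$((total_gpus + nodes * gpus_per_node))
  if [ "$gpus_per_node" -gt "$max_per_node" ]; then
    max_per_node=$gpus_per_node
  fi
done <<EOF
a100 2 8
h100 1 8
l40s 2 4
EOF
echo "fleet: ${total_nodes} nodes, ${total_gpus} GPUs"
echo "largest single-pod GPU request that can ever schedule: ${max_per_node}"
```

It reports 5 nodes and 32 GPUs, matching the total above, and a ceiling of 8 GPUs per pod: a 16-GPU single pod can never schedule on this fleet even though 32 GPUs exist in total.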
Each fake node carries:
- kwok.x-k8s.io/node: fake annotation, which marks the node as managed by KWOK
- Taint kwok.x-k8s.io/node=fake:NoSchedule: workloads must tolerate it, which doubles as a safety net so nothing lands on the fake nodes by accident
- run.ai/simulated-gpu-node-pool: <pool> label: the label the fake-gpu-operator reads to decide which pool's GPUs to advertise on the node
- gpu-pool and nvidia.com/gpu.product labels for pool targeting and display; nvidia.com/gpu.product matches what GPU Feature Discovery sets on real clusters, while gpu-pool is this lab's own pool label
Note that the node script no longer hand-writes nvidia.com/gpu into status.allocatable;
the operator does that, so the GPU count is advertised by a device-plugin component, as
in production, rather than by a static field. See fake-gpu-node-template.yaml
for the annotated node template.
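To make the list above concrete, here is a minimal, hypothetical sketch of one rendered node. render_fake_gpu_node and the node name fake-a100-0 are illustrative only; the repo's real template is fake-gpu-node-template.yaml and may carry more fields. There is deliberately no status.allocatable block, since that is the operator's job.

```bash
# Hypothetical sketch, not the repo's create-fake-gpu-nodes.sh: render one fake
# GPU node with the annotation, taint and labels listed above.
#   $1 = node name, $2 = pool (a100 | h100 | l40s), $3 = GFD-style product string
# status.allocatable is deliberately absent: the fake-gpu-operator adds
# nvidia.com/gpu based on the run.ai/simulated-gpu-node-pool label.
render_fake_gpu_node() {
  local name=$1 pool=$2 product=$3
  cat <<EOF
apiVersion: v1
kind: Node
metadata:
  name: ${name}
  annotations:
    kwok.x-k8s.io/node: fake
  labels:
    gpu-pool: ${pool}
    nvidia.com/gpu.product: ${product}
    run.ai/simulated-gpu-node-pool: ${pool}
spec:
  taints:
  - key: kwok.x-k8s.io/node
    value: fake
    effect: NoSchedule
EOF
}

# Print one A100 node; pipe it to "kubectl apply -f -" on a cluster running KWOK.
render_fake_gpu_node fake-a100-0 a100 NVIDIA-A100-SXM4-80GB
```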
Where the GPU count comes from
The nvidia.com/gpu integer is advertised by the fake-gpu-operator from its per-pool
topology, not hand-written and not discovered from a driver (there is none). The
nvidia.com/gpu.product label is a display/targeting convenience whose name matches
real GFD output, so workload manifests written here work unchanged on real clusters.
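A workload aimed at this fleet needs three things: a GPU count under resources.limits (for extended resources Kubernetes uses the limit as the request), a selector on the GFD-style product label, and a toleration for the KWOK taint. A hypothetical sketch follows; render_gpu_pod, the pod name, and the busybox image are placeholders (KWOK never actually runs the container). On a real cluster the toleration matches no taint and has no effect.

```bash
# Hypothetical sketch: render a pod that requests whole GPUs, targets a product
# by its GFD-style label, and tolerates the KWOK taint.
#   $1 = pod name, $2 = GPU count, $3 = nvidia.com/gpu.product value
render_gpu_pod() {
  local name=$1 gpus=$2 product=$3
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  nodeSelector:
    nvidia.com/gpu.product: ${product}
  tolerations:
  - key: kwok.x-k8s.io/node
    operator: Exists
    effect: NoSchedule
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
EOF
}

# An 8-GPU H100 pod fits only the single h100 node; asking for 9 would stay
# Pending because no node advertises more than 8.
render_gpu_pod train-0 8 NVIDIA-H100-80GB-HBM3
```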
✅ Checkpoint: name the single field the scheduler uses for GPU placement (an
integer under status.allocatable, now advertised by the operator) and say which
component puts it there. If you can, you understand why this simulation is both
legitimate and limited.
➡️ Back to: Lesson 1, Step 1.