Local Simulation Validation Report - Lesson 1 (Kubernetes GPU Scheduling)¶
Your lab notebook entry for Lesson 1. Captured after
make phase1-up,make phase1-demo,make phase1-evidence.
Environment¶
| Item | Value |
|---|---|
| Date | 2026-06-14 |
| Host OS | Linux (WSL2) |
| kind version | v0.32.0 |
| Kubernetes version | v1.36.1 |
| KWOK release | kwok-v0.7.0 |
| GPU layer | run.ai fake-gpu-operator (advertises nvidia.com/gpu, emits synthetic DCGM) |
| Evidence directory | evidence/k8s-20260614-153951/ |
Simulated fleet¶
5 fake (KWOK) nodes, 32 fake GPUs advertised by the fake-gpu-operator, confirmed in
gpu-allocatable.txt:
| Node | Pool | GPUs |
|---|---|---|
| kwok-gpu-a100-0 | a100 | 8 |
| kwok-gpu-a100-1 | a100 | 8 |
| kwok-gpu-h100-0 | h100 | 8 |
| kwok-gpu-l40s-0 | l40s | 4 |
| kwok-gpu-l40s-1 | l40s | 4 |
Total: 32 GPUs. The two real kind nodes (control-plane, worker) advertise no GPUs.
Scenario results¶
| # | Scenario | Expected | Observed | Evidence file |
|---|---|---|---|---|
| 1 | cuda-batch-small (1 GPU, a100 pool) |
Running on a100 node | Running on kwok-gpu-a100-1 |
pods-gpu-demo.txt |
| 2 | cuda-train-16gpu (16 GPU) |
Pending: Insufficient nvidia.com/gpu | Pending - 0/7 nodes available: ... 6 Insufficient nvidia.com/gpu (1 node untolerated taint) |
describe-pod-cuda-train-16gpu.txt |
| 3 | cuda-needs-b200 (selector mismatch) |
Pending: no matching node | Pending - 0/7 nodes available: ... 6 node(s) didn't match Pod's node affinity/selector |
describe-pod-cuda-needs-b200.txt |
| 4 | queue-pressure (40 replicas vs 32 GPUs) |
~32 Running, ~8 Pending | 31 Running, 9 Pending (the 32nd GPU is held by cuda-batch-small) |
pods-gpu-demo.txt, pending-pods.txt |
Namespace totals: 32 Running, 11 Pending in gpu-demo (9 queue-pressure +
cuda-train-16gpu + cuda-needs-b200). The fleet's 32 GPUs are the binding
constraint, exactly as designed.
The two Pending root causes are distinct and diagnosable from Events alone:
Insufficient nvidia.com/gpu (asked for more than any node has) vs
didn't match Pod's node affinity/selector (asked for a pool that does not exist).
What this run proves¶
Control-plane GPU scheduling with the default scheduler: placement across heterogeneous pools, capacity-mismatch and selector-mismatch Pending diagnosis, contention behaviour under queue pressure, operator-shaped GPU advertisement (a device plugin, not a hand-written integer), and a synthetic DCGM metrics stream (the Lesson 3 bridge).
What this run does NOT prove¶
No CUDA execution, no driver/runtime path, no NCCL/NVLink/MIG/GPUDirect RDMA, no real
GPU memory behaviour. The DCGM metrics are fabricated by the operator (dashboard/alert
design, not real telemetry), and pods on KWOK nodes are simulated (no real container).
See fake-vs-real-limitations.md.
Notes / surprises¶
- Right after
phase1-up, theVERSIONcolumn for the KWOK nodes fills in one node at a time (some blank, thenkwok-v0.7.0); the kwok-controller stampskubeletVersionasynchronously. It converges within seconds and does not affect scheduling. cuda-batch-smalllanded onkwok-gpu-a100-1(either a100 node is valid; placement is up to the scheduler's scoring).