Fake vs Real: The Limitations Ledger¶
Part of โ Your Lab Notebook ยท Course home: AI Factory Operations Lab. Every lesson's "๐ฌ What you proved / did NOT prove" box traces back to this table - it is the course's source of truth for accurate claims.
The credibility of this project rests on this file. It states exactly which claims each lab mode can support.
Claims the FAKE (simulation) mode supports¶
| Claim | Why it holds |
|---|---|
| Kubernetes GPU scheduling and placement behaviour | The default scheduler compares integers; fake nvidia.com/gpu exercises the identical code path |
| Pending-pod triage workflow | Events and describe output are produced by the real control plane |
| Heterogeneous fleet modelling (pools, labels, taints) | Pure API-level design |
| Queue pressure and contention dynamics | Real scheduler, real contention, fake capacity |
| Queue/quota policy behaviour (with KAI Scheduler) | Scheduling policy is control-plane logic |
| Fractional-GPU scheduling arithmetic (HAMi/KAI concepts, Lesson 1C Parts 1โ2) | Deciding whether a memory/core slice fits a device is bookkeeping over integers |
| Observability pipeline design on synthetic DCGM metrics (fake-gpu-operator) | The metric names/labels (DCGM_FI_*) and per-pod attribution are real; the values are fabricated, so dashboards/alerts prove design not telemetry |
| Slurm GRES scheduling logic (Phase 3, fake GRES) | slurmctld scheduling does not require the device to exist |
Claims ONLY the REAL mode supports¶
| Claim | Requires |
|---|---|
| CUDA containers execute on GPU | Driver + Container Toolkit + real GPU |
| Device plugin advertises real GPUs | GPU Operator on real hardware |
| Real GPU telemetry | DCGM Exporter on real hardware |
| Multi-pod sharing of one GPU with enforced memory slices (HAMi, Lesson 1C Part 3) | HAMi device plugin + HAMi-core intercepting real CUDA calls |
| Real GRES enforcement in Slurm | Slurm on a GPU machine |
| Inference SLO numbers (TTFT, tokens/sec, p95/p99) | Real GPU serving real model |
Claims NEITHER mode in this lab supports¶
State these unprompted when presenting the project:
- CUDA kernel / compute performance characteristics
- NCCL collective performance, multi-node training behaviour
- NVLink / NVSwitch topology effects
- GPUDirect RDMA or any GPU networking data path
- MIG partitioning and isolation behaviour
- GPU-sharing performance interference under sustained load (HAMi compute-throttling accuracy, noisy-neighbour effects)
- Real GPU memory pressure, fragmentation, OOM dynamics at scale
- Production-scale fleet operations (hundreds+ GPUs, multi-tenant SLAs)
Vocabulary discipline¶
Use: "control-plane simulation", "runtime validation", "architecture notes", "operational drill". Avoid: any phrasing that lets a reader infer production GPU fleet experience from this repo.