Skip to content

HAMi GPU Isolation Validation - Lesson 1C Part B / Lesson 6 Part B

STATUS: ✅ VALIDATED - captured on a real NVIDIA RTX A6000 (Hyperstack), 2026-06-23. Evidence: real command output captured during the run (artifact files cited in the table below). This is the runtime-enforcement half the HAMi scheduling sim deliberately cannot prove. Lab: hami-isolation-realgpu/.

Environment

Item Value
Date 2026-06-23
Machine Hyperstack GPU VM (optimistic-fermi), Ubuntu 22.04
GPU NVIDIA RTX A6000, 48 GB (49140 MiB), UUID GPU-d3d0a942-4b02-e4bc-b07e-485c8d2c8552; Ampere, no MIG - software sharing is the only option (the HAMi premise)
Driver 535.183.06 (CUDA 12.4)
Kubernetes k3s, containerd config v3, default-runtime: nvidia (k3s --default-runtime flag, not a containerd template)
GPU stack HAMi 2.9.0, scheduler image tag matched to the k8s server version. No GPU Operator (HAMi ships its own device plugin; the two must not coexist)
HAMi pods hami-device-plugin 2/2 Running, hami-scheduler 2/2 Running (hami-pods.txt)
Registration node allocatable nvidia.com/gpu = 10 (1 card × deviceSplitCount); hami.io/node-nvidia-registerdevmem:49140, devcore:100, type:"NVIDIA RTX A6000", mode:"hami-core" (node-allocatable.txt)

HAMi advertises only nvidia.com/gpu in node allocatable; gpumem/gpucores are accounted by the HAMi scheduler from the register annotation and enforced per-pod by HAMi-core - they are intentionally not node-allocatable resources.

Validation checklist (5 exercises)

# Exercise Pass criteria Result Evidence
1 Co-residency two pods on one physical GPU hami-share-a + hami-share-b both Running on optimistic-fermi 1-co-residency.txt
2 Virtualized device view in-pod nvidia-smi shows the slice ✅ both pods report 0MiB / 8000MiB, not the real 49140 MiB 2-3-probe-memory-a.txt, -b.txt
3 Memory-cap enforcement allocation refused at the slice, by HAMi-core cudaMalloc refused after 7680 MiB; [HAMI-core ERROR] ... Device 0 OOM 8594128896 / 8388608000 (8388608000 B = exactly 8000 MiB), while the card had ~40 GB free 2-3-probe-memory-a.txt, -b.txt
4 Per-device budget (scheduler) a pod that fits an empty card stays Pending beside the slices hami-oversubscribe (45000 MiB) Pending - FilteringFailed ... CardInsufficientMemory 4-oversubscribe-status.txt, 4-oversubscribe-events.txt
5 The mechanism the HAMi-core injection that enforces the cap ✅ env CUDA_DEVICE_MEMORY_LIMIT_0=8000m, CUDA_DEVICE_SM_LIMIT=0; library /usr/local/vgpu/libvgpu.so; NVIDIA_VISIBLE_DEVICES=GPU-d3d0a942-… 5-probe-mechanism-a.txt

What this proves (that the simulation cannot)

  • Two tenants share one physical GPU (Exercise 1) - stock Kubernetes treats a GPU as indivisible and cannot do this.
  • The slice is real, two ways: the container is shown an 8 GB card (Exercise 2), and a CUDA allocation is refused at 8 GB while ~40 GB physically remained (Exercise 3). The refusal comes from HAMi-core, not the hardware - the contradiction that only a user-space CUDA-interception cap can produce.
  • The card is one shared, accounted budget (Exercise 4): the HAMi scheduler refuses a 45 GB pod beside two 8 GB slices (CardInsufficientMemory) even though 45 GB < the 48 GB card - the real-hardware counterpart of the simulation's per-device exhaustion test.
  • The mechanism is concrete (Exercise 5): HAMi-core is injected as libvgpu.so and reads CUDA_DEVICE_MEMORY_LIMIT_0=8000m - the same CUDA_DEVICE_MEMORY_LIMIT mechanism that NVIDIA's KAI Scheduler adopted in June 2026 for its own fractional-GPU memory isolation.

Scope limits

This is software isolation (user-space CUDA interception), not MIG hardware fault isolation - treat it as a scheduling-and-accounting guarantee with runtime enforcement, not a security boundary. It proves the memory cap and the virtualized device view; it does not measure compute-throttling accuracy or noisy-neighbour interference under sustained load. Single node, so nothing about NCCL/NVLink/MIG/multi-node scale. Full ledger: fake-vs-real-limitations.md.