Attack scenarios

This page walks through concrete attack scenarios to show what vNode's isolation guarantees mean in practice. For each scenario, we compare what happens on a standard container runtime with what happens inside a vNode.

Attack vector reference

| Attack vector | Standard runtime | vNode |
| --- | --- | --- |
| Container escape to host | Attacker gains root on the host node | Attacker lands as unprivileged UID (65536+) with no host access |
| Read other tenants' processes | Visible through /proc | PID namespace isolates; only own processes visible |
| Read host hardware identifiers | Exposed through /sys | FUSE returns hidden or emulated data |
| Modify kernel parameters | Often allowed (for example, /proc/sys/) | FUSE intercepts and scopes to the container |
| Run Docker inside a container | Requires privileged: true | Works safely through user namespaces; no privilege escalation |
| Raw network sniffing | Requires capabilities or AF_PACKET | Blocked by seccomp; AF_PACKET and promiscuous mode denied |

The escape landing zone

The most important guarantee vNode provides is what an attacker finds after a container escape.

In a standard container runtime, a process that escapes its container often runs as root on the host node. From there, it can read other tenants' files, access the container runtime socket, and potentially move laterally across the cluster.

Inside a vNode, the same escape lands differently. The workload's root process (UID 0 inside the container) maps to a high-numbered unprivileged UID on the host. After the escape:

  • No root access. The host UID is 65536 or higher, an unprivileged user with no elevated permissions.
  • No access to other tenants' files. Each vNode uses a distinct UID range. The kernel's file permission checks use the host UID, which belongs to exactly one vNode.
  • No visibility into other tenants' processes. PID namespaces ensure that only the vNode's own processes are visible in /proc.
  • No raw network access. Seccomp unconditionally blocks AF_PACKET sockets and the SIOCSIFFLAGS ioctl (used to enable promiscuous mode).
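The UID mapping behind the first two points can be sketched in a few lines. The example below parses the three-column format that Linux exposes in /proc/<pid>/uid_map and translates a container UID to its host UID. The two tenant maps are hypothetical: this page only states that host UIDs start at 65536 and that each vNode gets a distinct range, so the range size and the second tenant's offset are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional


class UidMapEntry(NamedTuple):
    inside_start: int   # first UID as seen inside the user namespace
    outside_start: int  # first UID as seen from the parent (host) namespace
    length: int         # number of consecutive UIDs covered by this entry


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse the three-column format used by /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(UidMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_uid(entries: List[UidMapEntry], container_uid: int) -> Optional[int]:
    """Translate a container UID to a host UID, or None if it is unmapped."""
    for entry in entries:
        offset = container_uid - entry.inside_start
        if 0 <= offset < entry.length:
            return entry.outside_start + offset
    return None


# Hypothetical allocations: two tenants, each with its own 65536-wide range.
TENANT_A_MAP = "         0      65536      65536\n"
TENANT_B_MAP = "         0     131072      65536\n"

tenant_a = parse_uid_map(TENANT_A_MAP)
tenant_b = parse_uid_map(TENANT_B_MAP)
```

Under these assumed maps, `to_host_uid(tenant_a, 0)` and `to_host_uid(tenant_b, 0)` give different unprivileged host UIDs, so "root" in one tenant never shares a host identity with "root" in another.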

The isolation holds at the kernel level, not at the policy level. It doesn't rely on the workload being well-behaved or on Kubernetes admission controls being correctly configured.
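One way to observe kernel-level enforcement from inside a workload is to attempt the operation and look at the result. The probe below is a minimal sketch that tries to open an AF_PACKET socket and classifies the outcome. It can't tell a seccomp denial apart from a missing CAP_NET_RAW capability, and how vNode reports the denial (an error code or a terminated process) is not stated on this page, so treat the return values as assumptions.

```python
import errno
import socket


def probe_packet_socket() -> str:
    """Try to create an AF_PACKET socket and report what the kernel did."""
    af_packet = getattr(socket, "AF_PACKET", None)
    if af_packet is None:
        return "unsupported"  # AF_PACKET is Linux-specific
    try:
        sock = socket.socket(af_packet, socket.SOCK_RAW)
    except PermissionError:
        # EPERM or EACCES: denied by a seccomp filter or a missing capability
        return "denied"
    except OSError as exc:
        # Any other failure, reported by its symbolic errno name when known
        return "error: %s" % errno.errorcode.get(exc.errno, exc.errno)
    sock.close()
    return "allowed"


result = probe_packet_socket()
```

A result of "allowed" means the process could sniff raw traffic on the interfaces it can see. Any other result means the kernel refused the request before a socket existed.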

Case study: NVIDIAScape (CVE-2025-23266)

NVIDIAScape is a container breakout vulnerability disclosed in July 2025 with a CVSS score of 9.0. It affects the NVIDIA Container Toolkit and represents a class of attack that is difficult to defend against without vNode-style isolation.

How the attack works

The NVIDIA Container Toolkit processes container configuration and sets up GPU access before a container starts. When it does this, it runs as a privileged host process.

An attacker can craft a container image that sets LD_PRELOAD to a shared library bundled in the image. When the NVIDIA Container Toolkit forks a process to handle the container, that process inherits the LD_PRELOAD setting from the image, and the dynamic loader loads the attacker's library into it. The library then executes in the context of a privileged host process, achieving a breakout to the host.
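The inheritance step is ordinary process behavior: a child process receives whatever environment its launcher passes on. The sketch below shows the mechanism with a harmless child process, and shows how a launcher could strip loader variables before spawning. It is not NVIDIA's fix and not how vNode works. The helper names, the variable list, and the library path are hypothetical, and the list is illustrative rather than exhaustive.

```python
import os
import subprocess
import sys

# Illustrative subset of variables that the dynamic loader acts on.
LOADER_VARS = ("LD_PRELOAD", "LD_AUDIT")


def scrub_loader_env(env):
    """Return a copy of env without the loader-controlling variables."""
    return {k: v for k, v in env.items() if k not in LOADER_VARS}


def child_sees_preload(env) -> bool:
    """Spawn a child with the given environment and ask if LD_PRELOAD is set."""
    code = "import os; print('LD_PRELOAD' in os.environ)"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "True"


# Environment as an untrusted image might define it (the path is hypothetical).
untrusted = dict(os.environ, LD_PRELOAD="/hypothetical/evil.so")
safe = scrub_loader_env(untrusted)
```

Calling `child_sees_preload(safe)` spawns a child that no longer has the variable. Passing `untrusted` instead would hand the variable to the child unchanged, which is the inheritance the attack relies on.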

Without vNode

The malicious library runs inside a privileged host process. The attacker has root on the host node and can read other containers' filesystems, access the kubelet socket, and pivot to other nodes.

With vNode

The container image itself runs inside a vNode sandbox. The vNode's root process maps to an unprivileged host UID (65536+). When the NVIDIA Container Toolkit forks, it inherits a process context that is already unprivileged on the host. The LD_PRELOAD injection still executes, but it runs as an unprivileged host user. The attacker gains no access to the host beyond what that unprivileged UID allows, which is nothing useful.
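To make "what that unprivileged UID allows" concrete, the sketch below models the classic owner/group/other read check that the kernel applies with the host UID. It is a deliberately simplified model: it ignores ACLs, fine-grained capabilities, and security modules, and it treats host UID 0 as bypassing the check. The UIDs, GIDs, and file modes are hypothetical.

```python
import stat


def may_read(proc_uid, proc_gids, file_uid, file_gid, mode) -> bool:
    """Simplified model of the Unix read-permission check."""
    if proc_uid == 0:
        return True  # host root bypasses mode bits (simplified CAP_DAC_OVERRIDE)
    if proc_uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner bits apply
    if file_gid in proc_gids:
        return bool(mode & stat.S_IRGRP)  # group bits apply
    return bool(mode & stat.S_IROTH)      # otherwise, "other" bits apply


# Hypothetical identities: host root, and an escaped process from one vNode.
HOST_ROOT = (0, {0})
ESCAPED = (65536, {65536})

root_reads_secret = may_read(*HOST_ROOT, file_uid=0, file_gid=0, mode=0o600)
escaped_reads_secret = may_read(*ESCAPED, file_uid=0, file_gid=0, mode=0o600)
escaped_reads_tenant_b = may_read(*ESCAPED, file_uid=131072, file_gid=131072, mode=0o640)
```

In this model, the same read that succeeds for host root fails for the escaped process, both for a root-owned file and for a file owned by another tenant's UID range.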

vNode doesn't patch the vulnerability in the NVIDIA Container Toolkit. It contains the blast radius so that exploiting the vulnerability doesn't yield host access.

Note: This protection applies to the entire class of attacks that rely on inheriting a privileged process context from the container runtime or GPU toolkit. NVIDIAScape is one example. vNode's isolation model limits the impact of similar vulnerabilities that haven't been discovered yet.