【Paper】Accelerating Nested Virtualization with HyperTurtle

VM-based containers such as Kata Containers are often deployed as nested VMs, but nested VMs incur high hypercall overhead because of the additional world switches required. The paper proposes HyperTurtle, an approach that reduces the number of world switches.

Authors: Ori Ben Zur, Jakob Krebs, Shai Aviram Bergman, Mark Silberstein
Paper link: https://www.usenix.org/conference/atc25/presentation/zur
Publication: 2025 USENIX Annual Technical Conference

HyperTurtle

Why Is Nested Virtualization Slow?

To understand the root cause of the overhead, we first have to see how communication works under nested virtualization.

Nested Virtualization

Following the figure from the paper, let L0 be the bare-metal machine, L1 be the VM, and L2 be the nested VM running inside L1. Suppose we have only L0 and L1 for now, and L1 wants to create L2:

L1: Hey, L0! I want to create a nested virtual machine, but I need your help to forward its hypercalls to me. Can you help me?

L0: Sure!

This is the vanilla case (a), where L0 acts as the bridge between L1 and L2.

L1: Hey, L0! The communication between L2 and me is much slower than that between us. What happened?

L0: It's no surprise. Your communication involves 4 world switches, and ours involves only 2.

When the CPU switches from one virtualization layer to another, a world switch happens. You can think of it as a slower context switch, so it is no wonder that the communication between L1 and L2 is so slow.
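The switch counts from the dialogue can be reproduced with a toy model. This is a minimal sketch and not code from the paper: the function name `count_world_switches` and the two path lists are assumptions made for illustration, and every hop between two different layers is counted as one world switch.

```python
def count_world_switches(path):
    """Count the hops between different virtualization layers along a path."""
    return sum(1 for src, dst in zip(path, path[1:]) if src != dst)


# L1 issues a hypercall: it exits to L0, and L0 resumes L1.
L1_HYPERCALL = ["L1", "L0", "L1"]

# L2 issues a hypercall in the vanilla nested case: L2 exits to L0,
# L0 forwards the exit to L1, L1 returns to L0, and L0 resumes L2.
L2_HYPERCALL_VANILLA = ["L2", "L0", "L1", "L0", "L2"]

print(count_world_switches(L1_HYPERCALL))          # 2
print(count_world_switches(L2_HYPERCALL_VANILLA))  # 4
```

The nested path is longer only because L0 has to bounce every exit through L1.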

How to Make It Faster

L1: In that case, can you handle L2's hypercalls for me? That way there won't be any additional world switches.

L0: Sounds good, but I don't know how to handle them.

L1: Do you know the cool thing called eBPF? You run my eBPF program for me and share the eBPF maps with me. All the logic is in the eBPF program, and you only have to run it!

L0: Deal!

This is the core concept of HyperTurtle. It offloads a subset of L1's hypervisor logic to L0 as eBPF programs, reducing the number of world switches. The authors profiled the latency between L1 and L2 (with HyperTurtle) and found it very close to that between L0 and L1, which shows its effectiveness in reducing latency.
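The offloading idea can be sketched as a toy dispatcher. This is only an illustration in plain Python under my own assumptions, not the paper's implementation: the real system runs verified eBPF programs inside L0, while here `L0Hypervisor`, `forward_to_l1`, `count_calls`, and the hypercall numbers are all hypothetical names, and the switch counts (2 for the offloaded path, 4 for the fallback) simply follow the dialogue above.

```python
class L0Hypervisor:
    """Toy L0 that can run handlers supplied by L1 on L1's behalf."""

    def __init__(self):
        self.handlers = {}    # hypercall number -> handler supplied by L1
        self.shared_map = {}  # stands in for an eBPF map shared with L1

    def register(self, nr, handler):
        """L1 hands L0 a handler for hypercall number `nr`."""
        self.handlers[nr] = handler

    def handle_l2_exit(self, nr, arg):
        """Return (result, world_switches) for a hypercall issued by L2."""
        handler = self.handlers.get(nr)
        if handler is not None:
            # Offloaded path: L2 -> L0 -> L2
            return handler(self.shared_map, arg), 2
        # Fallback path: L2 -> L0 -> L1 -> L0 -> L2
        return forward_to_l1(nr, arg), 4


def forward_to_l1(nr, arg):
    """Placeholder for L1's own slow-path handling (hypothetical)."""
    return None


def count_calls(shared_map, arg):
    """Example handler: records how often it ran, visible to L1 via the map."""
    shared_map["calls"] = shared_map.get("calls", 0) + 1
    return arg


l0 = L0Hypervisor()
l0.register(1, count_calls)
print(l0.handle_l2_exit(1, 42))  # (42, 2)
print(l0.handle_l2_exit(7, 42))  # (None, 4)
print(l0.shared_map)             # {'calls': 1}
```

Only hypercalls with a registered handler take the short path; everything else still goes through L1, which matches the "subset" wording above.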

Ideas

Why Can't L2 Exit Directly to L1?

Because the CPU is not designed for nested virtualization, and both L1 and L2 run at the same privilege level. So L2 must exit to L0, and L0 then forwards the exit to L1.

Why eBPF Instead of C Programs?

This is because eBPF is a restricted subset of C (no unbounded loops, etc.), which makes it much easier to verify that a program is safe.
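To see why a restricted language is easier to check, consider a deliberately simplified verifier. This sketch is my own assumption and far cruder than the real Linux eBPF verifier (which also tracks register state and accepts bounded loops): the instruction format and the name `is_loop_free` are hypothetical. It accepts a program only if every jump goes forward and stays in bounds, so the program counter strictly increases and the program must terminate.

```python
def is_loop_free(program):
    """Accept only programs whose jumps all go forward and stay in bounds.

    `program` is a list of (opcode, argument) pairs. For the jump opcodes
    "jmp" and "jeq", the argument is an offset relative to the next
    instruction.
    """
    for pc, (op, arg) in enumerate(program):
        if op in ("jmp", "jeq"):
            target = pc + 1 + arg
            if arg < 0 or target >= len(program):
                return False
    return True


straight = [("mov", 1), ("jeq", 1), ("mov", 2), ("exit", 0)]
looping = [("mov", 1), ("jmp", -2), ("exit", 0)]

print(is_loop_free(straight))  # True
print(is_loop_free(looping))   # False
```

A check this simple is impossible for arbitrary C, where deciding termination is undecidable in general.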

Why Are Kata Containers Often Deployed as Nested VMs?

This question is crucial: if Kata Containers were not deployed as nested VMs, the overhead of nested VMs would not be a problem.

If Kata Containers ran directly on the host machine, the customer would get only a container, not a full OS. So cloud providers generally give each customer a VM, and the customer runs Kata Containers on top of it. This is why Kata Containers are often deployed as nested VMs, especially in the cloud.

Leaky Abstraction

HyperTurtle allows L0 to access the data of L2, which, in my opinion, is a leaky abstraction. I'm not saying this is bad, but I think a virtual machine should not know anything about its hypervisor, and vice versa. I'm also worried that the leaky abstraction may bring extra security risks, which are exactly what VMs try to avoid.