eBPF: fast bytecode for the kernel

Why?

Nowadays, Linux is deployed in many components of the network fabric. Your home NAT box probably runs it, and so does your datacenter’s router. As it needs to handle incoming packets on gigabit links, the speed at which the system handles each packet becomes important. If you run your router in user-space, the kernel needs to read every packet from the network card, switch to user-space, let your program process it, then go back to kernel-space to forward it. This double transition is very costly, even more so now that mitigations for speculative execution flush many caches. How to avoid that?

How?

This is the role of the Berkeley Packet Filter (BPF). It lets you attach a filter to any socket, and that filter runs in kernel-space. This reduces the number of packets that cross the kernel-user boundary. As running arbitrary machine code in kernel-space is unsafe, the filter is written in BPF bytecode, which the kernel can check before running it. This bytecode is quite simple: 32-bit, a few registers, ALU operations, forward jumps, some way to read packets, … This greatly increases the speed of network operations. But machines have evolved: 64-bit, many registers. There is room for improvement. Also, running user-provided code in the kernel is a powerful idea.
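To make the "simple bytecode" idea concrete, here is a minimal sketch of a filter interpreter in C. It is a toy, not the real thing: the opcodes, the struct layout and every toy_* name are invented for this article and do not match the actual BPF encoding. It only illustrates the ingredients listed above: one 32-bit accumulator, an ALU operation, packet reads, and jumps that can only go forward.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy filter VM. NOT the real BPF encoding. */
enum toy_op {
    TOY_LD_BYTE, /* A = pkt[k]                                   */
    TOY_AND,     /* A = A & k                                    */
    TOY_JEQ,     /* skip jt instructions if A == k, else skip jf */
    TOY_RET      /* stop and return k (0 = drop, non-zero = accept) */
};

struct toy_insn {
    enum toy_op op;
    uint8_t jt;  /* forward offset taken when the test is true  */
    uint8_t jf;  /* forward offset taken when the test is false */
    uint32_t k;  /* immediate operand */
};

/* Run a filter over one packet.
 * Jump offsets are unsigned, so the program counter only moves forward:
 * the loop executes at most `len` instructions and always terminates.
 * An out-of-bounds read, or running off the end, drops the packet. */
uint32_t toy_run(const struct toy_insn *prog, size_t len,
                 const uint8_t *pkt, size_t pkt_len)
{
    uint32_t a = 0; /* the single 32-bit accumulator */

    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];
        switch (in->op) {
        case TOY_LD_BYTE:
            if (in->k >= pkt_len)
                return 0;
            a = pkt[in->k];
            break;
        case TOY_AND:
            a &= in->k;
            break;
        case TOY_JEQ:
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case TOY_RET:
            return in->k;
        }
    }
    return 0;
}

/* Example filter: accept only IPv4 packets whose protocol field is UDP (17).
 * The packet is assumed to start at the IP header. */
uint32_t toy_accepts_ipv4_udp(const uint8_t *pkt, size_t pkt_len)
{
    static const struct toy_insn prog[] = {
        { TOY_LD_BYTE, 0, 0, 0    }, /* A = version/IHL byte        */
        { TOY_AND,     0, 0, 0xF0 }, /* keep the version nibble     */
        { TOY_JEQ,     0, 3, 0x40 }, /* not IPv4? jump to the drop  */
        { TOY_LD_BYTE, 0, 0, 9    }, /* A = IPv4 protocol field     */
        { TOY_JEQ,     0, 1, 17   }, /* not UDP? jump to the drop   */
        { TOY_RET,     0, 0, 1    }, /* accept                      */
        { TOY_RET,     0, 0, 0    }, /* drop                        */
    };
    return toy_run(prog, sizeof(prog) / sizeof(prog[0]), pkt, pkt_len);
}
```

Calling toy_accepts_ipv4_udp() on a buffer that starts with an IPv4 header returns 1 for UDP and 0 for anything else, including packets too short to contain the protocol byte. Because nothing can jump backwards, a filter cannot loop forever, which is part of what makes it reasonable to run such code inside the kernel.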

An improved version of BPF is called extended BPF (eBPF). Its bytecode is closer to widely-used instruction sets, so JIT compilation is faster and produces more efficient native code. Now you can write some C and compile it to eBPF with GCC or LLVM. You can also attach these programs to schedulers, disk I/O, syscalls, … It adds a great deal of flexibility to Linux.

Then?

As more and more tools become available in the eBPF ecosystem, more and more projects use it. To list a few: tcpdump, Katran, DPDK, …
If you want to learn more, you can find some good pointers on ebpf.io.

As a closing note, I’d like to point out that a microkernel design would also have solved these issues. But that’s for another article 🙂