Network lab with QEMU
Vincent Bernat
To experiment with network stuff, I was using UML-based network labs. Many alternatives exist, like GNS3, Netkit, Marionnet or Cloonix. All of them are viable solutions, but I still prefer to stick to my minimal home-made solution with UML virtual machines. Here is why:
- I don’t want to use disk images. They take a lot of space and they have to be maintained. They also become cluttered, especially if you try to reuse them across several labs. They are also difficult to share.
- I want to be able to access my home directory. It contains the important configuration files related to the lab, and I can put them in the right place with symbolic links when the lab starts. It also makes exchanging files with the lab quite easy.
- I don’t want to boot a complete system. Skipping it keeps memory usage low, and each virtual system should boot in a few seconds.
The use of UML had some drawbacks:
- It may be buggy. For example, it is currently not possible to use gdbserver inside UML without a patch. Sometimes, the kernel won’t even compile.
- It is slow.
However, UML features HostFS, a filesystem providing access to any part of the host filesystem. This is the killer feature: it lets me avoid virtual disk images entirely and access my home directory right from the guest.
I recently discovered that QEMU provides 9P, a similar filesystem built on top of VirtIO, the paravirtualized I/O framework.
Setting up the lab
The lab is set up with a single self-contained shell script. The layout is similar to what I have done with UML. I will only highlight the most interesting steps here.
Booting QEMU with a minimal kernel
My initial goal was to experiment with Nicolas Dichtel’s IPv6 ECMP patch. Therefore, I needed to configure a custom kernel. I started from make defconfig, removed everything that was not necessary, added what I needed for my lab (mostly network stuff) and added the appropriate options for VirtIO drivers:
CONFIG_NET_9P_VIRTIO=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_VIRTIO=y
CONFIG_VIRTIO_RING=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MMIO=y
No modules. Grab the complete configuration if you want to have a look.
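Since everything has to be built in, it is easy to lose an option while trimming defconfig. Below is a minimal sketch of a check against the generated .config; check_kconfig is a hypothetical helper, not part of the lab script. Note that mounting a 9P share in the guest also needs the 9P filesystem itself (CONFIG_NET_9P and CONFIG_9P_FS), which the list above does not show and which presumably lives in the complete configuration.

```shell
# Hypothetical helper, not part of the original lab script: print every
# requested option that is not built in (=y) in the given kernel .config
# and return non-zero if any is missing. Since the lab kernel has no
# modules, an option set to =m counts as missing too.
check_kconfig() {
    local config="$1" opt missing=0
    shift
    for opt in "$@"; do
        # A built-in option appears as the exact line "CONFIG_FOO=y".
        if ! grep -qxF "${opt}=y" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, running check_kconfig .config CONFIG_VIRTIO_PCI CONFIG_NET_9P_VIRTIO CONFIG_9P_FS from the top of the kernel tree prints nothing and succeeds only if all three are built in.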
From here, you can start your kernel with the following command ($LINUX is the appropriate bzImage):
qemu-system-x86_64 \
  -m 256m \
  -display none \
  -nodefconfig -no-user-config -nodefaults \
  \
  -chardev stdio,id=charserial0,signal=off \
  -device isa-serial,chardev=charserial0,id=serial0 \
  \
  -chardev socket,id=con0,path=$TMP/vm-$name-console.pipe,server,nowait \
  -mon chardev=con0,mode=readline,default \
  \
  -kernel $LINUX \
  -append "init=/bin/sh console=ttyS0"
Since there is no disk to boot from, the kernel will panic when trying to mount the root filesystem. QEMU is configured to not display video output (-display none). A serial port is defined and uses stdio as a backend.[1] The kernel is configured to use this serial port as a console (console=ttyS0). A VirtIO console could have been used instead, but it does not seem possible to make it work early in the boot process.
The QEMU monitor is set up to listen on a Unix socket. You can connect to it with socat UNIX:$TMP/vm-$name-console.pipe -.
Initial ramdisk
Update (2012-10)
I was initially unable to have the kernel directly mount the host filesystem as the root filesystem of the guest. In a comment, Josh Triplett told me to use /dev/root as the mount tag to solve this problem. I keep using an initrd in this post, but the lab on GitHub has been updated to not use one.
Here is how to build a small initial ramdisk:
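# Setup initrd
setup_initrd() {
    info "Build initrd"
    DESTDIR=$TMP/initrd
    mkdir -p "$DESTDIR"
    # Setup busybox: copy the binary (and its libraries), then create one
    # symlink per applet so that /bin/sh, /bin/mount, etc. exist
    copy_exec $($WHICH busybox) /bin/busybox
    for applet in $("${DESTDIR}/bin/busybox" --list); do
        ln -s busybox "${DESTDIR}/bin/${applet}"
    done
    # Setup init: this very script becomes /init in the ramdisk
    cp "$PROGNAME" "${DESTDIR}/init"
    # Build a root-owned newc cpio archive; the subshell keeps the cd
    # from changing the current directory of the caller
    (cd "${DESTDIR}" && find . | \
        cpio --quiet -R 0:0 -o -H newc | \
        gzip > "$TMP/initrd.gz")
}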
The copy_exec function is stolen from the initramfs-tools package in Debian. It ensures that the appropriate libraries are copied as well. Another solution would have been to use a static busybox.
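To show what "copying the appropriate libraries" amounts to, here is a much simplified sketch based on ldd. copy_exec_sketch is a hypothetical name; the real copy_exec lives in initramfs-tools' hook-functions and handles more cases. Like the original, it copies into $DESTDIR and takes the source binary and its destination path.

```shell
# Hypothetical stand-in for Debian's copy_exec: copy a binary to
# $DESTDIR/<dest> and every shared library that ldd reports for it to the
# same absolute path under $DESTDIR.
copy_exec_sketch() {
    local src="$1" dest="${2:-$1}" lib
    mkdir -p "$DESTDIR$(dirname "$dest")"
    cp "$src" "$DESTDIR$dest"
    # ldd prints lines such as
    #   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
    #   /lib64/ld-linux-x86-64.so.2 (0x...)
    # so keep only the fields that are absolute paths. For a static binary
    # ldd fails, nothing is printed and only the binary itself is copied.
    for lib in $(ldd "$src" 2>/dev/null |
                 awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }'); do
        mkdir -p "$DESTDIR$(dirname "$lib")"
        cp "$lib" "$DESTDIR$lib"
    done
}
```

The dynamic loader is picked up too, since ldd lists it with its absolute path; without it, nothing in the initramfs could be executed.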
The setup script is copied as /init in the initial ramdisk and detects when it has been invoked this way. If it is omitted, a shell is spawned instead: remove the cp call if you want to experiment manually.
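One way a single script can tell where it is running is sketched below. This is an assumption about the mechanism, not the lab's actual code: lab_stage is a hypothetical name, and it only relies on two facts visible in this post, namely that the kernel starts /init as PID 1 and that the initramfs stage exports STATE=1 before chrooting.

```shell
# Hypothetical stage detection for a script that runs in three places:
# on the host, as /init inside the initramfs, and again inside the chroot.
# The checks used by the actual lab script may differ.
lab_stage() {
    if [ "${STATE:-0}" = 1 ]; then
        # The initramfs stage exports STATE=1 before "exec chroot", and exec
        # keeps PID 1, so this test has to come first.
        echo chroot
    elif [ "$$" -eq 1 ]; then
        # The kernel starts /init as PID 1.
        echo initrd
    else
        echo host
    fi
}
```

A case statement on the output of lab_stage can then dispatch to the appropriate part of the script.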
The -initrd flag tells QEMU to load this initial ramdisk.
Root filesystem
Let’s mount our root filesystem using 9P. This is quite easy. First QEMU needs to be configured to export the host filesystem to the guest:
qemu-system-x86_64 \
  ${PREVIOUS_ARGS} \
  -fsdev local,security_model=passthrough,id=fsdev-root,path=${ROOT},readonly \
  -device virtio-9p-pci,id=fs-root,fsdev=fsdev-root,mount_tag=rootshare
${ROOT} can be either / or any directory containing a complete filesystem. Mounting it from the guest is quite easy:
mkdir -p /target/ro
mount -t 9p rootshare /target/ro -o trans=virtio,version=9p2000.u
You should find a complete root filesystem inside /target/ro. I have used version=9p2000.u instead of version=9p2000.L because the latter does not allow a program to mount() a host mount point.[2]
Now, you have a read-only root filesystem (because you don’t want to mess with your existing root filesystem and, moreover, you did not run this lab as root, did you?). To get a writable root, let’s stack a union filesystem on top of it. Debian comes with AUFS while Ubuntu and OpenWRT have migrated to overlayfs. I was previously using AUFS but got errors in some specific cases. It is still not clear which one will end up in the kernel. So, let’s try overlayfs.
I didn’t find any patchset ready to be applied on top of my kernel tree: I was working with David Miller’s net-next tree. Here is how I applied the overlayfs patch on top of it:
$ git remote add torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
$ git fetch torvalds
$ git remote add overlayfs git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git
$ git fetch overlayfs
$ git merge-base overlayfs.v15 v3.6
4cbe5a555fa58a79b6ecbb6c531b8bab0650778d
$ git checkout -b net-next+overlayfs
$ git cherry-pick 4cbe5a555fa58a79b6ecbb6c531b8bab0650778d..overlayfs.v15
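The merge-base and cherry-pick steps can be folded into one helper, which is convenient when a newer tag of the patch series shows up. A minimal sketch; cherry_pick_series is a hypothetical name, and it assumes the series is linear (cherry-picking merge commits would need git cherry-pick -m).

```shell
# Hypothetical helper: cherry-pick onto the current branch every commit
# that <tip> has on top of its merge base with <upstream>. Like git
# cherry-pick itself, it stops with a non-zero status on the first conflict.
cherry_pick_series() {
    local tip="$1" upstream="$2" base
    base=$(git merge-base "$tip" "$upstream") || return 1
    git cherry-pick "$base..$tip"
}
```

With the remotes above fetched and net-next+overlayfs checked out, cherry_pick_series overlayfs.v15 v3.6 replays the same range as the last command.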
Don’t forget to enable CONFIG_OVERLAYFS_FS in .config. Here is how I configured the whole root filesystem:
info "Setup overlayfs"
mkdir /target
mkdir /target/ro
mkdir /target/rw
mkdir /target/overlay
# Version 9p2000.u allows one to access /dev, /sys and mount new
# partitions over them. This is not the case for 9p2000.L.
mount -t 9p rootshare /target/ro -o trans=virtio,version=9p2000.u
mount -t tmpfs tmpfs /target/rw -o rw
mount -t overlayfs overlayfs /target/overlay -o lowerdir=/target/ro,upperdir=/target/rw
mount -n -t proc proc /target/overlay/proc
mount -n -t sysfs sys /target/overlay/sys
info "Mount home directory on /root"
mount -t 9p homeshare /target/overlay/root -o trans=virtio,version=9p2000.L,access=0,rw
info "Mount lab directory on /lab"
mkdir /target/overlay/lab
mount -t 9p labshare /target/overlay/lab -o trans=virtio,version=9p2000.L,access=0,rw
info "Chroot"
export STATE=1
cp "$PROGNAME" /target/overlay
exec chroot /target/overlay "$PROGNAME"
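A kernel missing one of these filesystems only shows up as a cryptic mount error in the initramfs. The sketch below checks /proc/filesystems first; fs_supported is a hypothetical name. With the overlayfs patch series used here, the type is called overlayfs, whereas mainline kernels since 3.18 call it overlay.

```shell
# Hypothetical check: succeed if the running kernel can mount filesystem
# type $1. In /proc/filesystems the type name is the last field of each
# line; virtual filesystems (9p, tmpfs, overlayfs...) are prefixed with
# "nodev". The second argument only exists to ease testing.
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' "${2:-/proc/filesystems}"
}
```

Calling fs_supported 9p, fs_supported tmpfs and fs_supported overlayfs before the corresponding mount commands gives a clearer failure message than mount itself.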
You have to export your ${HOME} and the lab directory from the host:
qemu-system-x86_64 \
  ${PREVIOUS_ARGS} \
  -fsdev local,security_model=passthrough,id=fsdev-root,path=${ROOT},readonly \
  -device virtio-9p-pci,id=fs-root,fsdev=fsdev-root,mount_tag=rootshare \
  -fsdev local,security_model=none,id=fsdev-home,path=${HOME} \
  -device virtio-9p-pci,id=fs-home,fsdev=fsdev-home,mount_tag=homeshare \
  -fsdev local,security_model=none,id=fsdev-lab,path=$(dirname "$PROGNAME") \
  -device virtio-9p-pci,id=fs-lab,fsdev=fsdev-lab,mount_tag=labshare
Network
You know what is missing from our network lab? Network setup. For each LAN that I will need, I spawn a VDE switch:
# Setup a VDE switch
setup_switch() {
    info "Setup switch $1"
    screen -t "sw-$1" \
        start-stop-daemon --make-pidfile --pidfile "$TMP/switch-$1.pid" \
        --start --startas $($WHICH vde_switch) -- \
        --sock "$TMP/switch-$1.sock"
    screen -X select 0
}
To attach an interface to the newly created LAN, I use:
# Derive a stable MAC from the VM name and the LAN: "52:54:" followed by
# the first four bytes (eight hex digits) of the SHA-1 digest
mac=$(echo $name-$net | sha1sum | \
    awk '{print "52:54:" substr($1,1,2) ":" substr($1,3,2) ":" substr($1,5,2) ":" substr($1,7,2)}')
qemu-system-x86_64 \
  ${PREVIOUS_ARGS} \
  -net nic,model=virtio,macaddr=$mac,vlan=$net \
  -net vde,sock=$TMP/switch-$net.sock,vlan=$net
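Deriving the MAC address from a hash keeps it stable across restarts of the lab, which helps when reading captures or neighbor tables. Here is the same idea isolated in a function so it can be checked on its own; lab_mac is a hypothetical name, and it uses the first eight hex digits of the digest as the last four bytes.

```shell
# Hypothetical wrapper: hash "<name>-<net>" with SHA-1 and append the first
# four bytes of the digest to a fixed 52:54 prefix. 0x52 has the "locally
# administered" bit set and the multicast bit cleared, so the result is a
# unicast address outside the vendor-assigned space.
lab_mac() {
    # printf reproduces what "echo $name-$net" feeds to sha1sum.
    printf '%s-%s\n' "$1" "$2" | sha1sum |
        awk '{ print "52:54:" substr($1,1,2) ":" substr($1,3,2) ":" substr($1,5,2) ":" substr($1,7,2) }'
}
```

Since awk counts string positions from 1, the four substrings start at 1, 3, 5 and 7 so that each byte gets exactly two hex digits.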
The use of a VDE switch allows me to run the lab as a non-root user. It is possible to give Internet access to each VM, either by using the -net user flag or by running slirpvde on a special switch. I prefer the latter solution since it also allows the VMs to talk to each other.
Debugging
This lab was mostly done to debug both the kernel and Quagga. Each of them can be debugged remotely.
Kernel debugging
While the kernel features KGDB, its own GDB-compatible debugger, it is easier to use the remote GDB server built into QEMU.
qemu-system-x86_64 \
  ${PREVIOUS_ARGS} \
  -gdb unix:$TMP/vm-$name-gdb.pipe,server,nowait
To connect to the remote GDB server from the host, first locate the vmlinux file at the root of the source tree and run GDB on it. The kernel has to be compiled with CONFIG_DEBUG_INFO=y to get the appropriate debugging symbols. Then, use socat with the Unix socket to attach to the remote debugger:
$ gdb vmlinux
GNU gdb (GDB) 7.4.1-debian
Reading symbols from /home/bernat/src/linux/vmlinux...done.
(gdb) target remote | socat UNIX:$TMP/vm-r1-gdb.pipe -
Remote debugging using | socat UNIX:/tmp/tmp.W36qWnrCEj/vm-r1-gdb.pipe -
native_safe_halt () at /home/bernat/src/linux/arch/x86/include/asm/irqflags.h:50
50 }
(gdb)
You can now set breakpoints and resume the execution of the kernel.
It is easier to debug the kernel if optimizations are not enabled. It is not possible to disable them globally, but you can disable them for some files. For example, to debug net/ipv6/route.c, add CFLAGS_route.o = -O0 to net/ipv6/Makefile, remove net/ipv6/route.o and type make.
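These steps can be scripted for any single file. A minimal sketch; noopt_file is a hypothetical name and must be run from the top of the kernel tree. It relies on Kbuild's per-object CFLAGS_<object>.o variable, the same mechanism used above, and leaves the make call to you.

```shell
# Hypothetical helper: disable optimizations for one kernel source file by
# appending "CFLAGS_<name>.o = -O0" to the Makefile of its directory (only
# once) and deleting the stale object so that the next make rebuilds it.
noopt_file() {
    local src="$1" dir obj line
    dir=$(dirname "$src")
    obj=$(basename "$src" .c).o
    line="CFLAGS_$obj = -O0"
    grep -qxF "$line" "$dir/Makefile" || echo "$line" >> "$dir/Makefile"
    rm -f "$dir/$obj"
}
```

For the example above: noopt_file net/ipv6/route.c && make.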
Userland debugging
To debug a program inside QEMU, you can just use gdb as usual. Since your $HOME directory is available, this should be straightforward. Remote debugging is also quite easy. First, add a new serial port to QEMU:
qemu-system-x86_64 \
  ${PREVIOUS_ARGS} \
  -chardev socket,id=charserial1,path=$TMP/vm-$name-serial.pipe,server,nowait \
  -device isa-serial,chardev=charserial1,id=serial1
Then, start gdbserver in the guest:
$ libtool execute gdbserver /dev/ttyS1 zebra/zebra
Process /root/code/orange/quagga/build/zebra/.libs/lt-zebra created; pid = 800
Remote debugging using /dev/ttyS1
And from the host, you can attach to the remote process:
$ libtool execute gdb zebra/zebra
GNU gdb (GDB) 7.4.1-debian
Reading symbols from /home/bernat/code/orange/quagga/build/zebra/.libs/lt-zebra...done.
(gdb) target remote | socat UNIX:/tmp/tmp.W36qWnrCEj/vm-r1-serial.pipe -
Remote debugging using | socat UNIX:/tmp/tmp.W36qWnrCEj/vm-r1-serial.pipe -
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007ffff7dddaf0 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb)
Demo
For a demo, have a look at the following video:
[1] stdio is configured so that signals are not enabled: QEMU won’t stop when receiving SIGINT. This is important for the way we use it, since a Ctrl-C typed on the console should reach the guest instead of terminating QEMU.
[2] Therefore, with 9p2000.L, it is not possible to mount a fresh /proc on top of the existing one. I searched a bit but didn’t find why. Any comments on this are welcome.