[This post originally appeared on the XiFi blog – ICCLab@ZHAW is a partner in XiFi and is responsible for operating the Zurich node.]
As with any open compute systems, security is a serious issue which cannot be taken lightly. XiFi takes security seriously and has regular reviews of security issues which arise during node operations.
As well as being reactive to specific incidents, proper security processes require regular upgrading and patching of systems. The Venom threat which was announced in April is real for many of the systems in XiFi as the KVM hypervisor is quite widely used. Consequently, it was necessary to upgrade systems to secure them against this threat. Here we offer a few points on our experience with this quite fundamental upgrade.
The Venom vulnerability exploits a weakness in the Floppy Disk Controller in qemu. Securing systems against Venom requires upgrading to a newer version of qemu (terminating any existing qemu processes and typically restarting the host). In an operational KVM-based system, the VMs are running in qemu environments so a simple qemu upgrade without terminating existing qemu process does not remove the vulnerability; for this reason, upgrading the system with minimal user impact is a little complex.
Our basic approach to perform the upgrade involved evacuating a single host – moving all VMs on that host to other hosts in the system – and then performing the upgrade on that system. As Openstack is not a bulletproof platform as yet, we did this with caution, moving VMs one by one, ensuring that VMs were not affected by the move (by checking network connectivity for those that had public IP address and checking the console for a sample of the remainder). We used the block migration mechanism supported by Openstack – even though this can be somewhat less efficient (depending on configuration), it is more widely applicable and does not require setup of NFS shares between hosts. Overall, this part of the process was quite time-consuming.
Once all VMs had been moved from a host, it was relatively straightforward to upgrade qemu. As we had deployed our node using Mirantis Fuel, we followed the instructions provided by Mirantis to perform the upgrade. For us, there were a couple of points missing in this documentation – there were more package dependencies (not so many – about 10) which we had to install manually from the Mirantis repo. Also, for a deployment with Fuel 5.1.1 (which we had), the documentation erroneously omits an upgrade to one important process – qemu-kvm. Once we had downloaded and installed the packages manually (using dpkg), we could reboot the system and it was then secure.
In this manner, we upgraded all of our hosts and service to the users was not impacted (as far as we know)…and now we wait for the next vulnerability to be discovered!