We needed to deploy a simple hypervisor on one of our internal systems. This system was disconnected from the Internet for particular reasons, and it was surprisingly difficult to find a suitable bundled Linux distribution which provided the features we needed. Ubuntu Desktop does not come with KVM and Ubuntu Server does not have X, so it’s not easy to run virt-manager on the machine. No CentOS variant provided the right mix of capabilities either; oVirt Node was one reasonable candidate, but it only exposes a management interface which needs to be run remotely or else somehow packaged with the system. One of our colleagues is a big fan of Proxmox, but this is overkill for the simple single-hypervisor case we wanted.
We had to investigate the operation of one of our Openstack compute nodes as it was exhibiting some unusual behaviour. We quickly determined that there was some unexpected packet loss and we had reason to believe that this could have been due to the packet processing in the node. Investigating this problem necessitated some deeper exploration of how packets are processed in the node, particularly relating to the mix of ovs bridges, linux bridges and iptables. It turns out that this is rather complex and clear information describing how all this fits together in detail is not readily available. Here, we note what we learnt from this exploration.
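To give a flavour of the digging involved: in the common setup that mixes ovs and linux bridges, each VM’s tap device gets its own set of iptables chains, so tracing a packet means following chain jumps by hand. The sketch below is pure text processing over sample `iptables-save` output – the ruleset and chain names are illustrative of that layout, not dumped from a real node – and shows how one might pull out the rules relevant to a single tap device:

```python
# Sketch: find which iptables rules apply to a given VM tap device.
# The sample ruleset below is illustrative of the per-port chain layout
# (rules hanging off a FORWARD chain via physdev matches), NOT a dump
# from a real compute node.

SAMPLE_IPTABLES_SAVE = """\
-A neutron-openvswi-FORWARD -m physdev --physdev-out tap1a2b3c4d-5e --physdev-is-bridged -j neutron-openvswi-sg-chain
-A neutron-openvswi-FORWARD -m physdev --physdev-in tap1a2b3c4d-5e --physdev-is-bridged -j neutron-openvswi-sg-chain
-A neutron-openvswi-sg-chain -m physdev --physdev-out tap1a2b3c4d-5e --physdev-is-bridged -j neutron-openvswi-i1a2b3c4d-5
-A neutron-openvswi-i1a2b3c4d-5 -m state --state RELATED,ESTABLISHED -j RETURN
-A neutron-openvswi-i1a2b3c4d-5 -p tcp -m tcp --dport 22 -j RETURN
-A neutron-openvswi-i1a2b3c4d-5 -j DROP
"""

def rules_for_tap(iptables_save_text, tap):
    """Return the rules matching traffic entering/leaving a tap device,
    plus the rules in any chains those rules jump to (one level deep)."""
    lines = [l for l in iptables_save_text.splitlines() if l.startswith("-A")]
    direct = [l for l in lines if tap in l]
    # Follow the -j targets of the direct rules into per-port chains.
    targets = {l.split("-j ")[1].split()[0] for l in direct if "-j " in l}
    indirect = [l for l in lines
                if l.split()[1] in targets and tap not in l]
    return direct + indirect

for rule in rules_for_tap(SAMPLE_IPTABLES_SAVE, "tap1a2b3c4d-5e"):
    print(rule)
```

Even in this toy form, the indirection is visible: the packet’s fate (here an eventual DROP) is decided two chain-jumps away from the interface it arrives on.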
Carlo Vallati was a visiting researcher during Aug/Sept 2015. (See here for a short note on his visit). In this post he outlines how cloud computing needs to evolve to meet future requirements.
Despite the increasing usage of cloud computing as an enabler for a wide range of applications, the next wave of technological evolution – the Internet of Things and Robotics – will require the extension of the classical centralized cloud computing architecture towards a more distributed architecture that includes computing and storage nodes installed close to users and physical systems. Edge computing will also require greater flexibility: a distributed architecture is necessary to guarantee scalability as the number of devices increases hugely, while keeping computation at the edge limits the exposure of private data and addresses the privacy concerns arising among end users.
We ICCLab folk are always interested in new ideas, particularly those that could have a profound impact on computing in general and cloud computing in particular. Consequently, we couldn’t miss out on the opportunity of attending ORConf – a conference loosely centred around open source silicon – which was free and (more or less) just down the road at CERN.
The conference itself was superb, comprising an excellent mix of hobbyists/open source advocates, industry folks and academics, with some people wearing more than one hat. There was also quite a diverse set of backgrounds, ranging from ASIC designers to FPGA folks to compiler designers to some simpler software types. The calibre of the attendees was impressive, with excellent people from high-profile organizations such as Intel, Google, Qualcomm, Nvidia, the University of Cambridge, EPFL, ETH and Berkeley (although many of the industry folk were not specifically representing their employers).
Carlo Vallati visited the lab in Aug/Sept 2015 – here is a short note he wrote on his time here.
I had the pleasure of visiting ICCLab during Aug/Sept 2015 to work on a research proposal. By way of background, I am a postdoctoral researcher with Computer Networking Group (CNG) at the University of Pisa. My research interest focuses on the Internet of Things and Machine-to-Machine applications.
I visited the ICCLab for a month to write a proposal for the prestigious EU Horizon 2020 Marie Skłodowska-Curie Individual Fellowship scheme. The proposal focuses on providing cloud support for the execution of Machine-to-Machine applications – applications that rely on direct interactions with objects such as sensors and actuators – with low latency interaction requirements. In particular, the project will work on extending the current cloud computing architecture through the integration of cloud nodes close to physical devices, e.g. Fog nodes or Local Data Centers. Should the proposal be successful, it will provide funding to enable me to work on this topic with ICCLab and an industry partner (the proposal includes a secondment to a specific industry partner).
It is well recognized that GPUs can greatly outperform standard CPUs for certain types of work – typically those which can be decomposed into many basic computations which can be parallelized; matrix operations are the classical example. However, GPUs have evolved primarily in the context of the quite independent video subsystem and even there, the key driver has been support for advanced graphics and gaming. Consequently, they have not been architected to support diverse applications within the cloud. In this blog post we comment on the state of the art regarding GPU support in the cloud.
[This post originally appeared on the XiFi blog – ICCLab@ZHAW is a partner in XiFi and is responsible for operating the Zurich node.]
As with any open compute system, security is a serious issue which cannot be taken lightly. XiFi takes security seriously and regularly reviews security issues which arise during node operations.
As well as reacting to specific incidents, proper security processes require regular upgrading and patching of systems. The Venom threat, which was announced in April, is real for many of the systems in XiFi as the KVM hypervisor is quite widely used. Consequently, it was necessary to upgrade systems to secure them against this threat. Here we offer a few points on our experience with this quite fundamental upgrade.
The Venom vulnerability exploits a weakness in the Floppy Disk Controller in qemu. Securing systems against Venom requires upgrading to a newer version of qemu (terminating any existing qemu processes and typically restarting the host). In an operational KVM-based system, the VMs are running in qemu environments, so a simple qemu upgrade without terminating the existing qemu processes does not remove the vulnerability; for this reason, upgrading the system with minimal user impact is a little complex.
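One way to see why the restart matters: a qemu process started before the package upgrade keeps executing the old binary, and on Linux the /proc/&lt;pid&gt;/exe link of such a process shows up as “(deleted)” once the file on disk has been replaced. A minimal sketch of that check (the process data below is illustrative, not read from a live host):

```python
# Sketch: after upgrading the qemu package, any VM process started before
# the upgrade is still executing the old (now deleted) binary and remains
# vulnerable. On Linux, /proc/<pid>/exe for such a process resolves to a
# path marked "(deleted)". The sample data below is illustrative.

SAMPLE_PROC_EXE = {
    4211: "/usr/bin/qemu-system-x86_64 (deleted)",  # started pre-upgrade
    5990: "/usr/bin/qemu-system-x86_64",            # restarted post-upgrade
}

def stale_qemu_pids(proc_exe):
    """Return PIDs of qemu processes still running a deleted binary."""
    return sorted(pid for pid, exe in proc_exe.items()
                  if "qemu" in exe and exe.endswith("(deleted)"))

print(stale_qemu_pids(SAMPLE_PROC_EXE))  # → [4211]
```

A real check would build the dictionary by reading the exe links under /proc; any PID this flags is a VM that must be migrated or restarted before the host can be considered patched.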
Our basic approach to the upgrade involved evacuating a single host – moving all VMs on that host to other hosts in the system – and then performing the upgrade on that host. As Openstack is not a bulletproof platform as yet, we did this with caution, moving VMs one by one and ensuring that VMs were not affected by the move (by checking network connectivity for those that had a public IP address and checking the console for a sample of the remainder). We used the block migration mechanism supported by Openstack – even though this can be somewhat less efficient (depending on configuration), it is more widely applicable and does not require setup of NFS shares between hosts. Overall, this part of the process was quite time-consuming.
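The evacuation loop itself can be sketched as follows. This is a simplification: in a real deployment nova-scheduler would pick the target host, and each move would be a (block) live-migration API call followed by the connectivity checks described above; here a toy capacity model stands in for all of that.

```python
# Sketch of a one-by-one host evacuation plan. The scheduling policy
# (pick the target with the most free slots) is a deliberate
# simplification of what nova-scheduler actually does.

def plan_evacuation(placements, host_to_drain, capacity):
    """placements: {vm: current_host}; capacity: {host: free_slots}.
    Returns an ordered list of (vm, target_host) moves."""
    free = dict(capacity)
    moves = []
    for vm, host in sorted(placements.items()):
        if host != host_to_drain:
            continue
        # Pick the candidate host with the most free capacity.
        target = max((h for h in free if h != host_to_drain),
                     key=lambda h: free[h])
        if free[target] == 0:
            raise RuntimeError("no capacity left to evacuate %s" % vm)
        free[target] -= 1
        moves.append((vm, target))
    return moves

plan = plan_evacuation(
    placements={"vm1": "compute1", "vm2": "compute1", "vm3": "compute2"},
    host_to_drain="compute1",
    capacity={"compute2": 1, "compute3": 2},
)
print(plan)
```

Each `(vm, target)` pair would then translate into one block migration, with a verification step before moving on to the next VM.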
Once all VMs had been moved from a host, it was relatively straightforward to upgrade qemu. As we had deployed our node using Mirantis Fuel, we followed the instructions provided by Mirantis to perform the upgrade. For us, a couple of points were missing from this documentation: there were additional package dependencies (not so many – about 10) which we had to install manually from the Mirantis repo. Also, for a deployment with Fuel 5.1.1 (which we had), the documentation erroneously omits the upgrade of one important package – qemu-kvm. Once we had downloaded and installed the packages manually (using dpkg), we could reboot the system and it was then secure.
In this manner, we upgraded all of our hosts without impacting service to users (as far as we know)…and now we wait for the next vulnerability to be discovered!
[Note: This blog post was originally published on the XiFi blog here.]
One of the main jobs performed by the Infrastructures in XiFi is to manage quotas: the resources available are not infinite and consequently resource management is necessary. In Openstack this is done through quotas. Here we discuss how we work with them in Openstack.
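The essence of a quota check is simple bookkeeping: a request is refused if current usage plus the requested amount would exceed the limit for any resource. A minimal sketch (the resource names mirror common nova quota keys; the numbers are made up):

```python
# Sketch of the check performed when a tenant requests resources:
# the request is rejected if usage + requested would exceed the quota
# for any resource. Names and figures below are illustrative.

def check_quota(quota, usage, requested):
    """Return the resources for which the request would exceed quota."""
    return [res for res, amount in requested.items()
            if usage.get(res, 0) + amount > quota.get(res, 0)]

quota = {"instances": 10, "cores": 20, "ram": 51200}   # ram in MB
usage = {"instances": 8, "cores": 16, "ram": 40960}
request = {"instances": 1, "cores": 8, "ram": 8192}    # one 8-core VM

over = check_quota(quota, usage, request)
print(over)  # the cores quota is the blocker: 16 + 8 > 20
```

In practice the operator’s job is mostly tuning the `quota` side of this equation per tenant as demand evolves.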
Operating an Openstack cloud infrastructure is not a trivial task; it requires constant oversight of the use of the cloud resources. Sophisticated monitoring is necessary to ensure that the system continues to operate properly and delivers satisfactory performance to the users. One aspect of monitoring a cloud infrastructure pertains to ensuring that the system exposes a minimal attack surface: as little of the system as possible should be reachable from outside, particularly ports on public IP addresses. We are developing a basic set of monitoring and administration tools, one of which focuses on identifying VMs that may be too exposed. Here, we provide a brief description of this tool.
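The core of such a check is straightforward: walk the security-group rules and flag anything that opens a sensitive port to the whole Internet. A minimal sketch (the rule format loosely mirrors what the Openstack APIs return; the field names, watched ports and sample data are all illustrative, not our tool’s actual implementation):

```python
# Sketch: flag security-group rules that open a watched port to the
# whole Internet (0.0.0.0/0). Rule fields loosely mirror Openstack
# security-group rules; the data below is illustrative.

WATCHED_PORTS = {22, 3389, 5900}   # ssh, rdp, vnc

def exposed_rules(rules):
    """Return (port, protocol) pairs open to 0.0.0.0/0 on watched ports."""
    flagged = []
    for r in rules:
        if r["remote_ip_prefix"] != "0.0.0.0/0":
            continue
        for port in range(r["port_range_min"], r["port_range_max"] + 1):
            if port in WATCHED_PORTS:
                flagged.append((port, r["protocol"]))
    return flagged

sample_rules = [
    {"protocol": "tcp", "port_range_min": 22, "port_range_max": 22,
     "remote_ip_prefix": "0.0.0.0/0"},
    {"protocol": "tcp", "port_range_min": 80, "port_range_max": 80,
     "remote_ip_prefix": "0.0.0.0/0"},
    {"protocol": "tcp", "port_range_min": 5900, "port_range_max": 5910,
     "remote_ip_prefix": "10.0.0.0/8"},
]
print(exposed_rules(sample_rules))  # only ssh is open to the world
```

A real tool would fetch the rules per VM via the APIs and cross-reference them with floating IP assignments; the flagged pairs are then candidates for a conversation with the VM’s owner.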
Following on from our successful and fun outing at the Openstack Summit in Paris (read all about our adventures here, here and here), at which Vojtech and Srikanta gave talks on live migration and on rating, charging and billing respectively (videos here), we’re getting prepped for the next one in Vancouver.
This time out, we’ve thrown our hat into the ring twice: Victor has put something together on his adventures with Monasca, and I’m hoping to talk a bit about our energy-focused work.
We’ll describe our preliminary development on Watchtower, a Cloud Incident Management solution built on a number of technologies including Monasca, Camunda and Rundeck. It provides workflow and process automation based on BPMN 2.0 for handling the incident lifecycle, and runbook automation to resolve incidents automatically. Vote here if you’re interested in this.
On the energy side, we’re hoping to talk about our work on our Kwapi-based energy monitoring tool and some of the work we have been doing on advanced energy-based control systems for Openstack which leverage live migration mechanisms to maximize energy efficiency. Vote here if you’re interested in hearing about this at the summit (or if you’re interested in this generally!)
We’re really looking forward to seeing what’s new in Openstack in Hollywood North in May!