Resource management in cloud computing has received much interest both from the research community and within the operations of the large cloud providers, naturally so, since it has a significant impact on the cloud provider’s bottom line. Much of the work to date on resource management focuses on Service Level Agreements (under varying definitions of an SLA); some of the work also considers energy as a factor.
The primary objective of this work is to develop an energy-aware load management solution for OpenStack. Variants of this have been proposed before, and indeed implemented in other stacks (e.g. Eucalyptus), but no such capability exists for OpenStack as yet. As well as realizing the solution, the work will involve deploying a variant of it on the cloud platform without impacting the operation of the platform, and determining what energy savings can be made. It is worth noting that the classical load balancing approach, which is typical of resource managers in cloud contexts, runs counter to minimizing energy consumption: balancing spreads load across all servers and so keeps them all powered on, whereas saving energy means consolidating load onto fewer servers so that the rest can be powered down. Consequently, the standard load management tools are not suitable for minimizing cloud energy consumption.
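The tension between load balancing and energy minimization can be illustrated with a minimal sketch. It assumes a simple linear server power model and made-up wattages (100 W idle, 200 W at full load); neither the model nor the numbers come from this project, and the function names are hypothetical.

```python
def host_power(utilization, idle_watts=100.0, peak_watts=200.0):
    """Estimate the power draw of one host in watts.

    Assumption: power grows linearly from idle_watts to peak_watts with
    CPU utilization (0.0 to 1.0). A utilization of None means the host
    has been powered off and draws nothing.
    """
    if utilization is None:
        return 0.0
    return idle_watts + (peak_watts - idle_watts) * utilization


def total_power(utilizations):
    """Sum the estimated power draw over all hosts."""
    return sum(host_power(u) for u in utilizations)


# The same total load (one host's worth) placed in two different ways.
balanced = [0.25, 0.25, 0.25, 0.25]   # load balancer spreads it over 4 hosts
consolidated = [0.5, 0.5, None, None]  # packed onto 2 hosts, 2 powered off

print(total_power(balanced))      # 500.0
print(total_power(consolidated))  # 300.0
```

Under these assumed figures the idle power dominates, so the balanced placement costs more even though it serves exactly the same load. Real savings depend on the hardware and on how quickly hosts can be brought back when demand rises.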
The research challenges are the following:
- How to characterize the load in the system, particularly relating to spikes in demand
- How much buffer space to maintain to accommodate load spikes
- How to perform load consolidation – what load should be moved to what machines?
- When to perform load consolidation – how frequently should it take place?
- What are the energy gains that can be achieved from such a dynamic system?
Relevance to current and future markets
Advanced resource management mechanisms are a necessity for cloud computing generally. In the case of large deployments, Facebook’s Autoscale is an example of how such mechanisms can achieve energy savings of the order of 15%. In the case of smaller deployments, there are still many [[ https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/ | highly underutilized servers ]] in typical data centres, and ultimately there will be a need to reduce costs and realize energy efficiencies. Resource management is a large, general problem, and energy is one specific aspect of it; one of the challenges for this work is how to integrate with other active parts of the ecosystem.
There are some commercial offerings that explicitly address energy efficiency in the cloud context. These include:
- Eucalyptus has support for Energy Efficient Management of VMs which uses essentially the same techniques as in this work, albeit in the context of a different cloud stack;
- Huawei offers Fusion Compute which includes energy management as one of its capabilities;
- Sardina Systems has an offering which focuses on energy management for OpenStack specifically.
Link to Code
So far this work has focused on understanding the performance of live migration. The code to perform advanced load management has not been pushed out to our public repo, as it is currently in its very initial stages.
- Performance analysis of “post-copy” live migration in Openstack, Dec 2014
- Setting up post-copy live migration in OpenStack, Dec 2014
- The impact of ephemeral VM disk usage on the performance of Live Migration in Openstack, Oct 2014
- Performance of Live Migration in Openstack under CPU and network load, Sept 2014
- An analysis of the performance of live migration in Openstack, Sept 2014
- An analysis of the performance of block live migration in Openstack, Sept 2014
- Setting up Live Migration in Openstack Icehouse, Sept 2014
- Vojtech’s vbrownbag talk at Openstack Summit Paris, Nov 2014
Link to related projects and initiatives
See the Energy Theme for the larger system architecture.
The next steps on the implementation roadmap are as follows:
- Get tunnelled post-copy live migration working with modifications to libvirt (Jan 2015)
- See if this can be pushed upstream to libvirt
- Consolidate the live migration work into a clearer message on the potential of live migration (Jan 2015)
- Devise a control mechanism that can provide energy-based control (Feb 2015)
- Deploy and test on Arcus servers (Mar 2015)
- Determine if it is ready for deployment on Bart/Lisa (April 2015)
- Seán Murphy <firstname.lastname@example.org>