When do you need to scale up?

A big issue in cloud computing is knowing when you should start more VMs or switch to a more powerful virtual machine in order to process user requests efficiently. Monitoring system utilization is essential for detecting whether VM utilization is too high to guarantee stable, high-performing IT services. But how can one determine whether upscaling of a VM infrastructure is required? Part of the answer lies in trend detection algorithms. This article describes two of the most popular ones that can be applied to VM infrastructures.

Autocorrelations and moving averages

If the values in a series of measurements are correlated with earlier values of the same series, the series is said to be “autocorrelated”. If you measure VM utilization several times, you might discover that utilization increases or decreases over time. A (linear) regression of the measured values against time will reveal growth trends. If such a trend appears, the average utilization is itself changing: it is a “moving average”. The movement of the average causes the regression to produce errors, because regression models assume a constant average. Therefore one has to account for the errors produced by the moving average of the measured values.
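Both ideas can be illustrated with a few lines of Python. This is only a minimal sketch, assuming utilization samples taken at equal intervals; the function names and the sample values are made up for illustration.

```python
from statistics import mean


def autocorrelation(series, lag=1):
    """Correlation of the series with itself shifted by `lag` measurements."""
    m = mean(series)
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[t] - m) * (series[t - lag] - m)
              for t in range(lag, len(series)))
    return num / denom


def linear_trend(series):
    """Least-squares slope and intercept of the values against their index."""
    n = len(series)
    t_mean, v_mean = (n - 1) / 2, mean(series)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return slope, v_mean - slope * t_mean


# Hypothetical CPU utilization in percent, one value per measurement interval.
cpu = [50, 52, 54, 56, 58, 60]
print(autocorrelation(cpu))   # lag-1 autocorrelation
print(linear_trend(cpu))      # (growth per interval, fitted starting value)
```

A positive slope together with a clearly positive autocorrelation suggests that the average is drifting upwards rather than fluctuating around a fixed level.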

Moving average and autocorrelation can be combined in the “AutoRegressive Integrated Moving Average” (ARIMA) model. The ARIMA model has two advantages: on the one hand it uses the autocorrelation of the measured values, on the other hand it minimizes the errors that this calculation produces. Because ARIMA integrates aspects of autocorrelation and moving average, it is well suited to predicting trends.

When ARIMA is applied to VM utilization, one can predict (with a certain probability) that some utilization threshold will be reached in the future. Defining acceptance criteria for the probability of growth trends and for reaching a threshold in the future is a major step towards determining the “ideal” point in time at which upscaling of a VM infrastructure is required.

Two things must be done:

  1. Define threshold values for VM utilization metrics that tell when a VM is overutilized. One could, e.g., say that if the mean CPU utilization of the last 5 minutes is over 90%, the VM with that CPU is unacceptably overutilized, and therefore 90% is a threshold for VM utilization.
  2. Define a threshold for ARIMA growth trends that result in VM overutilization (that is, in reaching the threshold for VM utilization). For this purpose you have to measure values for VM utilization metrics and repeatedly calculate growth trends following the ARIMA model. If such a calculation predicts that a threshold for VM utilization will be reached, upscaling of the VM infrastructure is required.

With threshold values for VM utilization metrics and ARIMA growth trends, one can construct an event management system that catches VM overutilization problems by repeatedly measuring metrics and calculating growth trends.
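The check such an event management system repeats could look roughly like the sketch below. It is a deliberately simplified, hand-rolled ARIMA(1,1,0), which means one autoregressive term, one differencing step and no moving-average term. It also returns only a point forecast without a probability. A real system would more likely use a statistics library. All function names, the sample data and the 90% threshold are assumptions for illustration.

```python
from statistics import mean


def fit_ar1(diffs):
    """Least-squares fit of d[t] = c + phi * d[t-1] on the differenced series."""
    x, y = diffs[:-1], diffs[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return my - phi * mx, phi


def forecast(series, steps):
    """Point forecast: predict future differences, then add them back up."""
    diffs = [b - a for a, b in zip(series, series[1:])]   # the "integrated" part
    c, phi = fit_ar1(diffs)                               # needs >= 3 measurements
    level, d = series[-1], diffs[-1]
    predicted = []
    for _ in range(steps):
        d = c + phi * d
        level += d
        predicted.append(level)
    return predicted


def steps_until_threshold(series, threshold, horizon):
    """Number of intervals until the forecast reaches the threshold, or None."""
    for k, value in enumerate(forecast(series, horizon), start=1):
        if value >= threshold:
            return k
    return None


# Hypothetical 5-minute mean CPU utilization in percent, threshold at 90%.
cpu = [50, 52, 54, 56, 58, 60]
print(steps_until_threshold(cpu, threshold=90, horizon=20))  # -> 15
```

If the function returns a number of intervals that is smaller than the time needed to provision additional capacity, an upscaling event would be raised.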

The advantages of the ARIMA model are:

  • It gives an extrapolated estimation of the future growth trend and tries to assign a value to predicted VM utilization.
  • It takes the fact that average VM utilization changes over time into account by repeatedly calculating a moving average.

The drawbacks of the ARIMA model are:

  • The model states a prediction which appears to be “exact” to the statistically inexperienced viewer, but in fact it only states that future values will most likely lie in the neighbourhood of the predicted values. Given any ARIMA prediction, it is still possible that growth trends will discontinue in the future. Therefore predicted values can never be seen as “guaranteed” future values.

Control charts

Another model which can be used to predict upscaling utilizes Shewhart control charts. These charts are used in business process management for controlling process quality based on statistical measurements. The idea behind control charts is the following: we take n repeated samples of i measurements each and then calculate the range and the average of each sample. The ranges are put as data points in an “R-chart” and the averages in an “X-chart”. Then we calculate the average μ of all n data points in the R- and the X-chart and estimate the standard deviation σ of the measurements. Next, we define upper and lower bounds for the data points, which are considered the “natural” process limits, and check whether there are data points lying above or below these “control limits”. For the X-chart, the upper and lower control limits (UCL and LCL) are proportional to the standard error of a sample average, which is σ divided by the square root of the sample size i. As a rule of thumb the UCL is defined as the average of all data points plus two times the standard error, while the LCL is the average minus two times the standard error (textbook Shewhart charts use three times the standard error, which produces fewer false alarms). By calculating the UCL and LCL for the X- and R-chart, we can check whether there are data points below the LCL or above the UCL.

Control charts assume that if all data points lie within the UCL and LCL, the process will most likely continue as it is. The process is then said to be “in control”. The interesting thing about control charts is that data points which lie outside the UCL or LCL can be used as indicators of process changes. If multiple points lie above the UCL, this can indicate a growth trend.

When control charts are applied to VM utilization, one must first define the sample size i and the number of data points n. Let us say that we want to measure the average CPU utilization of the last 5 minutes. One could, e.g., measure CPU utilization at 20 random points (i=20) in the time interval between 0 and 5 minutes. Then one can calculate the average of the sample as well as the range, which is the difference between the maximum and minimum of the 20 values. As a result we get one data point for the X-chart and one for the R-chart. Then one should take n samples to populate the X- and R-charts. If we choose n=5, we can then compute the standard deviation, standard error and average of all samples. These values can be used to define the UCL and LCL for the process. As a next step we must define a decision criterion for when a process is considered to show a growth or decline trend. We could, e.g., say that if 2 or more points lie above the UCL, a growth trend will occur in the future.
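The procedure can be sketched in Python as follows. The sketch makes two assumptions that go beyond the text: σ is estimated as the pooled standard deviation within the samples, and the standard error is σ divided by the square root of the sample size i. The factor of two follows the rule of thumb above. Function names and the sample data (with a small i=4 to keep the example readable) are made up.

```python
from math import sqrt
from statistics import mean, variance


def x_chart_limits(samples, k=2.0):
    """Return X-chart points, LCL and UCL for equally sized samples."""
    size = len(samples[0])                        # sample size i
    points = [mean(s) for s in samples]           # one X-chart point per sample
    center = mean(points)                         # average of all n data points
    # Assumption: pooled within-sample standard deviation as estimate of sigma.
    sigma = sqrt(mean([variance(s) for s in samples]))
    se = sigma / sqrt(size)                       # standard error of a sample average
    return points, center - k * se, center + k * se


def r_chart_points(samples):
    """One R-chart point (maximum minus minimum) per sample."""
    return [max(s) - min(s) for s in samples]


def growth_trend(points, ucl, min_points=2):
    """Decision criterion: at least `min_points` data points above the UCL."""
    return sum(1 for p in points if p > ucl) >= min_points


# Hypothetical CPU utilization in percent: n=5 samples of i=4 measurements.
samples = [
    [10, 12, 11, 13],
    [11, 13, 10, 12],
    [12, 10, 13, 11],
    [20, 22, 21, 23],
    [23, 21, 22, 20],
]
points, lcl, ucl = x_chart_limits(samples)
print(points, round(lcl, 2), round(ucl, 2), growth_trend(points, ucl))
```

In this example the last two sample averages lie above the UCL, so the criterion reports a growth trend. With five samples like the first one, no point leaves the limits and the process counts as in control.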

Upscaling is necessary either when a process contains 2 or more data points above the UCL and the average is near some critical threshold (where the low-performance VM reaches its maximum capacity), or when a process is in control but the UCL lies above the critical threshold. In the first case the next data points will probably lie above the threshold as a result of a growth trend; in the second case future data points can reach the threshold even if the process continues as it is.
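Expressed as code, this decision rule might look like the sketch below. How close to the threshold counts as “near” is not specified in the text, so the `margin` parameter (in percentage points) is a purely hypothetical choice, as are the function name and the example values.

```python
def needs_upscaling(points, lcl, ucl, critical, margin=10.0, min_outliers=2):
    """Apply the two upscaling conditions to the X-chart data points."""
    above_ucl = sum(1 for p in points if p > ucl)
    in_control = all(lcl <= p <= ucl for p in points)
    average = sum(points) / len(points)

    # Case 1: growth trend while the average is already near the threshold.
    trend_near_threshold = above_ucl >= min_outliers and average >= critical - margin
    # Case 2: process in control, but its natural upper limit exceeds the threshold.
    limit_exceeds_threshold = in_control and ucl > critical
    return trend_near_threshold or limit_exceeds_threshold


# Growth trend close to a critical threshold of 85% CPU utilization.
print(needs_upscaling([70, 72, 88, 89], lcl=60, ucl=85, critical=85))   # True
# In control, but the UCL of 92% lies above the critical threshold of 90%.
print(needs_upscaling([80, 82, 81], lcl=70, ucl=92, critical=90))       # True
# In control and far away from the threshold.
print(needs_upscaling([30, 32, 31], lcl=25, ucl=40, critical=90))       # False
```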

Control charts are a quite simple means to predict process changes. They have the following advantages:

  • Control charts are a relatively reliable indicator for future growth trends and can therefore indicate possibly critical growth trends very early.
  • They do not mislead viewers into expecting “exact” predictions of future VM utilization values.

Despite these advantages, control charts also have the following drawbacks:

  • They require many parameter choices (e.g. n or i). If those parameters are not chosen well, control charts lead to many “false alarms” that indicate overutilization when there is none.
  • Control charts can predict growth trends, but they do not tell anything about the strength of the growth trend. Therefore they tend to either overestimate small process changes or underestimate large changes. They are insensitive to trend sizes.

Both models, ARIMA and control charts, have advantages and drawbacks. Like many tools, they are only as good as the person who uses them. Often it is advisable to test both tools first and then decide which one should be used for VM utilization prediction. Predicting future growth trends is still more an art than a science. Therefore one cannot say in general which method is “better”, but it is clear that both of them are better than doing nothing with VM performance measurements.

 

ZuFi: The Zurich Future Internet node

Motivation

Besides the work in national projects, where we engage with local SMEs and transfer our knowledge to them, a major focus has always been on international, or more precisely European, projects; we are currently involved in a number of FP7 Future Internet projects. Our goal is to push the progress of this so-called Future Internet (FI) even further by providing a platform, for ourselves and our communities, where newly developed FI services can be offered to third parties as well as the general public. To this end, we are building ZuFi – the Zurich Future Internet node.

Infrastructure

At the moment the ICCLab has set up cloud infrastructures at two locations, one at ZHAW in Winterthur and one at the Equinix data centre in Zurich (big shout-out to Equinix for providing the space and power for this!). Both of them currently run OpenStack Grizzly. The setup in Zurich emerged from our strategic partnership with Equinix and enables further research in areas like cloud federation and cloud interoperability. Its resources will be exclusively available for XIFI.

ZuFi – Winterthur

The following hardware is available in the Winterthur node.

Component | Description

Servers | 15 x Lynx CALLEO 1240

Total capacity | CPU: 240 Cores; RAM: 960 GB; HDD: 60 TB; 12 TB NFS shared disk space

Per server capacity | CPU: 2 x Intel Xeon E5620 (16 Cores); RAM: 64 GB; HDD: 4 TB; Network: 3 x Gbit Ethernet NIC

Per core capacity | RAM: 4 GB; HDD: 250 GB

Switch | HP E2910al-48G

The following software and virtualization components are installed.

Component | Description

Hypervisor | KVM

Cloud Manager | OpenStack Grizzly

Base OS | Ubuntu 13.04

ZuFi – Equinix

The following hardware is available in the Equinix node.

Component | Description

Servers | 8 x Intel Xeon 5140, 2.3 GHz

Total capacity | CPU: 64 Cores; RAM: 256 GB; HDD: 10 TB SAN

Per server capacity | CPU: 2 x Intel Xeon 5140 (8 Cores); RAM: 32 GB; HDD: variable (attached via SAN-Controller); Network: 4 x Gbit Ethernet NIC

Per core capacity | RAM: 4 GB; HDD: variable

Switch | Cisco Catalyst 3560G

The following software and virtualization components are installed.

Component | Description

Hypervisor | KVM

Cloud Manager | OpenStack Grizzly

Base OS | CentOS 6.4

Future Plans

For the future, we plan to extend the current installation with several generic enablers (GEs) and thereby offer the Future Internet services that were developed in the FI-WARE project to our academic community as well as the general public. Part of that will also be the integration with local FI and smart cities activities.

Webcast: ITU Telecom World Forum 2013, Panel Session “Mobile Cloud Networks”, Thursday 21 November (16:15-17:45, Bangkok)

The panel sessions will be webcast (audio and video). Questions submitted via a Twitter feed using the hashtag #ITUWORLDLIVE, by SMS or through the ITU Telecom webcast portal will be displayed on the Moderator’s laptop screen during the session.

Mobile Cloud Networks
Thursday, 21 November 2013, 16:15 – 17:45, Jupiter 9

Innovative services and products over the next decade will be strongly driven by cloud computing technologies. Research communities on cloud technologies will need to address challenges such as radio access in the cloud, new opportunities for sharing of infrastructure, open source, SDN (software defined networks), new CDN (content delivery networks), and ICN (information centric networks). Globally, green requirements, performance and scalability studies and related impacts on policy, regulation and standardisation will also need to be addressed. Telecommunication networks need to be prepared for the requirements coming from cloud services, transporting the corresponding information in an effective and efficient way. The cloud concept is being brought into network architectures, by introducing virtualisation into all signal processing and information storage in the networks, and the service provision concept as a replacement for current network node functionalities. Game developers, network operators, OTT content providers and community operators will have a big role to play in these new paradigms. A broad view will be taken, addressing perspectives of innovation, standardisation, business models, implementation, roadmap, and so on.

Moderator

  • Dr Thomas Michael Bohnert, Zurich University of Applied Sciences, Switzerland

Panellists

  • Prof. Luis M. Correia, Associate Professor, Instituto Superior Técnico – Technical University of Lisbon, Portugal
  • Dr Neil Davies, Founder and Chief Scientist, Predictable Network Solutions, United Kingdom
  • Mr Latif Ladid, Founder & President, IPv6 Forum, Luxembourg
  • Mr Peter Riedel, Executive Vice-President, Rohde & Schwarz, Germany
  • Dr Masao Sakauchi, President, NICT, Japan

OpenStack Grizzly Multi-Node Installation with Stackforge Puppet-Modules on CentOS 6.4

This blog post describes the installation of OpenStack Grizzly with the help of the Stackforge Puppet modules on CentOS 6.4, using network namespaces. The setup consists of a combined controller/network node and a compute node. Additional compute nodes can of course be added later as needed.

SDN Group Switzerland

The ICCLab believes in the future of Software Defined Networking (SDN) for data centers as well as in using SDN inside a common cloud cluster. We will be running regular meetings with the goal of exchanging knowledge and ideas with others. For more information you can join the LinkedIn Group or contact one of the chairs directly: Irena Trajkovska or Kurt Baumann. At the moment we do not follow specific SDN topics, but we are guided by the following mission:

  • Independent Consortium

  • Interests in Network Architecture and network application development following the SDN paradigm and related technologies.

  • Exchanging new ideas, meeting at regular intervals and collaborating on projects

5th Swiss OpenStack User Group Meetup – at University of Zurich


This fifth edition of the OpenStack CH User Group was dedicated to the networking aspects of OpenStack, among other topics.
More than 40 people attended the meeting on 24 October from 18:00 to 21:00 at the University of Zurich, Irchel campus.
Sergio Maffioletti (GC3 project director, University of Zurich) gave a short welcome before the presentations started.

The agenda included four talks, of about 30 minutes, in the following order:

OpenStack Networking Introduction by Yves Fauser, System Engineer VMware NSBU

The talk covered these topics: Traditional Networking – refresher, OpenStack integrated projects big picture, Why OpenStack Networking is called Neutron now, Networking before Neutron, Nova-Networking, Drawbacks of Nova-Networking that led to Neutron, OpenStack Networking with Neutron, Neutron Overview, Available Plugins, Neutron Demo and Neutron – State of the Nation.

NFV and Swisscom’s Openstack Architecture by Markus Brunner – Swisscom

Markus Brunner gave an introduction to Network Function Virtualization and explained how Swisscom expects its implementation in the service chain to help overcome the dilemma of increasing traffic versus decreasing customer fees, by offering value-added virtual networking services (firewall, IP-TV, …). Another major aspect is minimizing the number of different hardware boxes by using virtualized components running on cloud infrastructure, which also reduces vendor lock-in.

Mirantis Fuel and Monitoring and how it all powers the XIFI FI PPP Project by Federico Facca – Create-Net – Italy

Federico gave a presentation on the XIFI project, the XIFI architecture and the Infrastructure TOOLBOX, whose objective is to automate the installation of the host operating system, hypervisor and OpenStack software through the Preboot eXecution Environment (PXE). The TOOLBOX also defines and selects a deployment model among the ones available and discovers the servers on which to install the software.
The XIFI federation makes it possible to specify a “role” (controller, storage, compute, etc.) for each server, performs setup and network configuration (VLAN, etc.), supports registration of the infrastructure into the federation and finally tests the deployment to verify that everything has been installed correctly.

Ceph Storage in OpenStack by Jens-Christian Fischer – SWITCH

The presentation gave interesting insights into Ceph design goals, Ceph storage options, Ceph architecture, the CRUSH algorithm, the Monitor (MON) and the Metadata Server (MDS). Jens-Christian then concluded with information about OpenStack at SWITCH and the test cluster.

 
