Dependability Modeling on OpenStack: Part 2

In the previous article we defined use cases for an OpenStack implementation according to the usage scenario in which the OpenStack environment is deployed. In this part of the Dependability Modeling article series we will show how these use cases relate to the functions and services provided by the OpenStack environment and create a set of dependencies between use cases, functions, services and system components. From this set we will draw the dependency graph and make the impact of component outages computable.

Construct dependency table

The dependency graph can be constructed once we define which functions, services and components enable a use case. In the example below (Fig. 1) we defined the system architecture components, services and functions that make it possible to create, delete or update the details of a Telco Account (the account of a mobile end user). Since these operations are performed within virtual machines, the VM User Management and VM Security Management functions make this use case available. Therefore we draw a column which contains these functions. Because these functions need the User Management and SSH & Password Management services in each VM in order to operate, we draw a second column which contains the required services. A third column lists the system components required to deliver these services.

Fig. 1: Dependency Graph Construction.


The procedure described above is repeated for all use cases. The result is a table like the one in Tab. 1. This dependency table is the starting point for producing the dependency graph.

Tab. 1: Dependencies between Use Cases, Services, Functions and Components.


Construct dependency graph

For each component that is listed in the table you have to model the corresponding services, functions and use cases. This is done as in the example in Fig. 2. We start from the right of the graph with the Ceilometer component and the VM plugin and look at which services are provided by those components: in this case it is the “Ceilometer Monitoring” service. Therefore we draw an icon that represents this service and draw arrows from the Ceilometer and VM plugin components to the service icon (1). In the next step we look at which function is provided by the Ceilometer Monitoring service. This is the “Monitoring of VM” function. Therefore we add an icon for the function and draw an arrow from the service to this function (2). Then we look for the use cases provided by the Monitoring of VM function. Since one of them is “Measure SLAs”, we add an icon for this use case and draw another arrow to “Measure SLAs” (3). The first path between a use case and the components on which it depends is now drawn. This procedure is repeated for all components in Tab. 1.

Fig. 2: Dependency Graph Construction from Dependency Table.
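The three drawing steps can also be expressed in code. The following is a minimal Python sketch, assuming the dependency table is available as rows of (use case, function, service, component). Only the Ceilometer path is taken from the text; the row layout and the names `DEPENDENCY_TABLE` and `build_graph` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows of the dependency table:
# (use case, function, service, component).
# Only this Ceilometer path is described in the article.
DEPENDENCY_TABLE = [
    ("Measure SLAs", "Monitoring of VM", "Ceilometer Monitoring", "Ceilometer"),
    ("Measure SLAs", "Monitoring of VM", "Ceilometer Monitoring", "VM plugin"),
]


def build_graph(rows):
    """Return adjacency sets whose arrows point from provider to consumer."""
    graph = defaultdict(set)
    for use_case, function, service, component in rows:
        graph[component].add(service)  # step (1): component -> service
        graph[service].add(function)   # step (2): service -> function
        graph[function].add(use_case)  # step (3): function -> use case
    return graph


graph = build_graph(DEPENDENCY_TABLE)
for source in sorted(graph):
    for target in sorted(graph[source]):
        print(f"{source} -> {target}")
```

Using sets means that a path which appears in several table rows is stored only once. The sketch assumes that names are unique across the four levels; if a service and a function shared a name, the nodes would need a level prefix.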


The result is the dependency graph shown below (Fig. 3).

Fig. 3: Dependency Graph of OpenStack Environment.


Add weight factors to use cases

Once the dependency graph is constructed, we can calculate the “impact” of component outages. When a component fails, you can simply follow the arrows in the dependency graph to see which user interactions (use cases) stop being available to end users. If e. g. the Ceilometer component fails, you would not be able to measure SLAs, meter usage of Telco services or monitor the VM infrastructure.
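Following the arrows is a plain graph traversal. The sketch below uses Python and a breadth-first search. It assumes the graph is stored as an adjacency mapping; the intermediate service and function nodes for the three Ceilometer use cases are a guess, since the full graph is only shown in Fig. 3.

```python
from collections import deque

# Hypothetical excerpt of the dependency graph (provider -> consumers).
GRAPH = {
    "Ceilometer": ["Ceilometer Monitoring"],
    "Ceilometer Monitoring": ["Monitoring of VM"],
    "Monitoring of VM": [
        "Measure SLAs",
        "Meter usage of Telco services",
        "Monitor VM infrastructure",
    ],
}
USE_CASES = {
    "Measure SLAs",
    "Meter usage of Telco services",
    "Monitor VM infrastructure",
}


def affected_use_cases(graph, use_cases, failed_component):
    """Follow the arrows from a failed component and collect reachable use cases."""
    seen = {failed_component}
    queue = deque([failed_component])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return sorted(seen & use_cases)


print(affected_use_cases(GRAPH, USE_CASES, "Ceilometer"))
```

Note that this treats every reachable use case as lost. It does not model redundancy, where a use case would stay available as long as one of several providers survives.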

But it would be too simplistic to say that each use case is equally important to the end user. Some user interactions, e. g. the creation of new VM nodes, need not be available all the time (or at least it depends on the OLAs of the Telco). Other actions, e. g. Telco authentication, must be available all the time. Therefore, we have to add weight factors to use cases. This can be done by adding another column to the dependency table and naming it “Weight factor”. The weight factor should be a score measuring the “importance” of a user interaction in terms of business need. In a production OpenStack environment, financial values (which correspond to the business value of the user interaction) could be assigned as weight factors to each use case. For reasons of simplicity we take the ordinal values 1, 2 and 3 as weight factors (where 1 signifies the least important user interaction and 3 the most important one). For each use case row in the dependency table we add the corresponding weight factor (Fig. 4).

Fig. 4: Assignment of weight factors.


As a next step, we create a pivot table containing the components and use cases as consecutive row fields and the weight factors as the data field. A use case can appear in several rows for the same component, so to avoid counting it more than once we aggregate with the maximum function instead of the sum function. As a result we get the pivot table in Tab. 2.

Tab. 2: Pivot Table of Component/Use Case dependencies.


Calculate outage impacts

Calculating the impact of system component outages is now quite straightforward. Just look at the pivot table and calculate the sum of the weight factors for each component. As a result we have a table of failure impact sizes (Tab. 3).

Tab. 3: OpenStack Components and Failure Impact Sizes.


This table reveals which components are very important for the overall reliability of the OpenStack environment and which are not. It is an operationalization of the measurement of “failure impact” for a given IT environment (failure impacts can be measured as a number). The advantage of this approach is that we can build a test framework for OpenStack availability based on the failure impact sizes.
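The pivot step (maximum per component and use case) and the impact step (sum per component) can be reproduced without a spreadsheet. The following Python sketch assumes the weighted dependency table has been flattened to rows of (component, use case, weight factor). The rows and weights are invented for illustration and are not the values behind Tab. 2 or Tab. 3.

```python
from collections import defaultdict

# Hypothetical flattened rows: (component, use case, weight factor).
ROWS = [
    ("MySQL", "Telco authentication", 3),
    ("MySQL", "Telco authentication", 3),  # same pair reached via a second service
    ("MySQL", "Create VM nodes", 1),
    ("Ceilometer", "Measure SLAs", 2),
]


def pivot_max(rows):
    """Pivot on component and use case, keeping the maximum weight per pair."""
    pivot = defaultdict(dict)
    for component, use_case, weight in rows:
        current = pivot[component].get(use_case, 0)
        pivot[component][use_case] = max(current, weight)
    return pivot


def failure_impact(pivot):
    """Sum the de-duplicated weight factors of each component."""
    return {component: sum(weights.values()) for component, weights in pivot.items()}


impacts = failure_impact(pivot_max(ROWS))
for component, impact in sorted(impacts.items(), key=lambda item: -item[1]):
    print(f"{component}: {impact}")
```

Taking the maximum first is what prevents the duplicated “Telco authentication” row from being counted twice: the hypothetical MySQL impact is 3 + 1 = 4 rather than 7.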

Most obviously, components with strong support functionality, e. g. MySQL or the Keystone component, have high failure impact sizes and should be strongly protected against outages. VM internal components seem to be less important because VMs can be easily cloned and recovered in a cloud environment.

In a further article we will show how availability can be tested with the given failure impact size values on a given OpenStack architecture.

 

3rd Swiss OpenStack User Group Meetup


Following on from our 2nd meeting, the Swiss OpenStack user group met on the 24th of April at the University of Bern. It was an excellent event with many attention-grabbing presentations! A big thanks goes out to the sponsors:

Once we kicked off, there were five presentations: three that were more detailed and two that were more lightning talks in nature. The presentations, in their running order, were:

Upcoming

There are other upcoming Swiss events that will include much talk of OpenStack. Of note are:

The Swiss Informatics Society has also started a cloud computing special interest group, which everyone active in cloud is welcome to join. More details can be found at their site.

Swiss OpenStack User Group channels

 

Dependability Modeling on OpenStack: Part 1

Dependability Modeling is carried out in 4 steps: model the user interactions, model the system functions, model the system services and then model the system components which make system services available. In the first part we will define which interactions could be expected from end users of the OpenStack cloud platform and construct the first part of the dependability graph. Once the dependability model is constructed, a Dependability Analysis will be performed and several OpenStack HA architectures will be rated according to their outage risk.

Before we can define use cases for an OpenStack HA environment, we must first think about its Deployment Model. According to the Use Cases Whitepaper of the Open Cloud Manifesto, every cloud has its own use case scenario which depends on its “Cloud Deployment Model”. A Cloud Deployment Model describes how the cloud is deployed in an organizational context. The US National Institute of Standards and Technology (NIST) has published a definition paper which describes essential characteristics of cloud computing as well as possible types of Service and Deployment Models for cloud environments. According to the NIST definition of Cloud Computing, there are four types of Cloud Deployment Models:

  • Private Cloud: The cloud infrastructure is operated for one single organization inside that organization’s firewall. All data and processes are managed within the organization and are therefore not exposed to security issues, network bandwidth limitations or legal restrictions (in contrast to a Public Cloud).
  • Community Cloud: The cloud infrastructure is shared by several organizations and serves a specific community of end users who have shared concerns. Typical Community Clouds are e. g. Google Docs, Facebook, Dropbox.
  • Public Cloud: The cloud infrastructure is made available to the general public and is owned by a cloud provider organization.
  • Hybrid Cloud: The cloud infrastructure is a composition of multiple other clouds (private, community or public) that remain unique entities but are bound together by technology that enables interoperability.

According to this definition, the MobileCloud Networking (MCN) infrastructure is best described as a Hybrid Cloud. On the one hand, MCN is used as a Private Cloud by the Telcos to manage their infrastructure environment and handle peak loads or infrastructure-based network issues. On the other hand, MCN is a Public Cloud for the Mobile End Users: they request communication services from the Telco sites, register and authenticate themselves and consume the communication service offered by the Telco. Mobile End Users produce the load on the Telco-managed infrastructure. The MCN is deployed in an “Enterprise to Cloud to End User” scenario (Fig. 1).

Fig. 1: Enterprise to Cloud to End User


Typically the Enterprise to Cloud to End User Scenario requires the following features:

  • Identity Management: This is performed by the authentication services provided by the Telco. Authentication services run inside the virtual machines provided by OpenStack.
  • Use of an open client: Management of the cloud should not depend on a particular platform/technology. In OpenStack this is guaranteed by using the Horizon Dashboard.
  • Federated Identity Management: Identity of Telco users should also be managed in parallel to end users. In OpenStack Telco users are managed by the Keystone component. End users are authenticated in the virtual machines provided by the Telco.
  • Location awareness: Depending on the legal restrictions in the Telco industry, data of end users must be stored on particular physical servers. Therefore the cloud service must provide awareness of the location of end users.
  • Metering and monitoring: All cloud services must be metered for chargeback and provisioning. MCN uses a provisioning facility for this task.
  • Management and Governance: It is up to the Telcos to define Governance policies for the VMs managed by OpenStack. Policies and rules can be configured via Keystone.
  • Security: The OpenStack cloud network should be secured against unauthorized access. Security is a typical Keystone task.
  • Common File Format for VMs: The infrastructure of Telco organizations might be heterogeneous. For reasons of interoperability the file format of VMs used in the MCN cloud should be interchangeable. Nova is the computation component of the OpenStack framework. Nova is technology-agnostic and therefore offers VM interoperability between many different VM systems, e. g. KVM, Xen, VirtualBox etc.
  • Common APIs for Cloud Storage and Middleware: OpenStack offers a common API for Cloud Storage: Images are stored and managed by the Glance component. All objects managed in the cloud are stored with the Swift API. Block storage is managed by Cinder.
  • Data Application and Federation: All cloud data must be federated in order to manage the cloud infrastructure. In OpenStack cloud data is managed by a MySQL server.
  • SLAs and Benchmarks: The OpenStack environment must fulfil SLAs with the end users as well as OLAs with the Telco itself. SLAs can be metered by the MCN provisioning facility.
  • Lifecycle Management: The lifecycle of VMs must also be managed in the MCN infrastructure. Lifecycle Management is also a task of the Nova component.

If we follow the list of requirements we can define use cases for the OpenStack environment of the MobileCloud Network (Tab. 1). The result is a list of use cases which define the user interactions with the OpenStack cloud.

Tab. 1: Use Cases for an OpenStack environment.


Modeling the user interactions is the first step in Dependability Modeling. In order to get a full Dependability Model of the OpenStack environment we must investigate the functions and services which make the user interactions available. A further post will show how this is done.

Antonio Cimmino

Researcher of the InIT Cloud Computing Lab. His research interests relate to the study and implementation of Future Internet technologies, including cloud service and infrastructure aspects for Smart Cities. His solid background in industry gives him a broad view of market impacts and needs for research and education. He is currently involved in the project FI PPP CONCORD and in proposal preparations. With the H2020 SESAME project he celebrates his 15th EU project acquisition.

After receiving his Master’s degree in electronic engineering and IT at the University of Naples (Italy), Antonio started working as a trainer at the Italian Air Force in telecommunications and air-navigation systems, then for the transport industry as a HW designer and for the telecommunication industry in the radio mobile research department, primarily involved in mobile research activities for system definition and network architecture on the Mobile Broadband System project (MBS / 60 GHz), UMTS – Monet, OBANET and Moicane (FP5 programme). He has long experience in marketing and product lifecycle for legacy switching, NGN, web applications and optics technologies. During the last decade, he has been involved in the preparation and coordination of participation in EU research projects (FP6-WEIRD, FP6-Onelab, FP6-ephoton+ and FP6-Magnet beyond) for the IST area and then for FP7 (ETICS, ECONET, Outsmart, Smart Santander and FIWARE).

Dependability Modeling: Testing Availability from an End User’s Perspective

In a former article we spoke about testing High Availability in OpenStack with the Chaos Monkey. While the Chaos Monkey is a great tool to test what happens if some system components fail, it does not reveal anything about the general strengths and weaknesses of different system architectures.  In order to determine if an architecture with 2 redundant controller nodes and 2 compute nodes offers a higher availability level than an architecture with 3 compute nodes and only 1 controller node, a framework for testing different architectures is required. The “Dependability Modeling Framework” seems to be a great opportunity to evaluate different system architectures on their ability to achieve availability levels required by end users.

Overcome biased design decisions

The Dependability Modeling Framework is a hierarchical modeling framework for dependability evaluation of system architectures. Its purpose is to model different alternative architectural solutions for one IT system and then calculate the dependability characteristics of each different IT system realization. The calculated dependability values can help IT architects to rate system architectures before they are implemented and to choose the “best” approach from different possible alternatives. Design decisions which are based on Dependability Modeling Framework have the potential to be more reflective and less biased than purely intuitive design decisions, since no particular architectural design is preferred to others. The fit of a particular solution is tested versus previously defined criteria before any decision is taken.

Build models on different levels

The Dependability Models are built on four levels: the user level, the function level, the service level and the resource level. The levels reflect the method: first identify user interactions as well as the system functions and services which are provided to users, and then find the resources which contribute to accomplishing the required functions. Once all user interactions, system functions, services and resources are identified, models are built (on each of the four levels) to assess the impact of component failures on the quality of the service delivered to end users. The models are connected in a dependency graph to show the different dependencies between user interactions, system functions, services and system resources. Once all dependencies are clear, the impact of a system resource outage on user functions can be calculated in a straightforward way: if the failing resource was the only resource which delivered functions critical to the end user, the impact of the resource outage is very high. If there are redundant resources, services or functions, the impact is much less severe.
The dependency graph below demonstrates how end user interactions depend on functions, services and resources.

Fig. 1: Dependency Graph

The Dependability Model makes the impact of resource outages calculable. One can easily see that a Chaos Monkey test can verify such dependability graphs, since the Chaos Monkey effectively tests the outage of system resources by randomly unplugging devices. The less obvious part of the Dependability Modeling Framework is the calculation of resource outage probabilities. The probability of an outage can only be obtained by regularly measuring the unavailability of resources over a long time frame. Since no such data is available, one must estimate the probabilities and use these estimates as parameters to calculate the dependability characteristics of the resources. A sensitivity analysis can then reveal whether the proposed architecture offers a reliable and highly available solution.
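To illustrate how estimated outage probabilities and a sensitivity analysis fit together, here is a minimal Python sketch. It is not the calculation used by the Dependability Modeling Framework itself. It is a standard series/parallel availability model under three assumptions: node failures are independent, a group of redundant nodes is unavailable only if all its nodes are down, and the system needs both the controller group and the compute group. All function names are hypothetical.

```python
def group_availability(outage_probability, replicas):
    """Availability of a redundant group that fails only if all replicas fail."""
    return 1.0 - outage_probability ** replicas


def architecture_availability(p_controller, n_controllers, p_compute, n_computes):
    """Availability of a system that needs the controller AND the compute group."""
    return (group_availability(p_controller, n_controllers)
            * group_availability(p_compute, n_computes))


def sensitivity(estimates):
    """Compare two architectures over a range of estimated outage probabilities."""
    rows = []
    for p in estimates:
        a = architecture_availability(p, 2, p, 2)  # 2 controllers, 2 compute nodes
        b = architecture_availability(p, 1, p, 3)  # 1 controller, 3 compute nodes
        rows.append((p, a, b))
    return rows


for p, a, b in sensitivity([0.001, 0.01, 0.05]):
    print(f"p={p}: 2 controllers + 2 compute = {a:.6f}, "
          f"1 controller + 3 compute = {b:.6f}")
```

Under these assumptions the architecture with the redundant controller comes out ahead for every estimate between 0 and 1, because the single controller remains a single point of failure. Varying the estimates shows how large the gap becomes when the real outage probabilities are uncertain.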


Dependability Modeling on OpenStack HA Environment

Dependability Modeling could also be performed on the OpenStack HA Environment we use at ICCLab. High Availability can obviously be realized in many different ways: we could use e. g. a distributed DRBD device to store all data used in OpenStack and synchronize the DRBD device with Pacemaker. Another possible solution is to build Ceph clusters and again use Pacemaker as the synchronization tool. An alternative to Pacemaker is keepalived, which also offers synchronization and control mechanisms for Load Balancing and High Availability. And of course one could also think of using HAProxy for Load Balancing instead of Ceph or DRBD.
In short: different architectures can be modelled. How this is done will be the subject of a further blog post.

Kick off meeting of Concord phase2 held in Heidelberg @Eurescom

Concord is a supporting action of the FI PPP programme. Its main objective, as set by the EC, is the harmonisation, dissemination, facilitation and content of the core platform (FI-WARE), the Cloud Infrastructure (XIFI) and the Use Case Projects.

From Phase 2 onward, one of the challenges of FI PPP, and therefore of Concord, is the deployment of the Generic Enablers of FI-WARE (e. g. IoT, QoS, Cloud, Big Data) by the Use Case Projects on top of the XIFI Cloud Infrastructure made available by telcos.

To support and kick off the activities in Germany, most of the agenda of the boards (Architecture, Advisory and Steering) has been drafted, together with the plan for dissemination and events. Most of the Concord partners will meet at the FIA in Dublin from 8 to 10 May.

The FI-PPP programme will gain relevance from Phase 3, when most of the projects will be sustainable and in operation, involving Small & Medium Enterprises selected by the Concord project.

Vagrant, Devstack and the ICCLab

What?

So what is vagrant? In the words of its creator it allows you to:

“Create and configure lightweight, reproducible, and portable development environments.”

Vagrant is a Ruby framework that automates a lot of the boring, painful setup a developer needs to do to work with services. In the case of the ICCLab those services are generally OpenStack services. We use vagrant to create consistent, reproducible setups of our testbed on local development machines.

Why?

In the ICCLab we operate two testbeds. One is stable and operates an OpenStack environment that does not change often. The other is a research testbed that is used to investigate the latest features of OpenStack and to evaluate our own modifications or experiments upon OpenStack (e.g. Hadoop, CloudFoundry etc.). Before a code modification can be placed on the research testbed, it must first prove that it is worthy: it must be shown that it can run locally on a laptop/desktop and can be installed and configured automatically. The great advantage here is that vagrant supports the same configuration framework, puppet, as is used on the testbeds. Essentially, vagrant allows us to model our infrastructure locally before deploying changes to metal.

How?

So the best way to get started with vagrant is by example. In this example, we’ll show you how to create a vagrant project to create an OpenStack devstack environment.

Install it!

To install vagrant, make sure you have VirtualBox already installed. Then simply install it. On a Mac it’s easiest to use the bundled installer but otherwise just execute gem install vagrant. Once installed, execute vagrant help to see what you can do. You should see something like this:

[gist id=5309928]

The most common commands you’ll use are up, halt, reload and ssh.

Play with it!

The example we will bring you through is setting up a devstack environment. To see all the code check out the github project here.

The first thing you need to do when creating a new vagrant project is to create a directory to host all your files. Once done you’ll need to execute:

[gist id=5310061]

Once done you should find a Vagrantfile created in your directory. This contains a basic template for your vagrant project. For the purposes of this example we’ll use the following content:

[gist id=5310052]

What is important to note in this is devstack_config.vm.box. This tells vagrant what ‘box’ it will use. A box is simply a VM image with a particular initial configuration (see here for more details). Boxes can also be created with veewee. You can also install other boxes from vagrantbox.es.

The next most important piece is the devstack_config.vm.provision block. This details how your software will be installed. In this example we are using puppet (in local mode) to install devstack. In the code block we specify where to find additional modules and where to find the vagrant-specific manifests. Most importantly, we note which manifest is the main “entry point”: site.pp (the devstack_puppet.manifest_file variable).

In our example, site.pp encodes the following steps to create our devstack VM:

  1. Install git
  2. Check out the devstack repository
  3. Customise the devstack installation by setting up the devstack localrc file
  4. Run devstack by executing stack.sh

You can see the contents of this manifest here.
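For readers who do not know puppet, the four steps can be sketched as a list of shell commands. The Python below is a dry run that only builds and prints the commands. It is not the site.pp manifest from the project. The function name `devstack_plan`, the use of apt-get (which assumes a Debian or Ubuntu box), the repository URL and the localrc value are all placeholders chosen for illustration.

```python
import shlex


def devstack_plan(repo_url, checkout_dir="devstack", localrc=None):
    """Return the shell commands for the four steps, without running them."""
    localrc = localrc or {}
    localrc_path = shlex.quote(f"{checkout_dir}/localrc")
    plan = [
        # 1. Install git (assumes a Debian/Ubuntu based box).
        "sudo apt-get install -y git",
        # 2. Check out the devstack repository.
        f"git clone {shlex.quote(repo_url)} {shlex.quote(checkout_dir)}",
    ]
    # 3. Customise the installation by writing the localrc file line by line.
    for key, value in sorted(localrc.items()):
        plan.append(f"echo {shlex.quote(f'{key}={value}')} >> {localrc_path}")
    # 4. Run devstack.
    plan.append(f"cd {shlex.quote(checkout_dir)} && ./stack.sh")
    return plan


# Placeholder URL and password: replace them with your own values.
for command in devstack_plan(
    "https://example.org/devstack.git",
    localrc={"ADMIN_PASSWORD": "change-me"},
):
    print(command)
```

The advantage of expressing these steps in a puppet manifest rather than a script like this is that puppet only applies what is missing, so the provisioning can be re-run safely on an existing VM.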

If you’ve got this far then with the vagrant project cloned from github all you’ll have to do to get your devstack VM up and running is:

vagrant up

 

Easy eh?

Wrap up

The latest vagrant will add support for provisioning on the cloud (Amazon, OpenStack, Rackspace) and is also independent of hypervisor choice, including (paid) support for VMware Fusion.

Evaluation of HA technologies for OpenStack

As proposed in a former article, different technologies must be evaluated in order to make the current MobileCloud environment meet High Availability (HA) requirements. This article gives a basic evaluation of the different technologies that could be used.

Basically there are four technologies which make it possible to build a reliable HA infrastructure for OpenStack:

  1. Build OpenStack on top of Corosync and use the Pacemaker cluster resource manager to replicate OpenStack services over multiple redundant nodes.
  2. For clustering of storage, a DRBD block storage solution can be used. DRBD is software that replicates block storage (hard disks etc.) over multiple nodes.
  3. Object storage services can be clustered via Ceph. Ceph is a clustered storage solution which is able to cluster not only block devices but also data objects and filesystems. The Swift object store could therefore be made highly available by using Ceph.
  4. OpenStack has MySQL as an underlying database system which is used to manage the different OpenStack Services. Instead of using a standalone MySQL database server, one could use a MySQL Galera database cluster to make MySQL highly available too.

The different technologies have been evaluated according to their ability to make different OpenStack components highly available. The following table shows which technologies could be used to make the different OpenStack Services used in MobileCloud meet High Availability requirements.


Table 1.1: OpenStack Services and Clustering Technologies which make them suitable to HA requirements.

The different technologies can be combined in different architectural setups, but in every case they must be used in a multi-node OpenStack architecture. An architecture proposal will follow in a further article.