Cloud-Native Applications is a new research initiative which was launched by the ICCLab a couple of months ago. The initial time was used to gather knowledge on this very interesting and broad topic. For a detailed description of the initiative and cloud-native applications in general head to the initiative overview page.
But all the knowledge in the world is not worth much without some actual experience. And therefore as of the beginning of this year we launched the Cloud-Native Applications Seed Project or CNA Seed Project for short. The goal of this project is to take a traditional business application which was not designed to run in the cloud and modify it so that it can be deployed and operated in a cloud-environment.
I have introduced Disaster Recovery (DR) services in the tutorial of last ICCLAB newsletter which also made an overview of possible OpenStack configurations. Several configuration options could be considered. In particular, in case of a stakeholder having both the role of Cloud provider and DR service provider, a suitable safe configuration consists in distributing the infrastructures in different geographic locations. OpenStack gives the possibility to organise the controllers in different Regions which are sharing the same keystone. Here you will find the an overall specification using heat, I will simulate same configurations using devstack on Virtual BoX environments. One of the scope of this blog post is to support the students who are using Juno devstack.
There will be a second part of this tutorial to show a possible implementation of DR services lifecycle between two regions.
Beni Truninger is a researcher at the ICCLab, currently working on the Rating, Charging and Billing project. Before joinging the cloud computing team, he had already been at the InIT for two and a half years, participating in different projects in Information Security.
As mentioned in the first post about SmartDataCenter, it features various APIs. In this post we will have a look at them. Further I would like to present sdcadmin & sdc-heat, two small Python projects I have been working on. The former is a Python client library for SDCs admin APIs. The latter is an OpenStack Heat plugin that allows provisioning of SmartMachines and KVM VMs on SDC.
Cloud computing service is comparable to public utility services like gas, telephone or water supply.
Economical value of cloud computing service is determined by reliability, availability and maintainability (RAM) characteristics.
Availability impacts the value of cloud computing as it is perceived by end users. High Availability systems increase guaranteed availability of a cloud computing service. Therefore they increase the economical value of a cloud computing service.
Cloud HA initiative has the objectives:
To provide a service to analyze problems related with reliability and availability of cloud computing systems
To provide systems and services that increase reliability and availability of cloud computing systems
The following challenges exist currently:
Measuring and analyzing availability: how can we experimentally determine reliability of cloud computing systems (VMs, storage etc.)? Design of adequate reliability measurement experiments is difficult, since we often have to rely on simulation of an outage.
Adapt reliability engineering methods to cloud computing: many reliability analysis and engineering techniques do exist (Fault Tree Analysis, FME(C)A, HAZOP, Markov Chains). How can we apply them to the area of cloud computing?
Analytic and monitoring systems: build systems that automatically monitor reliability of cloud resources and analyze problems.
Failure recovery and intelligent event management systems: build systems that intelligently detect and react to failures.
Currently there is almost no data available on reliability of different virtualization technologies like OpenStack or Docker.
Cloud vendors and manufacturers simply claim that their systems operate reliably without providing data to prove their claims. Think about an engineering company (like e. g. ABB or Siemens). Would they still be on the market if they were not able to tell their customers the exact hazard rates and MTBFs of their products? The IT industry is lagging behind other engineering industries. IT reliability engineering could be an interesting discipline that adds value to IT products and services.
Relevance to current and future markets
Existing High Availability solutions:
Pacemaker: resource monitor that automatically detects failures and recovers failed components. Highly configurable, but also heavyweight. System administrators notoriously complain about its bad configuration interface. A bad configuration can make the system 7-8 times slower than a good configuration.
Keepalived: lightweight resource monitor. Unclear if this tool is well supported by its community.
IBM Tivoli: extremely heavyweight resource monitor and configuration management tool.
HAProxy: light load balancer. Great for web applications, but only applicable to HTTP-based services.
DRBD: disk replication technology. Fast and lightweight. Suitable for small disk networks.
Ceph: distributed storage and file system. Highly decentralized and great scalability.
GlusterFS: distributed storage and file system. Better scalability, but sometimes problem with partition tolerance.
Galera: MySQL cluster. True multimaster solution.
MySQL NDB Cluster: maps MySQL to simple key,value store. Requires adaption of applications to database interface.
Nagios: great monitoring system. Extendability and many plugins available.
Joyent recently open sourced the IaaS Platform SmartDataCenter and the Object Storage Manta, the software they use for their own service offerings. So, what’s all the buzz about? Why should you be excited? Why is it even worth talking (or in this case, writing) about SDC when we have OpenStack? In this blog post I will cover some of the fundamentals of SDC and why it’s worth a second look.
How reliable are your OpenStack VMs? How many outages do you expect to occur during 8 months of operation? Do your VMs crash regularily, randomly or do VM outages increase over time? These questions can only be answered if we perform a reliability analysis of the virtual machines that we manage. In this small guide we show you how to check reliability of VMs in your OpenStack environment. In part 1 of this 4 part series we explain the basic concepts of reliability engineering.
The vast field of reliability engineering has been used widely in various engineering disciplines like aircraft design, civil engineering, electricity management or product management. Though reliability engineering has proven to help in successfully building high quality engineering products, it has almost never been used in cloud computing so far. There might be some distrust among programmers in these scientifically proven reliability analysis methods, since they involve math and statistical exploration. But with a little introduction this is not a severe problem that we should worry about.
Reliability engineering simply deals with analyzing and measuring the outage behavior of engineered systems, trying out and testing system improvements that make the system more reliable, implementing system improvements and validating if the system improvements have reduced the occurence of outages or not. The first step is the analysis of outage behavior. How can outages be analyzed?