Cloud High Availability

Overview

Cloud computing means:

  • On-demand self service
  • Virtualization
  • Elastic resource provisioning

Cloud computing service is comparable to public utility services like gas, telephone or water supply.

Economical value of cloud computing service is determined by reliability, availability and maintainability (RAM) characteristics.

Availability impacts the value of cloud computing as it is perceived by end users. High Availability systems increase guaranteed availability of a cloud computing service. Therefore they increase the economical value of a cloud computing service.

Objectives

Cloud HA initiative has the objectives:

  • To provide a service to analyze problems related with reliability and availability of cloud computing systems
  • To provide systems and services that increase reliability and availability of cloud computing systems

Research Challenges

The following challenges exist currently:

  • Measuring and analyzing availability: how can we experimentally determine reliability of cloud computing systems (VMs, storage etc.)? Design of adequate reliability measurement experiments is difficult, since we often have to rely on simulation of an outage.

  • Adapt reliability engineering methods to cloud computing: many reliability analysis and engineering techniques do exist (Fault Tree Analysis, FME(C)A, HAZOP, Markov Chains). How can we apply them to the area of cloud computing?

  • Analytic and monitoring systems: build systems that automatically monitor reliability of cloud resources and analyze problems.

  • Failure recovery and intelligent event management systems: build systems that intelligently detect and react to failures.

Currently there is almost no data available on reliability of different virtualization technologies like OpenStack or Docker.

Cloud vendors and manufacturers simply claim that their systems operate reliably without providing data to prove their claims. Think about an engineering company (like e. g. ABB or Siemens). Would they still be on the market if they were not able to tell their customers the exact hazard rates and MTBFs of their products? The IT industry is lagging behind other engineering industries. IT reliability engineering could be an interesting discipline that adds value to IT products and services.

Relevance to current and future markets

Business impact

Existing High Availability solutions:

  • Pacemaker: resource monitor that automatically detects failures and recovers failed components. Highly configurable, but also heavyweight. System administrators notoriously complain about its bad configuration interface. A bad configuration can make the system 7-8 times slower than a good configuration.

  • Keepalived: lightweight resource monitor. Unclear if this tool is well supported by its community.

  • IBM Tivoli: extremely heavyweight resource monitor and configuration management tool.

  • HAProxy: light load balancer. Great for web applications, but only applicable to HTTP-based services.

  • DRBD: disk replication technology. Fast and lightweight. Suitable for small disk networks.

  • Ceph: distributed storage and file system. Highly decentralized and great scalability.

  • GlusterFS: distributed storage and file system. Better scalability, but sometimes problem with partition tolerance.

  • Galera: MySQL cluster. True multimaster solution.

  • MySQL NDB Cluster: maps MySQL to simple key,value store. Requires adaption of applications to database interface.

  • Nagios: great monitoring system. Extendability and many plugins available.

  • Elasticsearch, Logstash, Kibana (ELK): log file monitoring system.

There are many HA systems available on the market, but almost no tool to analyze reliability of OpenStack and allow for automated intelligent recovery from failure.

Results

Presentation

HA_initiative_factsheet

Contact

Konstantin Benz
Obere Kirchgasse 2
CH-8400 Winterthur
Mail: benn__(at)__zhaw.ch

Cloud Application Management

Overview

Currently today, large internet-scale services are still architected using the principles of service-orientation. The key overarching idea is that a service is not one large monolith but indeed a composite of cooperating sub-services. How these sub-services are designed and implemented are given either by the respective business function, as in the case of traditional SOA to technical function/domain-context as in the case of the microservice approach to SOA.

In the end what both approaches result in, is a set of services, each of which carrying out a specific task/function. However, in order to bring all these service units together an overarching process needs to be provided to stitch them together and manage their runtimes. In doing so present the complete service to the end-user and for the developer/provider of the service.

The basic management process of stitching these services together is known as orchestration.

Orchestration & Automation

These are two concepts that are often conflated and used as if they’re equivocal. They’re not but they are certainly related, especially when Automation refers to configuration management (CM; e.g. puppet, chef, etc.).

Nonetheless, what both certainly share is that they are oriented around the idea of software systems that expose an API. With that API, manual processes once conducted through user interfaces or command line interfaces can now be programmed and then directed by higher level supervisory software processes. 

Orchestration goes beyond automation in this regard. Automation (CM) is the process that enables the provisioning and configuration of an individual node without consideration for the dependencies that node might have on others or vice versa. This is where orchestration comes into play. Orchestration, in combination with automation, ensures the phases of: 

  1. “Deploy”: the complete fleet of resources and services are deployed according to a plan. At this stage they are not configured. 

  2. “Provision”: each resource and service is correctly provisioned and configured. This must be done such that one service or resource is not without a required operational dependency (e.g. a php application without its database). 

This process is of course a simplified one and does not include the steps of design, build and runtime management of the orchestrated components (services and/or resources). 

  • Design: where the topology and dependencies of each component is specified. The model here typically takes the form of a graph. 

  • Build: how the deployable artefacts such as VM images, python eggs, Java WAR files are created either from source or pre-existing assets. This usually has a relationship to a continuous build and integration process. 

  • Runtime: once all components of an orchestration are running the next key element is that they are managed. To manage means at the most basic level to monitor the components. Based on metrics extracted, performance indicators can be formulated using logic-based rules. These when notified where an indicator’s threshold is breached, an Orchestrator could take a remedial action ensuring reliability. 

  • Disposal: Where a service is deployed through cloud services (e.g. infrastructure; VMs) it may be required to destroy the complete orchestration to redeploy a new version or indeed part of the orchestration destroyed. 

Ultimately the goal of orchestration is to stitch together (deploy, provision) many components to deliver a functional system (e.g. replicated database system) or service (e.g. a 3-tier web application with API) that operates reliably.

Objectives

The key objective of this initiatives are: 

  • Provide a reactive architecture that covers not only the case of controlling services but also service provider specific resources. What this means is that the architecture will exhibit responsiveness, resiliency, elasticity and be message-oriented. This architecture will accommodate all aspects that answer our identified research challenges. 
  • Deliver an open-source framework that implements orchestration for services in general and more specifically cloud-based services. 
  • Provide orchestration that provides reliable and cloud-native service delivery 

There are other objectives that are more related to delivering other research challenges.

Research Challenges

  • How to best enable and support a SOA, Microservices design patterns? 
  • How to get insight and tracing within each service and across services so problems can be identified, understood? 
  • Efficient management of large-scale composed service and resource instance graphs 
  • Scaling based on ‘useful’ monitoring, resource- and service-level metrics 
    • Consider monitoring system and scaling systems e.g. monasca 
    • How to program the scaling of an orchestrator spanning multiple providers and very different services? 
  • Provision of architectural recommendations and optimisation based on orchestration logic analysis
  • How to exploit orchestration capabilities to ensure reliability? ie, “load balancer for high availability” for cloud applications. How can load balancing service be automatically injected ensuring automatic scaling? 
    • How could a service orchestration framework bring the techniques of netflix and amazon (internal services) to a wider audience? 
    • Snapshot your service, rollback to your service’s previous state 
    • Reliability of the Service Orchestrator – how to implement this? HAProxy? Pacemaker? 
  • Orchestration logic should be able to be written in many popular languages 
  • Continuous integration of orchestration code and assets
  • Provider independent orchestration execution and accomdate many resource/service providers. 
    • Hybrid cloud deployments not well considered. How can this be done? 
    • Adoption of well known standards, openid, openauth and custom providers 
  • Authentication services – how to do this over disparate providers? 
  • How to create market places to offer services. Either the service being orchestrated or that service consuming others. 
  • Integration of business services that service owners can charge clients 
  • Containers for service workloads. Where might CoreOS, Docker, Rocket, Solaris Zones fit in the picture? 
    • If windows is not a hard requirement then it makes sense from a provider’s perspective to utilise container tech. 
    • Do we really need full-blown “traditional” IaaS frameworks to offer orchestration?

Relevance to Current & Future Markets

Many companies’ products aim to provide orchestration of resources in the Cloud, such as Sixsq (Slipstream), Cloudify, ZenOSS ControlCenter, Nirmata… There are also several open source projects, especially related to OpenStack, who touch the orchestration topic: OpenStack Heat, Murano, Solum.

Our market survey established a lack of non-cross domain (different service providers), service-oriented orchestration, with many of them taking the lower-level approach of orchestrating resources directly, and very often on a single provider. One aspect that all these solutions are very different in terms of programming models, however there is a growing interest in leveraging a standards-based orchestration description, with TOSCA being the most talked about. Another identified issue is the lack of reliability of services/resources orchestrated by these products, which is a barrier to adoption this initiative aims to solve. Along with this is that many solutions either have no runtime management or has limited capabilities.

  • In a more general point of view, cloud orchestration brings the following benefits to customers:
    Orchestration reduces the overhead of configuring manually all services comprising a cloud-native application
  • Orchestration allows to get out new updates to a service implementation faster and better tested through continuous testing integration and deployment
  • Reliable orchestration ensures the linkage and composition of services remaining running all the time, even where one or more components fail. This reduces downtime experienced by clients and keeps the service providers service always available.
  • Orchestration brings reproducibility and portability in cloud services, which may run on any cloud provider which the orchestration software controls

Architecture

The key entities of the architecture and their relationships to basic entities are shown in the follow diagram. To understand the complete detailed architecture, click on the picture to get the complete view.

c-orch-arch-entity-model

Related Projects

Contact

Energy Aware Cloud Load Management

Resource Management in Cloud Computing is a topic that has received much interest both within the research community and within the operations of the large cloud providers; naturally, as it has a significant impact on the cloud provider’s bottom line. Much of the work to date on resource management focuses on Service Level Agreements (for different definitions of an SLA); some of the work also considers energy as a factor.

Objectives

The primary objective of this work is to develop an energy aware load management solution for Openstack: variants of this have been proposed before and indeed implemented in other stacks (e.g. Eucalyptus) but no such capability exists for Openstack as yet. As well as realizing the solution, the work will involve deploying a variant of the solution on the cloud platform without impacting the operation of the platform and determining what energy savings can be made. It is worth noting that the classical load balancing approach which is very typical for resource managers in cloud contexts is somewhat contradictory to minimizing energy consumption; consequently, the very standard load management tools are not suitable for minimizing cloud energy consumption.

Research Challenges

The research challenges are the following:

  • How to characterize the load in the system, particularly relating to spikes in demand
  • How much buffer space to maintain to accommodate load spikes
  • How to perform load consolidation – what load should be moved to what machines?
  • When to perform load consolidation – how frequently should it take place?
  • What are the energy gains that can be achieved from such a dynamic system?

Relevance to current and future markets

Advanced resource management mechanisms are a necessity for cloud computing generally. In the case of large deployments, Facebook’s autoscale is an example of how they can be used to achieve energy savings of the order of 15%. In the case of smaller deployments, it is still the case that there are many [[ https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/ | highly underutilized servers ]] in typical Data Centres and ultimately there will be a need to reduce costs and realize energy efficiencies. The problem is a large, general problem and energy is one specific aspect of it – one of the challenges for this work is how to integrate with other active parts of the ecosystem.

There are some commercial offering which explicitly address energy efficiency in the cloud context. These include:

Impact

Architecture

See the Energy Theme for the larger system architecture.

Implementation Roadmap

The next steps on the implementation roadmap are as follows:

  • Get tunnelled post-copy live migration working with modifications to libvirt (Jan 2015)
  • See if this can be pushed upstream to libvirt
  • Consolidate live migration work into clearer message relating to the potential of live migration (Jan 2015)
  • Devise control mechanism which can be used to provide energy based control (Feb 2015)
  • Deploy and test on Arcus servers (Mar 2015)
  • Determine if it is ready for deployment on Bart/Lisa (April 2015)

Contact

 

Understanding Cloud Energy Consumption

Energy in general and energy consumption in particular is a major issue for the large cloud providers today. Smaller cloud providers – both private and public – also have an interest in reducing their energy consumption, although it is often not their most important concern. With increasing competition and decreasing margins in the IaaS sector, management of energy costs will become increasingly important.

A basic prerequisite of advanced energy management solutions is a good understanding of energy consumption. This is increasingly available in multiple ways as energy meters proliferate: as well as having energy meters on racks, energy meters typically exist in modern hardware and even at subsystem level within today’s hardware. That said, energy metering is something that is commonly coupled to proprietary management systems.

The focus of this initiative is to develop an understanding of cloud energy consumption through measurement and analysis of usage.

Objectives

The objectives of the energy monitoring initiative are:

  • to develop a tool to visualize how energy is being consumed within the cloud resources;
  • to understand the correlation between usage of cloud resources and energy consumption;
  • to understand what level of granularity is appropriate for capturing energy data;
  • to devise mechanisms to disaggregate energy consumption amongst users of cloud platforms.

Research Challenges

Understanding cloud energy consumption does not give rise to fundamental research challenges – indeed, it is more of an enabler for a more advanced energy management system. However, to have a comprehensive understanding of cloud energy consumption, some research effort is required. The following research challenges arise in this context:

  • How to consolidate energy consumption from disparate sources to realize a clear understanding of energy consumption within the cloud environment
  • How to correlate energy consumption with revenue generating services at a fine-grained level (compute, storage and networking)

Relevance to current and future markets

Understanding energy consumption is essential for the large cloud providers as well as for today’s Data Centre providers. Consequently, there are already solutions available which support monitoring of energy consumption of IT resources. Today’s solutions typically do not have specific knowledge of cloud resource utilization and consequently, there is an opportunity for new tools which correlate cloud usage with energy monitoring.

In the Gartner Hype Cycle for Green IT 2014, there are some related technologies which have growth potential over the coming years. Specifically, these are:

  • DCIM Tools
  • Server Digital Power Management Module
  • Demand Response Management Tools

As such, there are future market opportunities for such energy related work. However, we are still evaluating its commercial potential.

Impact

Architecture

TBA.

Implementation Roadmap

This work has largely resulted in a live demonstrator. At present, there is not a significant effort to add more features and capabilities.

The current tasks on the roadmap are:

  • Ensure system is live – maintenance task
  • Periodically review energy consumption
  • Review usage of cloud resources and determine the amount of resources necessary to support this amount of utilization; thus the potential energy saving can be determined.
  • Promote the tool somewhat
  • Presentation at next Openstack Meetup
  • Investigate deployment opportunities

Contact

Distributed Computing in the Cloud

by Josef Spillner

Description

The widespread adoption and the development of cloud platforms have increased confidence in migrating key business applications to the cloud. New approaches to distributed computing and data analysis have also emerged in conjunction with the growth of cloud computing. Among them, MapReduce and its implementations are probably the most popular and commonly used for data processing on clouds.

Efficient support for distributed computing on cloud platforms means guaranteeing high speed and ultra-low latency to enable massive amounts of uninterrupted data ingestion and real-time analysis, as well as cost-efficiency-at-scale.

Problem Statement

Currently, there are limited offerings of on demand distributed computing tools. The main challenge that applies not only to cloud environments, is to build such a framework that handles both big data and fast data. This means that the framework must be able to provide for both batch and stream processing, while allowing clients to transparently define their computations and query the results in real time. Provisioning such a framework on cloud platforms requires delivering rapid provisioning and maximal performance. Challenges also come from one of cloud’s most appealing features: elasticity and auto-scaling. Distributed computing frameworks can greatly benefit from auto-scaling, but current solutions do not support it yet.

Articles and Info

Contact Point

Piyush Harsh

Balazs Meszaros

Cloud Incident Management

Overview

Cloud Incident Management is a new research direction which focuses on conducting forensic investigations, electronic discovery (eDiscovery), and other critical aspects of security that are inherent in a multi-tenant, highly virtualized environment, along with any standards that need to be followed.

An Incident is an event which occurs outside the standard operation plan and which can lead to a reduction or interruption of quality of service. Incidents, in Cloud Computing, can lead to service shortages at all infrastructure levels (IaaS, PaaS, SaaS).

Incident Management provides a solid approach to address SLA incidents by covering aspects pertaining to service runtime in cloud through monitoring and analysis of events that may not cause SLA breaches but may disrupt service execution, or by covering aspects related to security by correlating and analyzing information coming from logs and generating adequate corrective responses.

Objectives

Current research will focus on addressing a series of research challenges pertaining to the Cloud Incident Management field:

  • Tackle possible temporary or long-term failures through the development of incident management tools, reference architectures and guidance for cloud customers to build systems resilient to cloud service failure.
  • Automated management of incident prevention, detection and response as well as recovery via clear SLA commitments and continuous monitoring will increase reliability, resilience, availability, trustworthiness and even accountability of cloud providers and customers.

Research Challenges and Open Issues

Current research challenges and open issues are as follows:

  • Correct identification, aggregation and correlation of events that make up an incident
  • Automated incident classification
  • Automated incident / problem management (workflow, processes)
  • Root cause analysis in cloud computing
  • Assessing business impact
  • Incident management in multi-cloud approaches
  • Transparency and audit
  • Cloud anti-patterns
  • Clear definition of outages given by cloud service providers

Architecture

A high level overview of the architecture can be seen below

Cloud Incident Management Architecture

Cloud Incident Management Architecture

Relevance to current and future markets

Business Impact

The following items represent the business impact incident management brings:

  • Automating incident management reduces the time spent by specialized personnel
  • Automation reduces response time to incidents and thus prevents or reduces downtime as it is able to act as soon as the incident has happened
  • Return on investment though availability, response time and throughput
  • Incident management increases efficiency, reduces operating expenses, offers agility and reliability for business users

Contact point

For further information or assistance please contact Valon Mamudi.

Cloud storage

Overview

Storage, together with computing and networking, is one of the fundamental parts of IaaS.

The research initiative on cloud storage at ICCLab, under the Infrastructure theme, focuses on the exploration of the limiting factors of the available storage systems, aiming at identifying new technologies and providing solutions that can be used to improve the efficiency of data management in cloud environments.

The need for advanced distributed architectures and software components allowing the deployment of secure, reliable, highly available and high-performing storage systems is clearly remarked by the fast growing rate of user-generated data. This trend sets challenging requirements for service and infrastructure providers to find efficient solutions for permanent data storage in their data centers.

About Cloud Storage Systems

A cloud storage system is typically obtained through a composition of software resources (running in a distributed environment), and a set of physical machines (i.e., servers), that exposes access to a logical layer of storage.

Cloud storage provides an abstract view of the multiple physical storage resources that it manages (these can be located across multiple servers, or even across different data centers) and it internally handles different layers of transparency that ensure reliability and performance.

The main concepts that are to be found in cloud storage systems are:

  • Data replication and reliability. Policies can be defined in such a way that copies of the same data are spread across different failure domains, to ensure availability and disaster recovery.
  • Data placement. A cloud storage system exposes a logical view of storage and internally handles how data is assigned to the available resources. This allows for e.g., striping data and improving access performance by using parallel accesses, or ensuring a proper load balancing between a set of nodes.
  • Availability. As a distributed system, cloud storage must not exhibit any single point of failure. This is usually achieved by introducing redundancy in hardware components and by implementing fail-over policies to recover from failures.
  • Performance. Concurrent accesses to data can improve data rates significantly as different portions of the same file or object can be provided by two different disks or nodes.
  • Geo-replication. A cloud storage system can replicate data in such a way that it is closer to where it is consumed (e.g., across data centers on different regions) to improve the access efficiency.

Objectives

  • Implement research ideas into working prototypes that can attract industrial interest
  • Obtain funding by participating in financed research projects
  • Produce and distribute our open source implementations
  • Keep and increase the reputation of the ICCLab in international contexts
  • Define a strong field of expertise in Distributed File Systems and software solutions for storage
  • Explore and implement clustered storage architectures

Research Topics

From an applied research perspective, the scenario of cloud computing and the growing demand for efficient data storage solutions, offers a ground where many areas and directions can be explored and evaluated.

Here at the ICCLab, the following aspects are currently being developed in the cloud storage initiative:

Contacts

Energy Efficiency and Cloud Computing – The Theme

The primary focus of the Energy Theme is on reducing the energy consumption of cloud computing resources. As compute nodes consume most of the energy in cloud computing systems, work to date has been focused on reducing the energy consumed by compute loads, particularly within the Openstack context. Although, as servers get increasingly instrumented, it is clear that there is potential in understanding the energy consumption with finer granularity and ultimately this can lead to energy efficiencies and cost savings.

Architecture

In the current work, the primary mechanism to achieve energy efficiencies is load consolidation combined with power control of servers. This could be augmented with managing server CPU power states, but it remains to be seen if this will lead to significant power savings. Another tool to achieve energy efficiencies is to add elastic load when the resources are underutilized – this does not reduce the overall energy consumption per se, but rather enables providers to get more bang for their energy buck.

energy-arch-v1

The current architecture of the Cloud Energy Efficiency Subsystem is shown above with the components performing the following functions:

  • an energy monitoring component: this obtains information on the energy consumption of the entire system – it may also make some kind of abstraction rather than working with highly granular data for each node;
  • a load characterization component: this component uses primarily ceilometer data to understand what is going on in the cloud – it makes an abstraction of the usage of the system over different timescales and particularly determines which level of burstiness exists in the load patterns;
  • a load consolidation mechanism: this will take the info on the system state and identify where load consolidation can be performed – it then issues a set of live migration instructions to the cloud to perform the consolidate. In general, it would be necessary to add some filters to support different hypervisors, bare metal servers, etc which makes it more complex;
  • physical server manager: this will turn off servers and turn them on as necessary – this will take input from the load characterization component to determine how much spare capacity to keep in the system to deal with variations in demand.

The specific interactions between these components is evolving as this is a work in progress.

Initiatives

At present, the theme comprises of two initiatives. These are

Related Projects

People

Cloud-Native Applications

This page is kept for archiving. Please navigate to our new site: blog.zhaw.ch/splab.

Overview

Since Amazon started offering cloud services (AWS) in 2006, cloud computing in all its forms became evermore popular and has steadily matured since. A lot of experience has been collected and today a high number of companies are running their applications in the cloud either for themselves or to offer services to their customers. The basic characteristics of this paradigm1 offer capabilities and possibilities to software applications that were unthinkable before and are the reason why cloud computing was able to establish itself the way it did.

What is a Cloud-Native Application?

In a nutshell, a cloud-native application (CNA) is a distributed application that runs on a cloud infrastructure (irrespective of infrastructure or platform level) and is in its core scalable and resilient as well as adapted to its dynamic and volatile environment. These core requirements are derived from the essential characteristics that every cloud infrastructure must by definition possess, and from user expectations. It is of course possible to run an application in the cloud that doesn’t meet all those criteria. In that case it would be described as a cloud-aware or cloud-ready application instead of a cloud-native application. Through a carefully cloud-native application design based on composed stateful and stateless microservices, the hosting characteristics can be exploited so that scalability and elasticity do not translate into significantly higher cost.

Objectives

  • The CNA initiative provides architecture and design guidelines for cloud-native applications, based on lessons-learned of existing applications and by taking advantage of best-practices (Cloud-Application Architecture Patterns).
  • Evaluate microservice technology mappings, related to container compositions, but also other forms of microservice implementations.
  • Provide recommendations for operation of cloud native applications (Continuous Delivery, Scaling, Monitoring, Incident Management,…)
  • Provide economic guidelines on how to operate cloud native applications (feasibility, service model (mix), microservice stacks, containers, …)
  • Investigate in, develop and establish a set of open source technologies, tools and services to build, operate and leverage state of the art cloud-native applications.
  • Support SMEs to build their own cloud-native solutions or reengineer and migrate existing applications to the cloud.
  • Ensure that all new applications developed within the SPLab and the ICCLab are cloud-native.

Relevance to current and future markets

– Business impact

  • Using cloud infrastructures (IaaS/PaaS) it is possible to prototype and test new business ideas quickly and without spending a lot of money up-front.
  • An application running on a cloud infrastructure – if designed in a cloud-native way – only ever uses as many resources as needed. This avoids under- or over- provisioning of resources and ensures cost-savings.
  • Developing software with services offered by cloud infrastructure and -platform providers enables even a small team to create highly scalable applications serving a high number of customers.
  • Developing cloud-native applications with a microservice architecture style allows for shorter development-cycles which reduces the time to adapt to customer feedback, new customer requirements and changes in the market.

– Correlation to industry forecasts

  • Cloud-native applications are tightly bound to cloud computing resp. to IaaS and PaaS since these technologies are used to develop and host applications and in the best case these applications are cloud native. So wherever these technologies stand in the Gartner Hype-Cycle Cloud-Native Applications can be thought of as being at the same stage.
  • The Cloud-Native Computing Foundation (CNCF.io) and other industry groups are formed to shape the evolution of technologies that are container packaged, dynamically scheduled and microservices oriented.

  • Container composition languages and tools are on the rise. A careful evaluation and assessment of technologies, lock-ins, opportunities is required. The CNA initiative brings sufficient academic rigor to afford long-term perspectives on these trends.

Relevant Standards and Articles

Architecture

Cloud-native applications are typically designed as distributed applications with a shared-nothing architecture composed of autonomous and stateless services that can horizontally scale and communicate asynchronously via message queues. The focus lies on the scalability and resilience of an application. The architecture style and current state of the art of how to design such applications is described with the term Microservices. While this is in no way the only way to architect cloud-native applications it is the current state of the art.

Generic CNA Architecture

The following architecture has been initially analysed, refined and realised by the SPLab CNA initiative team with a business application (Zurmo CRM) based on the CoreOS/fleet stack as well as on Kubernetes.

More recent works include a cloud-native document management architecture with stateful and stateless microservices implemented as composed containers with Docker-Compose, Vamp and Kubernetes.

Articles and Publications

G. Toffetti, S. Brunner, M. Blöchlinger, J. Spillner, T. M. Bohnert: Self-managing cloud-native applications: design, implementation and experience. FGCS special issue on Cloud Incident Management, 2016.

S. Brunner, M. Blöchlinger, G. Toffetti, J. Spillner, T. M. Bohnert, “Experimental Evaluation of the Cloud-Native Application Design”, 4th International Workshop on Clouds and (eScience) Application Management (CloudAM), Limassol, Cyprus, December 2015. (slides; author version; IEEExplore/ACM DL: to appear)

Blog Posts

Note: Latest posts are at the bottom.

Presentations

Open Source Software

Contact

Josef Spillner: josef.spillner(at)zhaw.ch

Footnotes

1. On-Demand Self-Service, Broad Network Access, Resource Pooling, Rapid Elasticity and Measured Service as defined in  NIST Definition of Cloud Computing

Rating, Charging, Billing

This page is kept for archiving. Please navigate to our new site: blog.zhaw.ch/splab.

Description

Financial accounting is a very critical process in the monetization process of any service. In the telecommunication world, these processes have long been documented, used, and standardized. Cloud computing being a relatively new paradigm, is still undergoing a transition phase. Many new services are being defined and there is still a huge untapped potential to be exploited.

Rating, Charging, and Billing (RCB) are key activities that allows a service provider to fix monetary values for the resources and services it offers, and allows it to bill the customers consuming the services offered.

Problem Statement

Given a general service scenario, how can the key metrics be identified. The identification of measurable metrics is essential for determining a useful pricing function to be attached to the metric. The challenges we are trying to address under this initiative are multi-dimensional. Is it possible to come up with a general enough RCB model that can address the needs of multiple cloud services – IaaS, PaaS, SaaS, and many more that would be defined in the future?

Where is the correct boundary between real-time charging strategy, which could be very resource intensive, versus a periodic strategy which has the risk of over-utilization of resources by the consumers between two cycles? Can a viable middle-path strategy be established for cloud based services. Can pre-paid pricing model be adapted for the cloud?

Simplified workflow

rcb-simplified

Architecture

MicroserviceRepository
User Data Recordshttps://github.com/icclab/cyclops-udr
Rating & Charginghttps://github.com/icclab/cyclops-rc
Billinghttps://github.com/icclab/cyclops-billing
Dashboardhttps://github.com/icclab/cyclops-support

Developing

  • rule engine and pricing strategies
  • prediction engine and alarming
  • revenue sharing and SLAs
  • usage collectors
  • scalability

Demos

  • vBrownBag Talk, OpenStack Summit, Paris, 2014

  • Swiss Open Cloud Day, Bern, 2014

  • CYCLOPS Demo

Presentations

  • OpenStack Meetup, Winterthur, 2014

Articles and Info

Research publications

Technology transfer

Research Approach

Following the ICCLab research approach

RCB_research

Contact

  • icclab-rcb-cyclops[at]dornbirn[dot]zhaw[dot]ch

Team