Distributed Computing in the Cloud

by Josef Spillner

Description

The widespread adoption and the development of cloud platforms have increased confidence in migrating key business applications to the cloud. New approaches to distributed computing and data analysis have also emerged in conjunction with the growth of cloud computing. Among them, MapReduce and its implementations are probably the most popular and commonly used for data processing on clouds.

Efficient support for distributed computing on cloud platforms means guaranteeing high speed and ultra-low latency to enable massive amounts of uninterrupted data ingestion and real-time analysis, as well as cost-efficiency-at-scale.

Problem Statement

Currently, there are limited offerings of on-demand distributed computing tools. The main challenge, which is not specific to cloud environments, is to build a framework that handles both big data and fast data. This means that the framework must support both batch and stream processing, while allowing clients to transparently define their computations and query the results in real time. Offering such a framework on cloud platforms requires rapid provisioning and maximal performance. Challenges also come from one of the cloud’s most appealing features: elasticity and auto-scaling. Distributed computing frameworks can greatly benefit from auto-scaling, but current solutions do not support it yet.

Articles and Info

Contact Point

Piyush Harsh

Balazs Meszaros

Cloud Incident Management

Overview

Cloud Incident Management is a new research direction which focuses on conducting forensic investigations, electronic discovery (eDiscovery), and other critical aspects of security that are inherent in a multi-tenant, highly virtualized environment, along with any standards that need to be followed.

An Incident is an event which occurs outside the standard operation plan and which can lead to a reduction or interruption of quality of service. Incidents, in Cloud Computing, can lead to service shortages at all infrastructure levels (IaaS, PaaS, SaaS).

Incident Management provides a solid approach to address SLA incidents by covering aspects pertaining to service runtime in cloud through monitoring and analysis of events that may not cause SLA breaches but may disrupt service execution, or by covering aspects related to security by correlating and analyzing information coming from logs and generating adequate corrective responses.

Objectives

Current research will focus on addressing a series of research challenges pertaining to the Cloud Incident Management field:

  • Tackle possible temporary or long-term failures through the development of incident management tools, reference architectures and guidance for cloud customers to build systems resilient to cloud service failure.
  • Automated management of incident prevention, detection and response as well as recovery via clear SLA commitments and continuous monitoring will increase reliability, resilience, availability, trustworthiness and even accountability of cloud providers and customers.

Research Challenges and Open Issues

Current research challenges and open issues are as follows:

  • Correct identification, aggregation and correlation of events that make up an incident
  • Automated incident classification
  • Automated incident / problem management (workflow, processes)
  • Root cause analysis in cloud computing
  • Assessing business impact
  • Incident management in multi-cloud approaches
  • Transparency and audit
  • Cloud anti-patterns
  • Clear definition of outages given by cloud service providers
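
As a rough illustration of the first challenge in the list above (aggregating and correlating the events that make up an incident), the sketch below groups events of the same service that occur close together in time. The `Event` fields, the 60-second window and the grouping rule are assumptions made for this example; they are not taken from the initiative's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point (hypothetical field)
    service: str      # name of the affected service (hypothetical field)
    message: str


def correlate(events, window=60.0):
    """Group events per service; a gap larger than `window` starts a new incident."""
    incidents = []
    latest = {}  # service -> index of that service's most recent incident
    for event in sorted(events, key=lambda e: e.timestamp):
        idx = latest.get(event.service)
        if idx is not None and event.timestamp - incidents[idx][-1].timestamp <= window:
            incidents[idx].append(event)
        else:
            incidents.append([event])
            latest[event.service] = len(incidents) - 1
    return incidents
```

A real correlation engine would likely use richer signals (topology, causality, log content); time proximity per service is only the simplest possible rule.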

Architecture

A high-level overview of the architecture is shown below.

[Figure: Cloud Incident Management Architecture]

Relevance to current and future markets

Business Impact

The following items represent the business impact incident management brings:

  • Automating incident management reduces the time spent by specialized personnel
  • Automation reduces response time to incidents and thus prevents or reduces downtime as it is able to act as soon as the incident has happened
  • Return on investment through availability, response time and throughput
  • Incident management increases efficiency, reduces operating expenses, offers agility and reliability for business users

Contact point

For further information or assistance please contact Valon Mamudi.

Cloud storage

Overview

Storage, together with computing and networking, is one of the fundamental parts of IaaS.

The research initiative on cloud storage at ICCLab, under the Infrastructure theme, focuses on the exploration of the limiting factors of the available storage systems, aiming at identifying new technologies and providing solutions that can be used to improve the efficiency of data management in cloud environments.

The fast-growing rate of user-generated data underscores the need for advanced distributed architectures and software components that allow the deployment of secure, reliable, highly available and high-performing storage systems. This trend sets challenging requirements for service and infrastructure providers to find efficient solutions for permanent data storage in their data centers.

About Cloud Storage Systems

A cloud storage system is typically composed of software resources (running in a distributed environment) and a set of physical machines (i.e., servers) that together expose access to a logical layer of storage.

Cloud storage provides an abstract view of the multiple physical storage resources that it manages (these can be located across multiple servers, or even across different data centers) and it internally handles different layers of transparency that ensure reliability and performance.

The main concepts that are to be found in cloud storage systems are:

  • Data replication and reliability. Policies can be defined in such a way that copies of the same data are spread across different failure domains, to ensure availability and disaster recovery.
  • Data placement. A cloud storage system exposes a logical view of storage and internally handles how data is assigned to the available resources. This allows, for example, striping data to improve access performance through parallel accesses, or balancing the load across a set of nodes.
  • Availability. As a distributed system, cloud storage must not exhibit any single point of failure. This is usually achieved by introducing redundancy in hardware components and by implementing fail-over policies to recover from failures.
  • Performance. Concurrent accesses to data can improve data rates significantly as different portions of the same file or object can be provided by two different disks or nodes.
  • Geo-replication. A cloud storage system can replicate data in such a way that it is closer to where it is consumed (e.g., across data centers in different regions) to improve access efficiency.
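
The replication and placement concepts in the list above can be sketched as follows: each replica of an object is placed in a different failure domain, and the choice is derived from a hash so that it is repeatable. The function name, the `nodes_by_domain` layout and the hash-based ordering are assumptions for this example; production systems use more elaborate placement algorithms.

```python
import hashlib


def place_replicas(object_id, nodes_by_domain, replicas=3):
    """Pick one node in each of `replicas` distinct failure domains.

    nodes_by_domain maps a failure-domain name (e.g. a rack) to its nodes.
    The hash-based ordering makes the choice deterministic per object.
    """
    if replicas > len(nodes_by_domain):
        raise ValueError("not enough failure domains for the requested replicas")

    def score(name):
        # Hash of object and candidate name; used only to order candidates.
        return hashlib.sha256(f"{object_id}:{name}".encode()).hexdigest()

    domains = sorted(nodes_by_domain, key=score)[:replicas]
    return [min(nodes_by_domain[domain], key=score) for domain in domains]
```

Because no two replicas share a failure domain, losing a single rack (in this hypothetical layout) leaves at least two copies available.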

Objectives

  • Implement research ideas into working prototypes that can attract industrial interest
  • Obtain funding by participating in financed research projects
  • Produce and distribute our open source implementations
  • Keep and increase the reputation of the ICCLab in international contexts
  • Define a strong field of expertise in Distributed File Systems and software solutions for storage
  • Explore and implement clustered storage architectures

Research Topics

From an applied research perspective, cloud computing and the growing demand for efficient data storage solutions offer a ground where many areas and directions can be explored and evaluated.

Here at the ICCLab, the following aspects are currently being developed in the cloud storage initiative:

Contacts

Cloud-Native Applications

This page is kept for archiving. Please navigate to our new site: blog.zhaw.ch/splab.

Overview

Since Amazon started offering cloud services (AWS) in 2006, cloud computing in all its forms has become ever more popular and has steadily matured. A lot of experience has been collected, and today a large number of companies run their applications in the cloud, either for themselves or to offer services to their customers. The basic characteristics of this paradigm [1] offer capabilities and possibilities to software applications that were unthinkable before, and they are the reason why cloud computing was able to establish itself the way it did.

What is a Cloud-Native Application?

In a nutshell, a cloud-native application (CNA) is a distributed application that runs on a cloud infrastructure (irrespective of infrastructure or platform level) and is at its core scalable and resilient as well as adapted to its dynamic and volatile environment. These core requirements are derived from the essential characteristics that every cloud infrastructure must by definition possess, and from user expectations. It is of course possible to run an application in the cloud that doesn’t meet all those criteria. In that case it would be described as a cloud-aware or cloud-ready application instead of a cloud-native application. Through a careful cloud-native application design based on composed stateful and stateless microservices, the hosting characteristics can be exploited so that scalability and elasticity do not translate into significantly higher cost.

Objectives

  • Provide architecture and design guidelines for cloud-native applications, based on lessons learned from existing applications and on established best practices (Cloud-Application Architecture Patterns).
  • Evaluate microservice technology mappings, related to container compositions, but also other forms of microservice implementations.
  • Provide recommendations for operation of cloud native applications (Continuous Delivery, Scaling, Monitoring, Incident Management,…)
  • Provide economic guidelines on how to operate cloud native applications (feasibility, service model (mix), microservice stacks, containers, …)
  • Investigate, develop and establish a set of open source technologies, tools and services to build, operate and leverage state-of-the-art cloud-native applications.
  • Support SMEs to build their own cloud-native solutions or reengineer and migrate existing applications to the cloud.
  • Ensure that all new applications developed within the SPLab and the ICCLab are cloud-native.

Relevance to current and future markets

– Business impact

  • Using cloud infrastructures (IaaS/PaaS) it is possible to prototype and test new business ideas quickly and without spending a lot of money up-front.
  • An application running on a cloud infrastructure, if designed in a cloud-native way, only ever uses as many resources as needed. This avoids under- or over-provisioning of resources and ensures cost savings.
  • Developing software with services offered by cloud infrastructure and platform providers enables even a small team to create highly scalable applications serving a large number of customers.
  • Developing cloud-native applications with a microservice architecture style allows for shorter development-cycles which reduces the time to adapt to customer feedback, new customer requirements and changes in the market.

– Correlation to industry forecasts

  • Cloud-native applications are tightly bound to cloud computing, and specifically to IaaS and PaaS, since these technologies are used to develop and host applications and, in the best case, these applications are cloud-native. So wherever these technologies stand in the Gartner Hype Cycle, cloud-native applications can be thought of as being at the same stage.
  • The Cloud-Native Computing Foundation (CNCF.io) and other industry groups have been formed to shape the evolution of technologies that are container packaged, dynamically scheduled and microservices oriented.

  • Container composition languages and tools are on the rise. A careful evaluation and assessment of technologies, lock-ins, opportunities is required. The CNA initiative brings sufficient academic rigor to afford long-term perspectives on these trends.

Relevant Standards and Articles

Architecture

Cloud-native applications are typically designed as distributed applications with a shared-nothing architecture composed of autonomous and stateless services that can scale horizontally and communicate asynchronously via message queues. The focus lies on the scalability and resilience of an application. The architecture style currently used to design such applications is described by the term Microservices. While this is by no means the only way to architect cloud-native applications, it is the current state of the art.
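
The pattern described above (stateless services that scale horizontally and communicate asynchronously through queues) can be illustrated with a minimal sketch. Here an in-process `queue.Queue` stands in for a real message broker and threads stand in for service instances; both are simplifications assumed for this example, and `handle`, `worker` and `run` are hypothetical names.

```python
import queue
import threading


def handle(message):
    """Stateless handler: the result depends only on the message itself."""
    return message.upper()


def worker(inbox, outbox):
    """One service instance: consume messages until the shutdown sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: shut this worker down
            break
        outbox.put(handle(message))


def run(messages, workers=3):
    """Process messages with `workers` identical, interchangeable workers."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for message in messages:
        inbox.put(message)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    # Completion order is not deterministic, so results are sorted.
    return sorted(outbox.get() for _ in range(len(messages)))
```

Since the handler keeps no state, changing the `workers` argument changes throughput but not the result, which is the property that makes horizontal scaling straightforward.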

Generic CNA Architecture

The following architecture has been initially analysed, refined and realised by the SPLab CNA initiative team with a business application (Zurmo CRM) based on the CoreOS/fleet stack as well as on Kubernetes.

More recent works include a cloud-native document management architecture with stateful and stateless microservices implemented as composed containers with Docker-Compose, Vamp and Kubernetes.

Articles and Publications

G. Toffetti, S. Brunner, M. Blöchlinger, J. Spillner, T. M. Bohnert: Self-managing cloud-native applications: design, implementation and experience. FGCS special issue on Cloud Incident Management, 2016.

S. Brunner, M. Blöchlinger, G. Toffetti, J. Spillner, T. M. Bohnert, “Experimental Evaluation of the Cloud-Native Application Design”, 4th International Workshop on Clouds and (eScience) Application Management (CloudAM), Limassol, Cyprus, December 2015. (slides; author version; IEEExplore/ACM DL: to appear)

Blog Posts

Note: Latest posts are at the bottom.

Presentations

Open Source Software

Contact

Josef Spillner: josef.spillner(at)zhaw.ch

Footnotes

1. On-Demand Self-Service, Broad Network Access, Resource Pooling, Rapid Elasticity and Measured Service, as defined in the NIST Definition of Cloud Computing.

Rating, Charging, Billing

This page is kept for archiving. Please navigate to our new site: blog.zhaw.ch/splab.

Description

Financial accounting is a critical part of monetizing any service. In the telecommunication world, these processes have long been documented, used, and standardized. Cloud computing, being a relatively new paradigm, is still undergoing a transition phase. Many new services are being defined and there is still a huge untapped potential to be exploited.

Rating, Charging, and Billing (RCB) are key activities that allow a service provider to fix monetary values for the resources and services it offers, and to bill the customers consuming those services.

Problem Statement

Given a general service scenario, how can the key metrics be identified? The identification of measurable metrics is essential for determining a useful pricing function to be attached to each metric. The challenges we are trying to address under this initiative are multi-dimensional. Is it possible to come up with an RCB model general enough to address the needs of multiple cloud services (IaaS, PaaS, SaaS, and many more that will be defined in the future)?

Where is the correct boundary between a real-time charging strategy, which could be very resource intensive, and a periodic strategy, which carries the risk of consumers over-utilizing resources between two cycles? Can a viable middle-path strategy be established for cloud-based services? Can a pre-paid pricing model be adapted for the cloud?
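
The trade-off between real-time and periodic charging can be made concrete with a small sketch. It assumes a pre-paid balance and a stream of usage events with known costs, and computes how much unpaid usage accumulates before the provider notices that the credit is exhausted. The function name, its parameters and the cut-off rule are hypothetical; they only illustrate the question raised above.

```python
def overrun(usage_costs, balance, check_every):
    """Return the unpaid amount consumed before the service is cut off.

    usage_costs: cost of each consecutive usage event (hypothetical units).
    balance: pre-paid credit available at the start.
    check_every: the balance is checked after every `check_every` events;
                 1 models real-time charging, larger values a periodic cycle.
    """
    spent = 0.0
    for count, cost in enumerate(usage_costs, start=1):
        spent += cost
        if count % check_every == 0 and spent >= balance:
            break  # the provider notices the exhausted credit and stops the service
    return max(0.0, spent - balance)
```

With ten events costing one unit each and a balance of three units, checking after every event leaves no overrun, checking every five events leaves an overrun of two units, and never checking within the period leaves an overrun of seven. The price of the smaller overrun is that the balance check runs on every event.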

Simplified workflow

[Figure: simplified RCB workflow]

Architecture

Microservice | Repository
User Data Records | https://github.com/icclab/cyclops-udr
Rating & Charging | https://github.com/icclab/cyclops-rc
Billing | https://github.com/icclab/cyclops-billing
Dashboard | https://github.com/icclab/cyclops-support

Developing

  • rule engine and pricing strategies
  • prediction engine and alarming
  • revenue sharing and SLAs
  • usage collectors
  • scalability

Demos

  • vBrownBag Talk, OpenStack Summit, Paris, 2014

  • Swiss Open Cloud Day, Bern, 2014

  • CYCLOPS Demo

Presentations

  • OpenStack Meetup, Winterthur, 2014

Articles and Info

Research publications

Technology transfer

Research Approach

Following the ICCLab research approach

[Figure: RCB research approach]

Contact

  • icclab-rcb-cyclops[at]dornbirn[dot]zhaw[dot]ch

Team

PaaS on OpenStack

Description

In this initiative we focus on bringing Platform as a Service (PaaS) to the ICCLab testbed, on top of OpenStack. We are investigating and evaluating all the requirements for running various open source PaaS solutions like Cloud Foundry (http://www.cloudfoundry.org), OpenShift (http://www.openshift.org) and Cloudify (http://www.cloudifysource.org), and extending the testbed for monitoring, rating, charging and billing at the PaaS level.

Platform as a Service (PaaS) focuses on developers as customers by providing them with a platform containing the whole technology stack needed to run applications and services, supporting all the typical cloud characteristics like On-Demand Self-Service, Rapid Elasticity, Measured Service, Resource Pooling, etc. Typically these platforms consist of:

  • Runtime environments (Java, Ruby, Python, NodeJs, .Net, …),
  • Frameworks (Spring, JEE, Rails, Django, … ) and
  • Services like
    • Datastores (SQL, NoSQL, Key-Value-Stores, File-/Object-Storage,…),
    • Messaging (Queuing, PubSub, EventProcessing,…)
    • Management Services (authentication, logging, monitoring,…)

Our full (mid- to long-term) PaaS mission is described in the PaaS research theme page.

Problem Statement

PaaS technologies and offerings are still in their early stages. There is a lot of hype and movement in the market, and standards are not yet established. Many of the open source tools, such as Cloud Foundry and OpenShift, are still in beta and not yet mature. Moving the responsibility for the operation of runtimes, frameworks and services to the cloud provider creates many new challenges. First of all, deployment and operation have to be fully automated, and tooling for operation and management is needed. New parameters for monitoring and rating are required, and new charging models need to be developed and evaluated. Other challenges include the automated interfacing with the underlying infrastructure layer (in our case OpenStack) to provide and guarantee the requested performance and scalability. Last but not least, we have to investigate how to extend the frameworks with new services and runtimes.

Articles and Info

There are a number of presentations about PaaS in general and CloudFoundry/BOSH specifically used in the ICCLab:

Contact Point

Cloud Performance

Description

Virtualisation is at the core of Cloud Computing and therefore its performance is crucial for delivering a top-of-the-class service. Being able to provide an adequate virtualised environment based on user requirements is also key for cloud providers.

SmartOS, a descendant of Illumos and OpenSolaris, offers features such as containers, KVM virtualisation and network virtualisation through Crossbow that make it particularly interesting in this context.

This research initiative aims to:

  • Evaluate the performance of SmartOS virtualisation with respect to compute (i.e. containers and KVM), storage and networking
  • Compare SmartOS virtualisation with other techniques (Linux KVM, VMware, Xen)
  • Identify the use cases and workloads that best suit the different techniques

Problem Statement

Cloud providers must be able to offer a single server to multiple users without them noticing that they are not the only user of that machine. This means that the underlying operating system must be able to provision and deprovision, i.e. create and destroy, virtual machines quickly and seamlessly; it should also allocate physical resources efficiently and fairly amongst the users and should be able to support multithreaded and multi-processor hardware. Lastly, the operating system must be highly reliable and, in case something doesn’t work as it should, it must provide a way to quickly determine what the cause is. At the same time, a customer of the cloud provider will expect the server to be fast, meaning that the observed latency should be minimal. The provided server should also give the flexibility to get extra power when needed, i.e. bursting and scaling, and be secure, meaning that neighboring users must not interfere with each other.

Articles and Info

Contact Point

Cloud Monitoring

Description

A monitoring system, especially in an Infrastructure as a Service environment, should be considered indispensable. Knowing which resources are used by which virtual machines (and tenants) is crucial for cloud computing providers as well as for their customers.

Customers want to be sure they get what they pay for at any time, whereas the cloud provider needs the information for its billing and rating system. Furthermore, this information can be useful when it comes to dimensioning and scalability questions.

For monitoring a Cloud environment there are different requirements:

  • A cloud monitoring tool must be able to monitor not only physical machines but also virtual machines and network devices.
  • The information about each monitored resource must be assignable to its tenant.
  • The metered values must be collected and correlated automatically.
  • The monitoring tool must be as generic as possible to ensure support of any device.
  • The monitoring tool must offer an API.
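
The requirement that metered values be assignable to a tenant can be sketched as a simple aggregation step. The sample tuple layout, the `owner_of` mapping and the "unassigned" bucket are assumptions made for this example and do not reflect the data model of any particular monitoring tool.

```python
from collections import defaultdict


def usage_per_tenant(samples, owner_of):
    """Sum metered values per tenant and meter.

    samples: iterable of (resource_id, meter, value) tuples.
    owner_of: maps a resource_id to the tenant it belongs to.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for resource_id, meter, value in samples:
        # Resources without a known owner are kept visible rather than dropped.
        tenant = owner_of.get(resource_id, "unassigned")
        totals[tenant][meter] += value
    return {tenant: dict(meters) for tenant, meters in totals.items()}
```

Keeping an explicit "unassigned" bucket makes resources without a tenant mapping, such as a shared network switch, show up in the result instead of silently disappearing.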

Problem Statement

Many of the available monitoring tools can collect data from particular devices such as physical devices or virtual machines. However, most of these tools don’t automatically monitor newly created instances in a cloud environment. For this reason the ICCLab decided to use Ceilometer to monitor its OpenStack installation. Ceilometer is a core project of OpenStack but doesn’t collect data from physical devices like network switches. Therefore, the ICCLab extends Ceilometer to allow it to collect data from physical devices.

Articles and Info

Contact Point

Cloud Automation & Orchestration

Description

At the heart of the ICCLab is the Management Server, which provides an easy way to stage different setups for different OpenStack instances (productive, experimental, etc.). The Management Server provides DHCP, PXE and pre-configured processes which allow a bare metal computing unit to be provisioned automatically, using Foreman, and then have preassigned roles installed, using a combination of Foreman and Puppet. This provides a great deal of flexibility and support for different usage scenarios.

Puppet is a key enabler. It is an infrastructure automation system for the efficient management of large-scale infrastructures. Using a declarative approach, Puppet can manage infrastructure lifecycles, from provisioning and configuration to update and compliance management. All of these management capabilities are managed logically in a centralised fashion; however, the system itself can be implemented in a distributed manner. A key motivation for using Puppet is that all system configuration is codified in Puppet’s declarative language. This enables the sharing of “infrastructure as code” not only throughout an organisation but also outside of it, by following open source models.
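
The declarative idea mentioned above can be illustrated in a few lines: the operator states the desired end state, and the tool works out which changes are still needed. The sketch below is written in Python rather than Puppet's own language, and `plan` and `apply` are hypothetical helpers, not Puppet APIs.

```python
def plan(desired, actual):
    """List the (resource, state) changes needed to reach the desired state.

    Both arguments map a resource name to its state, e.g. {"ntp": "running"}.
    Resources that already match are left alone.
    """
    return [(name, state)
            for name, state in sorted(desired.items())
            if actual.get(name) != state]


def apply(changes, actual):
    """Return a copy of `actual` with the planned changes applied."""
    updated = dict(actual)
    updated.update(changes)
    return updated
```

Applying a plan and then planning again yields an empty list. This idempotence is what makes it safe to run the same declarative configuration repeatedly against the same machine.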

Problem Statement

With the virtually unlimited resources available today through cloud computing, it is entirely possible to have large numbers of cloud resources (e.g. compute, storage, networking) delivering services to end users. Managing these cloud resources and the software stacks deployed on top is a huge challenge once the number of resources to configure and manage increases beyond single digits. The only way forward is to investigate, adopt and improve automated management, configuration and orchestration tools. With automation comes increased testability, reliability (when done right) and ultimately faster time to market, as exemplified by continuous integration and DevOps practices.

Articles and Info

There are a number of blog posts detailing how Foreman and Puppet are used in the ICCLab:

Contact Point

Cloud Interoperability

Description

Being interoperable means giving cloud service instances common mobility capabilities: extracting any service instance into a common representation, sharing all data related to a cloud service instance in and out of providers, and allowing cloud service instances to work together.

For interoperability to take hold, it must be present at the lowest level of the cloud stack, so IaaS should be the first target, with those interoperability capabilities then offered to the upper layer of PaaS, where lock-in is even more prevalent. To execute upon this, standard specifications need to be agreed upon by both research and industrial domains. In essence this means, in the context of IaaS, agreeing upon standardised ways to import and export IaaS customer deployments, to interface with those deployments in a common way during their lifecycle and runtime, and to access the data supplied in creating, and generated by, those deployments. These three types of standards must cooperate and integrate, as there is no single standards development organisation (SDO) that can capture research and industry interest and supply all the relevant skills. In terms of the IaaS domain this specifically means:

  • Standardised specifications for the import and export of virtualised infrastructure service instances
  • Standardised runtime specification to allow the run-time and life cycle management of virtualised infrastructure service instances
  • Standardised data access, import and export capabilities for the data that was used to create, and was generated by, the virtualised service instances

Problem Statement

There are many challenges to cloud computing, but one that is core to enabling further value is the removal of lock-in and the enabling of interoperability between cloud services. One typical approach to providing interoperability is setting standards through standards defining organisations such as DMTF, OGF and SNIA. The other is providing software toolkits and frameworks such as jClouds, Apache libcloud and fog.io, which offer abstract programmatic APIs whose implementation carries out the semantic and syntactic mapping from the abstract interface to the target cloud service provider’s interface. Whereas both approaches provide some uniformity in operating cloud services, they do not cover other life cycle aspects. One area of investigation within the ICCLab is how to relocate services using one of the two approaches (or potentially both).
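
The toolkit approach described above boils down to an abstract interface plus one adapter per provider. The following sketch shows the shape of such a mapping. It does not reproduce the API of jClouds, Apache libcloud or fog.io; `ComputeDriver`, `ProviderA`, `ProviderB` and the request formats are invented for this example.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Abstract interface that client code programs against."""

    @abstractmethod
    def create_server(self, name, cpus):
        """Return the provider-specific request for a new server."""


class ProviderA(ComputeDriver):  # hypothetical provider
    def create_server(self, name, cpus):
        return {"action": "RunInstance", "label": name, "vcpu": cpus}


class ProviderB(ComputeDriver):  # hypothetical provider
    def create_server(self, name, cpus):
        return {"op": "servers.create", "server": {"name": name, "cores": cpus}}


# Registry of available adapters, keyed by a short provider name.
DRIVERS = {"a": ProviderA, "b": ProviderB}


def get_driver(provider):
    """Look up and instantiate the driver for a provider key."""
    return DRIVERS[provider]()
```

Client code calls `get_driver("a").create_server("web", 2)` or the same method on driver "b" without knowing either request format. As the paragraph above notes, this gives uniform operation but does not by itself move a running service from one provider to another.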

Articles and Info

Contact Point

Andy Edmonds
