Overview
Service hosting platforms such as IaaS and PaaS offer a lot of convenience for the service engineer. They take care of proper provisioning, scaling, healing and profiling. Yet, this platform support is limited when it comes to decisions which require insight into the application state and logic, especially considering applications or services ranging across multiple platforms with composition and orchestration.
The Active Service Management research initiative of the Service Prototoyping Lab aims at improving the state of the art by letting applications signal their states, conditions and requirements, and by letting platforms understand these signals. Emerging from the work on Cloud Native Applications (CNA), this initiative subsumes work on pro-active/predictive auto-scaling with application metric such as numbers of users and self-* properties such as self-healing by replacing crashed or unresponsive application parts with new instances.
Objectives
- Comparative evaluation of active service management techniques.
- Novel contributions to some of the techniques, in particular to scaling and resilience, but also service evolution.
- Turn research results into best practices to achieve an extended CNA design and appropriate hosting platforms.
Relevance to current and future markets
The commercial landscape of service hosting infrastructures generally assumes an issue-free operation without failures and demand spikes. Due to SLAs, this becomes costly when the assumptions do not hold anymore. With active service management contributions, many of the failures and spikes can be mitigated so that business continuity remains assured.
Relevant Standards and Articles
While management of services is an established topic in industry, the specific issue of actively managing environment-aware services is a genuine research topic. We present a selection of useful reading.
- Scryer predictive auto-scaling by Netflix: part 1, part 2
- Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services
Architecture
The following architecture figure describes Dynamite, a novel auto-scaling engine. This rule-based and re-usable engine has been designed in the context of CNA.
Articles and Publications
- Giovanni Toffetti Carughi, Sandro Brunner, Martin Blochinger, Florian Dudouet and Andrew Edmonds, “An architecture for self-managing microservices”, International Workshop on Automated Incident Management in Cloud (AIMC’15), Bordeaux, France, April 2015
Presentations
Open Source Software
- Dynamite scaling engine for CNA using custom metrics for its decisions
Contact
Giovanni Toffetti Carughi: toff(at)zhaw.ch