13.11.2014 mune

Monasca for Cloud Monitoring: Initial impressions

One of the focuses of the Cloud Incident Management research initiative is Monitoring as a Service solutions, as they provide the building blocks for incident detection and resolution. As such, part of the work carried out throughout the initiative was identifying good, maintainable monitoring solutions that can be easily adapted and integrated into a larger incident management architecture. This blog post covers Monasca, showing the good, the bad and the ugly.

Monasca is a Monitoring as a Service solution from HP and Rackspace that focuses on providing complete monitoring for Openstack. It is open source and designed to be highly scalable, performant and fault-tolerant in a multi-tenant environment. It features a RESTful API through which one can query the system or send metrics for processing.

The solution monitors both the Openstack infrastructure and the VMs running on it. Further, it can easily be integrated with Rackspace’s StackTach, which forwards all Openstack events coming from its different components to Monasca for processing. Additionally, Monasca uses Keystone as its primary authentication mechanism and as its service catalog.

The architecture is designed from the ground up for stream processing of metrics and events, which are sent via Kafka; Kafka acts as a general cloud bus based on the publish-subscribe pattern.
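To make the publish-subscribe idea more concrete, here is a minimal in-memory sketch of the fan-out a bus like Kafka provides: every subscriber group on a topic receives each published message independently, so that, for instance, a component that stores metrics and one that evaluates them can consume the same stream without interfering. The `MessageBus` class and the topic and group names are hypothetical stand-ins, not Kafka's or Monasca's API, and the sketch ignores partitioning, offsets and durability.

```python
from collections import defaultdict, deque


class MessageBus:
    """Toy, in-memory stand-in for a Kafka-like bus (hypothetical)."""

    def __init__(self):
        # topic -> group name -> queue of messages pending for that group
        self._queues = defaultdict(dict)

    def subscribe(self, topic, group):
        # Each group gets its own queue, so groups never steal each other's messages.
        self._queues[topic].setdefault(group, deque())

    def publish(self, topic, message):
        # Fan out: every subscribed group receives its own copy of the message.
        for queue in self._queues[topic].values():
            queue.append(message)

    def poll(self, topic, group):
        # Drain and return everything pending for this group on this topic.
        queue = self._queues[topic][group]
        messages = list(queue)
        queue.clear()
        return messages


bus = MessageBus()
bus.subscribe("metrics", "persister")      # hypothetical group names
bus.subscribe("metrics", "thresholding")
bus.publish("metrics", {"name": "cpu.load_avg_5_min", "value": 80.0})

stored = bus.poll("metrics", "persister")
evaluated = bus.poll("metrics", "thresholding")
```

In real Kafka, consumers inside the same group share a topic's partitions, so scaling out one component means adding consumers to its group rather than creating new groups.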

The following are the core components of Monasca:

Monasca API: acts as a gateway to the whole system and is accessed through its RESTful API. It supports the management of alarm definitions and notification methods, as well as of metrics (storage and querying).
Monasca Client: a Python-based command-line client and library for communicating easily with Monasca.
Monasca Agent: retrieves metrics from the host on which it runs. It comes with a wide range of plugins and even supports Nagios plugins. It is made up of three subcomponents:
- Collector: collects the metrics;
- Monstatsd: a StatsD-compatible component for metrics coming from other sources;
- Forwarder: takes the metrics from the Collector and Monstatsd, normalizes them if necessary, and forwards everything to the Monasca API.
Monasca Persister: handles the long-term storage of metrics and alarms in time-series databases such as InfluxDB or analytics platforms such as Vertica.
Monasca Thresholding Engine: analyses the incoming metrics and evaluates alarm definitions against them. Any triggered alarms are published to Kafka so that the notification process can begin. The engine is built on Apache Storm, which is well suited to stream processing.
Monasca Notification Engine: sends notifications for triggered alarms. Currently it supports only email notifications.
Monasca UI: a user interface built into Openstack that supports browser-based interaction with Monasca.
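Since Monstatsd is described as StatsD-compatible, it may help to see what that wire format looks like. The sketch below parses a StatsD line (`name:value|type`, optionally followed by `|@rate`) into a dictionary shaped like a Monasca metric (name, dimensions, value, timestamp). The `|#key:value` suffix used here to carry dimensions borrows the DogStatsD tag convention; whether Monstatsd accepts exactly this syntax is an assumption, and `parse_statsd_line` is a hypothetical helper, not Monstatsd's code.

```python
import time


def parse_statsd_line(line, now=None):
    """Parse 'name:value|type[|@rate][|#k:v,...]' into a metric dict (hypothetical)."""
    name, _, rest = line.partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"malformed StatsD line: {line!r}")
    value = float(fields[0])
    metric_type = fields[1]            # e.g. c (counter), g (gauge), ms (timer)
    sample_rate = 1.0
    dimensions = {}
    for extra in fields[2:]:
        if extra.startswith("@"):
            sample_rate = float(extra[1:])
        elif extra.startswith("#"):
            # Assumed DogStatsD-style tags, mapped onto Monasca-style dimensions.
            for tag in extra[1:].split(","):
                key, _, val = tag.partition(":")
                dimensions[key] = val
    if metric_type == "c":
        # A sampled counter is scaled back up to estimate the true count.
        value /= sample_rate
    return {
        "name": name,
        "dimensions": dimensions,
        "value": value,
        "timestamp": now if now is not None else time.time(),
    }


metric = parse_statsd_line(
    "requests:3|c|@0.5|#service:client_portal,instance_id:4", now=100.0
)
```

Here the counter value 3 sent at a 0.5 sample rate becomes an estimated 6.0, and the two tags become the dimensions `service` and `instance_id`.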

An interesting component under active development is Anomaly Detection, which supports unsupervised monitoring and detection of problems without the need to define rules. It focuses on two specific algorithms: the Numenta Platform for Intelligent Computing (NuPIC), which implements Hierarchical Temporal Memory, a neuroscience-inspired learning algorithm, and the two-sample Kolmogorov–Smirnov test, a nonparametric goodness-of-fit test. It is receiving a lot of interest right now.
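The Kolmogorov–Smirnov idea can be illustrated independently of Monasca. The sketch below computes the two-sample KS statistic D (the largest gap between the two empirical CDFs) for a baseline window and a recent window of metric values, and flags the recent window as anomalous when D exceeds the standard large-sample critical value c(alpha) * sqrt((n + m) / (n * m)). How Monasca's anomaly engine actually windows and scores metrics is not shown; the window contents and the 0.05 significance level are assumptions.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        # Empirical CDFs at x: fraction of each sample that is <= x.
        f_a = bisect.bisect_right(a, x) / n
        f_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(f_a - f_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value for rejecting 'same distribution' at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def is_anomalous(baseline, recent, alpha=0.05):
    d = ks_statistic(baseline, recent)
    return d > ks_critical_value(len(baseline), len(recent), alpha)


baseline = [20 + (i % 5) for i in range(50)]   # CPU load hovering around 20-24
recent = [70 + (i % 5) for i in range(20)]     # sudden jump to 70-74
print(is_anomalous(baseline, baseline[:20]), is_anomalous(baseline, recent))
# prints: False True
```

Because the test compares whole distributions, it needs no threshold on the metric value itself, which is what makes it attractive for rule-free detection.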

As a whole, Monasca is made up of many moving parts, some written in Java, others in Python. However, since closer integration with Openstack is a key priority (indeed, it will soon be adopted as an official Openstack project), some of the components are being rewritten in Python. There are, of course, parts like the Thresholding Engine that cannot be rewritten because of their dependency on specific libraries, in this case Apache Storm.

A crucial part of the architecture is the way metrics, alarm definitions and alarms are managed (e.g. created, updated, deleted). This functionality is spread across the components, each handling part of it, and is exposed through the RESTful API.
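As an illustration of managing these resources over the RESTful API, the sketch below only builds (and does not send) an HTTP request that would create an alarm definition. The host and port, the `/v2.0/alarm-definitions` path and the JSON field names are assumptions based on our reading of the Monasca v2.0 API and should be checked against its documentation; `build_create_alarm_definition` is a hypothetical helper. The `X-Auth-Token` header carries a Keystone token obtained separately.

```python
import json
import urllib.request

# Assumed placeholder endpoint: adjust to your deployment.
MONASCA_URL = "http://monasca-api.example.local:8070/v2.0"


def build_create_alarm_definition(token, name, expression):
    """Build (but do not send) a POST request creating an alarm definition."""
    body = {"name": name, "expression": expression}   # assumed field names
    return urllib.request.Request(
        url=f"{MONASCA_URL}/alarm-definitions",        # assumed path
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,                     # Keystone token
        },
    )


req = build_create_alarm_definition(
    "TOKEN", "high cpu", "max(cpu.load_avg_5_min) > 75"
)
# urllib.request.urlopen(req) would actually send it to a running Monasca API.
```

Keeping request construction separate from sending makes it easy to inspect or log what would be sent before pointing it at a live deployment.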

Metrics in Monasca allow great flexibility in their definition and, consequently, in their processing. A metric has a name, dimensions, a value and a timestamp. Dimensions make metrics easier to classify: for example, instance_id:4 and service:client_portal are valid dimensions for a metric named cpu.average_5_mins.
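A metric as described above can be modelled as a small record, and dimensions make it easy to select metrics by any subset of their key/value pairs. This is only a sketch: the `Metric` class and the `matches` function are hypothetical, and the timestamp is assumed to be in seconds since the epoch.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: float
    timestamp: float                       # assumed: seconds since the epoch
    dimensions: dict = field(default_factory=dict)


def matches(metric, name, **wanted):
    """True if the metric has this name and carries all the wanted dimensions."""
    return metric.name == name and all(
        metric.dimensions.get(key) == val for key, val in wanted.items()
    )


metrics = [
    Metric("cpu.average_5_mins", 42.0, 1415869200.0,
           {"instance_id": "4", "service": "client_portal"}),
    Metric("cpu.average_5_mins", 91.0, 1415869200.0,
           {"instance_id": "7", "service": "billing"}),
]
portal = [m for m in metrics
          if matches(m, "cpu.average_5_mins", service="client_portal")]
```

Filtering only on the dimensions you care about means the same metric name can be aggregated per service, per instance, or across everything.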

The same holds for alarm definitions. An alarm definition specifies an expression that is evaluated against the incoming metrics, and alarms are created as a consequence. An example alarm definition expression is:

max(cpu.load_avg_5_min) > 75
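To show how such an expression might be evaluated, the sketch below handles only the simple form `function(metric) operator threshold` and applies it to the values collected in one evaluation window. The regular expression, the supported functions and operators, and the `evaluate` helper are assumptions for illustration; richer expressions (for example with dimensions or combined sub-expressions) are not handled here.

```python
import operator
import re

# Assumed minimal grammar: func(metric.name) op threshold
EXPR = re.compile(
    r"^\s*(max|min|avg|sum)\(([\w.]+)\)\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)\s*$"
)
FUNCS = {"max": max, "min": min, "sum": sum, "avg": lambda v: sum(v) / len(v)}
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def evaluate(expression, values_by_metric):
    """Return True/False, or None if there is no data for the metric yet."""
    match = EXPR.match(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    func, metric, op, threshold = match.groups()
    values = values_by_metric.get(metric)
    if not values:
        return None  # nothing to evaluate against
    return OPS[op](FUNCS[func](values), float(threshold))


window = {"cpu.load_avg_5_min": [40.0, 82.5, 61.0]}
print(evaluate("max(cpu.load_avg_5_min) > 75", window))  # True
print(evaluate("avg(cpu.load_avg_5_min) > 75", window))  # False
```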

An alarm definition only leads to alarms being created once its expression can be evaluated against valid metrics. Alarms start in the UNDETERMINED state, which changes to OK as long as the expression is not triggered. Once it is triggered, the state changes to ALARM and the Notification Engine starts notifying people.
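The alarm lifecycle just described can be captured in a few lines. In this sketch an evaluation result of None means there was no valid data, and the initial no-data state is named UNDETERMINED as in the Monasca API. The `next_state` function, the return to UNDETERMINED when data disappears, and the rule of notifying only when an alarm enters ALARM are assumptions for illustration, not the Notification Engine's actual logic.

```python
OK, ALARM, UNDETERMINED = "OK", "ALARM", "UNDETERMINED"


def next_state(current, result):
    """Map an evaluation result (True/False/None) to (new_state, notify).

    notify is True only on a transition into ALARM (an assumed rule).
    """
    if result is None:
        new_state = UNDETERMINED   # no valid metrics to evaluate against
    elif result:
        new_state = ALARM          # expression triggered
    else:
        new_state = OK             # expression not triggered
    return new_state, (new_state == ALARM and current != ALARM)


state = UNDETERMINED
for result in [None, False, True, True, False]:
    state, notify = next_state(state, result)
    print(state, notify)
```

Notifying only on the transition, rather than on every evaluation that stays in ALARM, avoids flooding people with repeated messages for the same problem.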

While Monasca has many nice features, it is a complex piece of software and suffers from some of the problems of large software projects which are not yet mature. As it is still under active development, unexpected behaviour is to be expected!

The following are some of the things I stumbled upon while trying to work with it:

Lack of documentation: the project lacks much documentation about the API, the metrics, the alarms and, more generally, how things work. More information has been added recently, but there are still many gaps.
Difficult installation and testing process: while a Vagrant installation is available, there is no easy way to deploy Monasca in a development or even production environment, and no documentation on how to do so.
Code problems: during the installation I ran into many problems, such as hard-coded links, a lack of naming conventions and quirky authentication support (it needs the Openstack Keystone service key).
Minimal support for logs: currently only log statistics are processed and sent as metrics, and more is often needed.

For those not using Openstack who want to make use of Monasca, the only real requirement is Keystone, which can be installed separately.
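Since Keystone is the one hard dependency, here is a sketch of requesting a token from a standalone Keystone using its v3 password authentication. The Keystone URL and the user, project and domain values are placeholders, `build_token_request` is a hypothetical helper, and whether a given Monasca release expects v2.0 or v3 tokens is an assumption to verify. The request is only built here; on success, Keystone returns the token in the `X-Subject-Token` response header, which is then passed to Monasca as `X-Auth-Token`.

```python
import json
import urllib.request

KEYSTONE_URL = "http://keystone.example.local:5000/v3"   # placeholder


def build_token_request(user, password, project, domain_id="default"):
    """Build (but do not send) a Keystone v3 password-authentication request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # Scope the token to a project so it can be used against Monasca.
            "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
        }
    }
    return urllib.request.Request(
        url=f"{KEYSTONE_URL}/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_token_request("monasca-user", "secret", "monitoring")
# with urllib.request.urlopen(req) as resp:
#     token = resp.headers["X-Subject-Token"]
```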

Overall, Monasca is a great technology, but it is still under heavy development. People who like to experiment with new technology can go ahead and try it. A list of the complete documentation, wiki and repositories for Monasca can be found here.

The next Monasca blog post will cover installation of Monasca in a typical environment.

Tags: cloud incident management, monasca, monitoring as a service

Service Engineering (ICCLab & SPLab)
