Ceilometer Performance issues

Update: This does not apply to Icehouse. This flag activated an experimental feature, and the option no longer exists in Icehouse. (It is in Havana, however.)

There have been some criticisms of the implementation of Ceilometer (or Telemetry as of Icehouse). However, it’s still the main show in town for understanding what’s going on inside your OpenStack.

We’ve been doing a bit of work with it in multiple projects. In one of our efforts, pulling in energy info via kwapi, we noticed that Ceilometer really crawls: the API took 20s to respond when we tried to enter just a single energy consumption data point. (Yes, it might make more sense to batch these up…) For our simple scenario, this performance was completely unworkable.

Our Ceilometer installation just used the basic Mirantis Fuel v4.0, which installed a variant of Havana. The db backend was MySQL (chosen by Fuel) and we just went with the default configuration parameters.

There are known performance issues with Ceilometer (issue, presentation mentioning it, mailing list discussion) and it seems that Icehouse has made some significant strides in improving performance of Ceilometer/Telemetry. However, we have not managed to perform the upgrade as yet, so some of these issues may already have been fixed.

For our work, we were able to significantly improve the performance of the Ceilometer API by activating (experimental!) thread pooling on the db. Entering a single energy consumption data point then took less than one second (down from 20s), and a larger query listing the available meters took 5s compared to a previous 34s. It just involved setting

use_tpool=true

in /etc/ceilometer/ceilometer.conf and bingo – significant uptick in performance (for our small, experimental system).
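If you want to check whether the flag makes a difference on your own installation, it helps to time the same API call before and after the change. The helper below is only a minimal sketch: time_call is a name made up for this note, and the function you pass in (for example, one that posts a single sample to your Ceilometer API) is something you would have to supply yourself.

```python
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Call func several times and return (min, mean, max) wall-clock seconds.

    func is any callable; in our scenario it would be a hypothetical
    function that sends one data point to the Ceilometer API.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return min(durations), sum(durations) / len(durations), max(durations)
```

Run it once with the default configuration and once with use_tpool=true, restarting the Ceilometer services in between, and compare the three numbers.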

Not sure how widely applicable this is, and not sure if it’s realistic for production environments – for our experimental system, it turned an unworkable system into something which is usable (but certainly not speedy!)

 

Swiss Informatics SIG CC kick-off meeting – 21-May-2014

Participation in the announced Swiss Informatics SIG CC kick-off meeting, hosted by ZHAW ICCLAB on 21-May-2014 in Winterthur, was good (industry, SMEs, academia and associations). The assembly confirmed that the overall objectives of this group are the understanding of cloud computing technologies and related supporting actions for Swiss industry, research and education.

To achieve these goals, many actions were discussed in the agenda, such as: promoting technological progress through joint research; cloud computing in higher education; surveying and harmonising existing Swiss initiatives and groups on cloud computing; establishing personal and business networks; dissemination and co-organisation of relevant events; provision of advisory services and H2020 consultation contributions; Swiss government projects; a focus on relevant open-source platforms; and delivery of the results as white papers when needed.

Dissemination will be through social media, the SIG CC portal (coming soon, based on cloudcomp.ch) and a mailing list.

The General Assembly for the SIG will be held in six months and announced through our channels. The scope of the next meeting will be the approval of medium-term objectives and confirmation of the identified board. Contributions, minutes and the web portal will be available in the coming weeks.

Software Defined Networking

The interconnection of a wide variety of applications and devices, high demands on availability, an expectation of total control, an uninterrupted flow of information, cloud services and countless other uses have pushed the requirements placed on network structure and providers into new dimensions in recent years. The provider’s job of carrying packets from A to B is unchanged, but what happens when an application needs the fastest path for a transfer, when high bandwidth is suddenly required, or when a redundant connection has to be set up?

These questions quickly make it clear that it is no longer the network administrator who places the requirements on the network, but the applications. The programmer would like a dynamic construct that can be adapted continuously to the needs of the user.

Software Defined Networking (SDN) is seen as the solution for structuring and managing future data networks. Stanford University took up the idea of separating a network’s control plane and data plane relatively early and worked out a first basic concept for it. The ONF (Open Networking Foundation) characterises the SDN architecture as follows: directly programmable, agile, centrally managed, programmatically configured, open standards-based and vendor-neutral. The idea of decoupling is already implemented in many standard routers and switches today, but the two planes sit in the same chassis and are fixed in their configuration. The ONF specified that in SDN the coupling of the two planes must be broken. The controller is to be centralised on a server, so that a single server can serve several switches.

Figure 1

Figure 1 shows a simple network with 5 switches alongside its equivalent structure as an SDN. In contrast to the conventional network, in the SDN the decentralised control planes and applications are implemented in the higher-level SDN controller. Communication between controller and switch requires a new connection, shown here as the “OpenFlow interface”. This interface manages the relationship between individual packets and their forwarding in the network. If Host A now wants to communicate with Host B, the switch, on receiving the first data packet, checks its flow table for a matching entry. If there is no entry, that is, no so-called action, the packet is forwarded to the central controller (Figure 2). The controller then has the task of configuring the path through the network based on the packet it received. Once configuration has succeeded, there are different options for who sends the first data packet on its way through the network.

It is either the central controller, which received the packet in the first place, or the first switch, which receives a corresponding action from the controller telling it that it can now send the packet.

As already mentioned, the SDN controller is responsible for configuring the flow table. It can create actions based on many different combinations of criteria. Anything from simple routing to personalised rules that refer to a MAC address, an IP address or a TCP port can be implemented as actions here.

Host A to Host B

Figure 2

  • The first data packet (1) is forwarded directly to the SDN controller because the flow table in the switch does not yet have a matching entry. The switch keeps the data packet so that the controller can trigger its forwarding later.

  • The controller starts sending the configuration packets (2-5) to those switches that lie on the path from A to B.

  • Once the switches are configured, the controller sends a message to the first switch with the action that the first data packet can now be sent.
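The three steps above can be illustrated with a small toy model in Python. This is only a sketch and not the OpenFlow protocol or any controller's real API: the classes Packet, Switch and Controller, the function send, and the fixed path table are all assumptions made for illustration, and the flow table matches only on source and destination host (real rules can also match on MAC address, IP address or TCP port).

```python
from collections import namedtuple

Packet = namedtuple("Packet", ["src", "dst"])


class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # (src, dst) -> name of the next hop
        self.buffer = []      # packets held while the controller sets up the path

    def lookup(self, packet):
        # Returns the next hop, or None on a table miss.
        return self.flow_table.get((packet.src, packet.dst))


class Controller:
    def __init__(self, switches, paths):
        self.switches = {s.name: s for s in switches}
        self.paths = paths    # (src, dst) -> [src, switch, ..., switch, dst]
        self.packet_ins = 0   # how often a switch had to ask the controller

    def packet_in(self, packet):
        # Step 2: install a flow entry on every switch along the path.
        self.packet_ins += 1
        path = self.paths[(packet.src, packet.dst)]
        for here, next_hop in zip(path[1:-1], path[2:]):
            self.switches[here].flow_table[(packet.src, packet.dst)] = next_hop


def send(packet, first_switch, controller):
    """Forward a packet hop by hop and return the list of nodes it visits."""
    hops = []
    current = first_switch
    while current in controller.switches:
        switch = controller.switches[current]
        next_hop = switch.lookup(packet)
        if next_hop is None:
            # Step 1: table miss, so buffer the packet and ask the controller.
            switch.buffer.append(packet)
            controller.packet_in(packet)
            # Step 3: the path is configured, release the buffered packet.
            switch.buffer.remove(packet)
            next_hop = switch.lookup(packet)
        hops.append(current)
        current = next_hop
    hops.append(current)
    return hops


switches = [Switch("s1"), Switch("s2"), Switch("s3")]
controller = Controller(switches, {("A", "B"): ["A", "s1", "s2", "s3", "B"]})
first = send(Packet("A", "B"), "s1", controller)
second = send(Packet("A", "B"), "s1", controller)
```

Both calls take the path s1, s2, s3, B, but only the first one involves the controller: after that, every switch on the path already has a matching flow entry.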

To explore the new SDN paradigm further, InIT and InES, both institutes of ZHAW, jointly started a build-up programme for Software Defined Networking on 1 January 2014. The project, which runs for one year, has set itself the following 3 focus areas:

  • Building a testbed for testing SDN applications and SDN components
  • Developing SDN applications to demonstrate the testbed
  • Reviewing available network devices (physical and virtual) and running performance tests on them

 

If it’s good enough for James Bond, it’s good enough for ICCLab…

Pushing boundaries is what ICCLab is all about: while this has been largely restricted to the realm of Cloud Computing heretofore, new dimensions were opened up last weekend, when the team ventured south to the beautiful setting of the Contra Dam at Verzasca to do the 200m 007 bungee jump (featured in Goldeneye).

6 hardy souls were brave enough to take the leap – the rest of us just watched in a mix of awe, admiration, disbelief and horror.

TMB stepped up first. Some would say he was nervous and just wanted to get it over with, but that is vehemently denied by the official sources. As you can see from the video, he realized just after he committed what he had really gotten himself into!

TMB following in the steps of JHB.

Next up was Michael Erne, one of the bachelor’s students in the lab. This was Michael’s second time to jump, so he was cool as a cucumber, which tended to irk some of the others who were a little on edge about the whole proposition. Michael demonstrated his experience by performing a solid jump and said that unlike the first time – where nerves were a factor – he could just focus on enjoying the freefall.

Michael’s experience shows.

Michael was followed by Patrick Real, who is responsible for financials in the group. As you would expect from someone who deals with money, he took a cool, calm and collected approach; business as usual. Indeed, so relaxed was Patrick in this situation that he seems to just collapse off the platform into the 200m drop.

Patrick is extremely relaxed!

Tea, who organized the event and has wanted to do it since she was knee high to a grasshopper, stepped up next. Although there seemed to be some nerves while she was standing around waiting, the tone changed when she stepped up on the platform, and she made it look easy as she jumped off the ledge.

Tea does it with class.

In contrast, Oleksii was entirely relaxed beforehand but when he walked out onto the platform he got butterflies (which might not have been helped by everyone laughing at him). In true Oleksii style, he composed himself and delivered what was asked.

Oleksii gets some butterflies.

Sandro was something of a last minute addition to the brave troop. When he stepped up on the platform, he looked down curiously to the foot of the dam many times – we suspected his rational Swiss sensibilities were having difficulty understanding why he was jumping 200m secured by nothing but an elastic band. This was confirmed later when he declared with conviction that this was the stupidest thing he had ever done.

Sandro jumps with conviction.

A wonderful day filled with fun, emotion, emotional blackmail, great achievement, conquering fear in beautiful surroundings in beautiful weather. Big shout out to Tea for organizing.

The brave jumpers.

The entire gang who went (including Kiara!)

[Thanks to Bruno for helping with the video and image content.]

Authenticating the python ceilometer client against the OpenStack APIs – bloody lambda functions!

We were doing some work with Ceilometer (it appears in a few of our activities) and I was trying to get up to speed with it. Playing with the python Ceilometer client proved a little more difficult than envisaged, mostly due to deficiencies in documentation. Here’s a small note on an issue I faced with authentication.

Bruno Grazioli

Bruno is an ICCLab intern who found ICCLab through the IAESTE program. He is completing his Bachelor of Computer Science at Fundação Educacional do Município de Assis (FEMA) in Assis, Brazil, and is particularly excited about how quickly technology is growing and how it has become such an integral part of our daily lives, especially how it can change our behavior in this always-connected world. While studying, Bruno was also working as an intern at Universidade Estadual Paulista (UNESP), helping to maintain their IT systems. During his internship he will work on energy monitoring in OpenStack, in particular on understanding how energy consumption within the cloud stack can be disaggregated.

Bruno dunking!

When not working at ICCLab, Bruno is interested in playing football (he is Brazilian, after all!), practicing sports, seeing new places and meeting new people.

Benchmarking OpenStack by using Rally – part 1

For system administrators, it is difficult to gather performance data before going into production. Benchmarking tools offer a convenient way to gather performance data by simulating usage of a production system. In the OpenStack world we can employ the Mirantis Rally tool to benchmark VM performance of our cloud environment.

Rally comes with some predefined benchmarking tasks, e.g. booting new VMs, starting VMs and running shell scripts on them, concurrently building new VMs and many more. The chart below shows the performance of booting VMs in an OpenStack instance in a Shewhart control chart (often called “X-Chart” or “X-Bar-Chart”). As you can see, it takes about 7.2 seconds on average to start a VM, and sometimes the startup time falls outside the usual six sigma range. For a system administrator this could be quite useful data.

An X-Chart of VM boot performance in OpenStack.

The data above was collected with the Rally benchmark software. The Python-based Rally tool is free, open-source and extremely easy to deploy. First you have to download Rally from this GitHub link.

Rally comes with an install script. Just clone the GitHub repository into a folder of your choice, cd into that folder and run:

$ ./rally/install_rally.sh

Then register your OpenStack deployment with Rally by filling in your OpenStack credentials in a JSON file (existing.json in the command below) and typing:

$ rally deployment create --filename=existing.json --name=existing
+----------+----------------------------+----------+-----------------+
|   uuid   |         created_at         |   name   |      status     |
+----------+----------------------------+----------+-----------------+
|   UUID   | 2014-04-15 11:00:28.279941 | existing | deploy-finished |
+----------+----------------------------+----------+-----------------+
Using deployment : UUID 

Then type the following, using the UUID you got from the previous command:

$ rally use deployment --deploy-id=UUID
Using deployment : UUID

Then you are ready to use Rally. Rally comes with some pre-configured test scenarios in its doc folder. Just copy a sample task file, e.g. rally/doc/samples/tasks/nova/boot-and-delete.json, to your favourite location, e.g. /etc/rally/mytask.json:


$ cp rally/doc/samples/tasks/nova/boot-and-delete.json /etc/rally/mytask.json

Before you can run a Rally task, you have to configure it. This can be done via either JSON or YAML files; Rally can deal with both file formats.
If you open the JSON file mytask.json, you see something like the following:


{
    "NovaServers.boot_and_delete_server": [
        {
            "args": {
                "flavor_id": 1,
                "image_id": "Glance UUID"
            },
            "runner": {
                "type": "constant",
                "times": 10,
                "concurrency": 2
            },
            "context": {
                "users": {
                    "tenants": 3,
                    "users_per_tenant": 2
                }
            }
        }
    ]
}

You have to add the correct UUID of a Glance image in order to configure the test run properly. The UUID can be retrieved by typing:


$ rally show images
+--------------------------------------+--------+----------+
|                 UUID                 |  Name  | Size (B) |
+--------------------------------------+--------+----------+
| d3db863b-ebff-4156-a139-5005ec34cfb7 | Cirros | 13147648 |
| d94f522f-008a-481c-9330-1baafe4933be | TestVM | 14811136 |
+--------------------------------------+--------+----------+

Update the mytask.json file with the UUID of the Glance image.
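If you would rather script this step than edit the file by hand, the following minimal Python sketch patches the image UUID into a task file that has the structure shown above. The function names set_image_id and update_task_file are made up for this example and are not part of Rally.

```python
import json


def set_image_id(task, image_id, scenario="NovaServers.boot_and_delete_server"):
    """Set args.image_id for every run of the given scenario in a task dict."""
    for run in task[scenario]:
        run["args"]["image_id"] = image_id
    return task


def update_task_file(path, image_id):
    """Read a JSON task file, patch in the Glance image UUID and write it back."""
    with open(path) as handle:
        task = json.load(handle)
    set_image_id(task, image_id)
    with open(path, "w") as handle:
        json.dump(task, handle, indent=4)
```

For example, update_task_file("/etc/rally/mytask.json", "d3db863b-ebff-4156-a139-5005ec34cfb7") would select the Cirros image from the listing above.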

To run the task, simply type (the “-v” flag gives “verbose” output):


$ rally -v task start /etc/rally/mytask.json

=================================================================
Task  ... is started
------------------------------------------------------------------
2014-05-12 11:54:07.060 . INFO rally.benchmark.engine [-] Task ... 
2014-05-12 11:54:07.864 . INFO rally.benchmark.engine [-] Task ... 
2014-05-12 11:54:07.864 . INFO rally.benchmark.engine [-] Task ... 
...
+--------------------+-------+---------------+---------------+---------------+---------------+---------------+
|       action       | count |   max (sec)   |   avg (sec)   |   min (sec)   | 90 percentile | 95 percentile |
+--------------------+-------+---------------+---------------+---------------+---------------+---------------+
|  nova.boot_server  |   10  | 8.28417992592 | 5.87529754639 | 4.68817186356 | 7.33927609921 | 7.81172801256 |
| nova.delete_server |   10  | 6.39436888695 | 4.54159021378 | 4.31421685219 | 4.61614284515 | 5.50525586605 |
+--------------------+-------+---------------+---------------+---------------+---------------+---------------+

+---------------+---------------+---------------+---------------+
|   max (sec)   |   avg (sec)   |   min (sec)   |  90 pecentile | 
+---------------+---------------+---------------+---------------+
| 13.6288781166 | 10.4170130491 | 9.01177096367 | 12.7189923525 |
+---------------+---------------+---------------+---------------+...
...

The statistical output is now of major interest: it shows how long it takes to boot a VM instance in OpenStack and gives some useful information about the performance of your current OpenStack deployment. Rally performs 10 test runs (the “times” value in the task file) and reports statistics such as the average runtime over those runs. This technique is called statistical sampling, so each Rally task run can be viewed as one sample, represented as one data point in a Shewhart control chart.

But how did we get our data into a Shewhart Control chart? This will be explained further in part 2.
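In the meantime, here is a rough Python sketch of the idea behind such a chart. It is a simplification and not necessarily how our chart was produced: the control limits are taken as the mean of a set of baseline sample averages plus or minus three standard deviations of those averages, whereas textbook X-bar charts usually estimate the spread from the variation within each sample. The function names and all numbers in the note below are made up for illustration.

```python
import statistics


def control_limits(baseline_means, sigmas=3.0):
    """Return (lower limit, centre line, upper limit) from baseline sample means."""
    center = statistics.mean(baseline_means)
    spread = statistics.stdev(baseline_means)
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(sample_means, limits):
    """Return (index, value) for every sample mean outside the control limits."""
    lower, _, upper = limits
    return [(i, m) for i, m in enumerate(sample_means) if not lower <= m <= upper]
```

With hypothetical baseline averages of 5.9, 6.1, 6.0, 5.8 and 6.2 seconds, a later Rally run averaging 7.2 seconds would be flagged, while a run averaging 6.1 seconds would not.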
