Distributed Computing in the Cloud


The widespread adoption and the development of cloud platforms have increased confidence in migrating key business applications to the cloud. New approaches to distributed computing and data analysis have also emerged in conjunction with the growth of cloud computing. Among them, MapReduce and its implementations are probably the most popular and commonly used for data processing on clouds.

Efficient support for distributed computing on cloud platforms means guaranteeing high speed and very low latency, so that massive amounts of data can be ingested without interruption and analysed in real time, while remaining cost-efficient at scale.

Problem Statement

Currently, there are few on-demand distributed computing tools on offer. The main challenge, which applies not only to cloud environments, is to build a framework that handles both big data and fast data. Such a framework must support both batch and stream processing, while allowing clients to define their computations transparently and query the results in real time. Offering it on cloud platforms additionally requires rapid provisioning and maximal performance. Further challenges come from one of the cloud’s most appealing features: elasticity and auto-scaling. Distributed computing frameworks can benefit greatly from auto-scaling, but current solutions do not yet support it.

Contact Point

Piyush Harsh

Balazs Meszaros

Parallel OpenStack Multi Hosts Deployments with Foreman and Puppet

In our lab we need one environment running OpenStack Essex and another running OpenStack Folsom. Here’s a guide on how we set up our infrastructure to support the two environments in parallel.

To install Essex using Puppet/Foreman please follow the guides:

  • [OpenStack Puppet Part1](http://www.cloudcomp.ch/2012/07/puppet-and-openstack-part-one/),
  • [OpenStack Puppet Part2](http://www.cloudcomp.ch/2012/07/puppet-and-openstack-part-two/),
  • [OpenStack Puppet/Foreman](http://www.cloudcomp.ch/2012/07/foreman-puppet-and-openstack/)

This article only describes how to integrate OpenStack Folsom with Puppet/Foreman. It assumes that Puppet and Foreman are already set up according to the articles mentioned above.

Two environments will be created: `stable` and `research`. The stable environment holds the Puppet classes for Essex and the research environment holds the Folsom classes.
Create the following directories:

[gist id=4147331]
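
The directory step can be sketched in Python as follows. This is only an illustration: it assumes one module directory per environment under `/etc/puppet/modules` (the `research` path matches the one used later in this article, the `stable` path is assumed by analogy), and `create_module_dirs` is a hypothetical helper, not part of Puppet or Foreman.

```python
import tempfile
from pathlib import Path

# Assumption: one module directory per Puppet environment.
ENVIRONMENTS = ("stable", "research")


def create_module_dirs(base="/etc/puppet/modules"):
    """Create <base>/<environment> for every environment and return the paths."""
    created = []
    for env in ENVIRONMENTS:
        path = Path(base) / env
        # exist_ok=True makes the step safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Dry run in a scratch directory; on the Puppet master the default base
# would be used instead (and root permissions would be needed).
with tempfile.TemporaryDirectory() as scratch:
    for path in create_module_dirs(scratch):
        print(path.name)
```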

Add the research and stable module paths to `/etc/puppet/puppet.conf`:

[gist id=4147341]
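
As a rough illustration of what such entries can look like, the Python sketch below renders one `puppet.conf` section per environment, each with its own `modulepath`. The directory layout (`/etc/puppet/modules/<environment>`) is an assumption, and `render_environment_sections` is a hypothetical helper; the exact paths in our setup may differ.

```python
import configparser
import io


def render_environment_sections(module_root="/etc/puppet/modules",
                                environments=("stable", "research")):
    """Return puppet.conf text with one [environment] section each."""
    config = configparser.ConfigParser()
    for env in environments:
        # Assumed layout: every environment reads modules from its own directory.
        config[env] = {"modulepath": f"{module_root}/{env}"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Print the sections so they can be reviewed before being merged by hand.
print(render_environment_sections())
```

Keeping one section per environment means a node assigned to `research` never picks up the Essex classes by accident.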

Clone the Folsom classes:
[gist id=4147352]

Add `compute.pp`, `controller.pp`, `all-in-one.pp` and `params.pp`:
[gist id=4292299]

While applying the controller.pp classes I encountered the following error:
[gist id=4147369]

This issue is described [here](https://github.com/puppetlabs/puppetlabs-horizon/pull/26).

To work around this issue, add `include apache` in:
[gist id=4147377]

As described in a [previous article](http://www.cloudcomp.ch/2012/07/foreman-puppet-and-openstack/) on an issue with multiple environments, the following steps are also required:
[gist id=4147408]

After that, create new hostgroups in Foreman and import the newly added classes (More – Puppet Classes – Import from local smart proxy).
Define the stable and research environments and three hostgroups in the research environment: os-worker, os-controller and os-aio.

Next, assign the icclab::compute and icclab::params classes to the worker hostgroup, the icclab::controller and icclab::params classes to the controller hostgroup, and the icclab::aio and icclab::params classes to the aio hostgroup.
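
The assignment can be written down as a small lookup table, which is handy as a checklist while clicking through Foreman. The Python sketch below only models the mapping described in this article; the dictionary keys are shorthand for the three hostgroups, and the helper functions are hypothetical and do not talk to Foreman.

```python
# Puppet classes per hostgroup, as described in this article.
HOSTGROUP_CLASSES = {
    "worker": ["icclab::compute", "icclab::params"],
    "controller": ["icclab::controller", "icclab::params"],
    "aio": ["icclab::aio", "icclab::params"],
}


def classes_for(hostgroup):
    """Return the Puppet classes that belong to a hostgroup."""
    try:
        return HOSTGROUP_CLASSES[hostgroup]
    except KeyError:
        raise ValueError(f"unknown hostgroup: {hostgroup!r}") from None


def hostgroups_missing_params():
    """List hostgroups that lack the shared icclab::params class."""
    return [name for name, classes in HOSTGROUP_CLASSES.items()
            if "icclab::params" not in classes]


for name in HOSTGROUP_CLASSES:
    print(name, "->", ", ".join(classes_for(name)))
```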

Since we are using Ubuntu 12.04, the Folsom repository has to be added to the installation. To do that, create a new provisioning template: copy the existing one and add lines 14-18.
Name: Preseed Default Finish (Research)
Kind: finish
[gist id=4292436]

Please also note the interface settings in lines 1-7. Without these settings it was not possible to ping or ssh into VMs running on different physical nodes. This hint was found [here](http://www.mirantis.com/blog/openstack-networking-single-host-flatdhcpmanager/#network-configuration).
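
Returning to the repository part of the template: on Ubuntu 12.04, Folsom packages are commonly taken from the Ubuntu Cloud Archive. The Python sketch below builds the corresponding apt source line. It is an assumption that the template uses this archive, and `cloud_archive_source` is a hypothetical helper for illustration only.

```python
def cloud_archive_source(ubuntu_release="precise", openstack_release="folsom"):
    """Return the apt source line for an Ubuntu Cloud Archive pocket."""
    return (
        "deb http://ubuntu-cloud.archive.canonical.com/ubuntu "
        f"{ubuntu_release}-updates/{openstack_release} main"
    )


# Ubuntu 12.04 is code-named "precise". The archive signing key is shipped
# in the ubuntu-cloud-keyring package, which has to be installed as well.
print(cloud_archive_source())
```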


After that click on Association, select Ubuntu 12.04 and assign the research hostgroup and environment.

In our installation we got this error in the VM console log:

[gist id=4292393]

In our case this was caused by iptables rules that OpenStack had configured incorrectly.
Adding the parameters metadata_host and routing_source_ip to nova.conf on the nova-network nodes solved the issue. To make this permanent with Puppet, add lines 4, 34 and 35 in `/etc/puppet/modules/research/nova/manifests/compute.pp`:

[gist id=4292497]
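
To show what the change amounts to in nova.conf itself, the Python sketch below adds the two parameters to the `[DEFAULT]` section of an INI-style nova.conf. The IP addresses are placeholders from a documentation range, the sample input is made up, and `add_network_flags` is a hypothetical helper; in our setup the values are managed by Puppet, not by a script like this.

```python
import configparser
import io


def add_network_flags(nova_conf_text, metadata_host, routing_source_ip):
    """Return nova.conf text with both parameters set in [DEFAULT]."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config.read_string(nova_conf_text)
    config["DEFAULT"]["metadata_host"] = metadata_host
    config["DEFAULT"]["routing_source_ip"] = routing_source_ip
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Made-up minimal input; a real nova.conf has many more options.
EXAMPLE = "[DEFAULT]\nnetwork_manager=nova.network.manager.FlatDHCPManager\n"

# Placeholder addresses: use the metadata host's IP and the node's own IP.
print(add_network_flags(EXAMPLE, "192.0.2.10", "192.0.2.11"))
```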

Once these steps are done, you should be able to provision your physical hosts across both Puppet environments. In the next article we’ll show how we’ve segmented our network and what the next steps in progressing our network architecture will be.



ICCLab Infrastructure Relocation

The relocation of the ICCLab hardware and the integration of 9 additional nodes is now complete. The whole move was done within one day thanks to the support of Pietro, Philipp and Michael – Thanks Guys! Our lab now runs 15 compute nodes, 1 controller node and 1 NAS. We will segment this infrastructure into a development environment of 10 nodes, where we can develop and test our work on OpenStack, and a production environment of 5 nodes. As the next step we will redeploy OpenStack by means of the automation tools Puppet and Foreman, as presented at the EGI Technical Forum. Let’s see how fast we can deploy 15 nodes from scratch! We’ll be studying, timing and evaluating it!

Fabrice Manhart

Fabrice Manhart is a research assistant at the Zurich University of Applied Sciences, working for the Institute of Applied Information Technology. He is new to the Service Engineering area and very interested in discovering new fields. Currently he is involved in the area of Infrastructure as a Service and has been instrumental in deploying OpenStack.

Fabrice finished his degree in Telecommunication and Computer Science in 2006 and started working as a Project Engineer at Nexus Telecom AG, Zurich, where he was involved in many customer projects. Later he took over the Group Manager position and was responsible for the deployment.

He is currently pursuing his Master of Science in Engineering degree at ZHAW.