ElasTest Passes European Commission’s Review Successfully!

On July 18th in Brussels, the project partners presented ElasTest results and progress to a panel of three independent experts appointed by the European Commission and to the EC Project Officer. The key project objective is to improve the efficiency of testing large-scale complex software systems. The ElasTest project is coordinated by URJC. ZHAW’s ICCLab is a key project partner delivering research and technology in the area of service delivery, monitoring and billing.

The objective of this review was to evaluate the project’s progress over its first 18 months, covering both the technical evolution and the administrative coordination. To assess the project, the three reviewers analysed all the public and private information related to it.

The evaluation meeting lasted eight hours, during which we presented the progress made in research, innovation, demos, exploitation plans, sustainability and project coordination. The most challenging part was the demonstration of the software developed by the different project partners: a one-hour session in which all the software artifacts were successfully demonstrated, including the ZHAW work. All of these efforts were welcomed by the reviewers. Finally, after an initial deliberation, the reviewers communicated their decision to approve the project and congratulated the team on a successful review!

The project is now focused on its second phase: now that the initial platform has been developed and integrated and is up and running, most of our efforts will be dedicated to research and to creating a community of users around ElasTest.

For more information on ElasTest, check out our site and code repositories.

Experience using Kolla Ansible to upgrade Openstack from Ocata to Queens

We made a decision to use Kolla-Ansible for Openstack management approximately a year ago and we’ve just gone through the process of upgrading from Ocata to Pike to Queens. Here we provide a few notes on the experience.

By way of some context: our system is a moderately sized system with 3 storage nodes, 7 compute nodes and 3 controllers configured in HA. Our systems were running CentOS 7.5 with a 17.05.0-ce docker engine and we were using the centos-binary Kolla containers. Being an academic institution, usage of our system peaks during term time, so performing the upgrade during the summer meant that system utilization was modest. As we are lucky enough to have tolerant users, we were not excessively concerned with ensuring minimal system downtime.

We had done some homework on some test systems in different configurations and had obtained some confidence with the Kolla-Ansible Ocata-Pike-Queens upgrade – we even managed to ‘upgrade’ from a set of centos containers to ubuntu containers without problem. We had also done an upgrade on a smaller, newer system which is in use and it went smoothly. However, we still had a little apprehension when performing the upgrade on the larger system.

In general, we found Kolla Ansible good and we were able to perform the upgrade without too much difficulty. However, it is not an entirely hands-off operation and it did require some intervention for which good knowledge of both Openstack and Kolla was necessary.

Our workflow was straightforward, comprising the following three stages:

  • generate the three configuration files passwords.yml, globals.yml and multinode.ha,
  • pull down all containers to the nodes using kolla-ansible pull,
  • perform the upgrade using kolla-ansible upgrade.

We generated the globals.yml and passwords.yml config files by copying the empty config files from the appropriate kolla-ansible git branch to our /etc/kolla directory, comparing them with the files used in the previous deploy and copying changes from the previous versions into the new config file. We used the approach described here to generate the correct passwords.yml file.
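
The comparison step can be sketched roughly as follows. This is a minimal sketch rather than the exact procedure we followed: the helper names active_settings and config_drift are hypothetical, and the sketch only looks at uncommented lines, so settings that are commented out in the new template are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print only the active (non-blank, non-comment)
# lines of a config file, sorted so that two files can be compared.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort
}

# Hypothetical helper: show the settings that differ between the file
# from the previous deployment ($1) and the new file being prepared ($2).
# Lines marked '<' exist only in the old file, '>' only in the new one.
config_drift() {
    drift_old=$(mktemp)
    drift_new=$(mktemp)
    active_settings "$1" > "$drift_old"
    active_settings "$2" > "$drift_new"
    diff "$drift_old" "$drift_new"
    drift_status=$?
    rm -f "$drift_old" "$drift_new"
    return "$drift_status"
}
```

Calling config_drift with the old and the new globals.yml lists the settings that still have to be carried over or reviewed; like diff, it returns a non-zero status when the two files differ.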

Pulling appropriate containers to all nodes was straightforward:

/opt/kolla-ansible/tools/kolla-ansible \
    -i /etc/kolla/multinode.ha pull

It can take a bit of time, but it’s sensible as it does not have any impact on the operational system and reduces the amount of downtime when upgrading.

We were then ready to perform the deployment. Rather than run the system through the entire upgrade process, we chose a more conservative approach in which we upgraded a single service at a time: this was to maintain a little more control over the process and to enable us to check that each service was operating correctly after upgrade. We performed this using commands such as:

/opt/kolla-ansible/tools/kolla-ansible \
    -i /etc/kolla/multinode.ha --tags "haproxy" upgrade

We stepped through the services in the same order as listed in the main Kolla-Ansible playbook, deploying the services one by one.
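
The service-by-service approach can be sketched as a small wrapper around the command shown above. This is an illustrative sketch, not the script we ran: the function names and the DRY_RUN switch are hypothetical, and the list of tags has to be taken from the Kolla-Ansible playbook of the release being deployed.

```shell
#!/bin/sh
# Paths taken from the commands shown above.
KOLLA_ANSIBLE=/opt/kolla-ansible/tools/kolla-ansible
INVENTORY=/etc/kolla/multinode.ha

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Upgrade a single service, selected by its Ansible tag.
upgrade_service() {
    run "$KOLLA_ANSIBLE" -i "$INVENTORY" --tags "$1" upgrade
}

# Upgrade the given services in order, stopping at the first failure so
# that the failing service can be inspected before moving on.
upgrade_in_order() {
    for service in "$@"; do
        echo "== upgrading $service =="
        upgrade_service "$service" || return 1
    done
}
```

For example, upgrade_in_order haproxy mariadb prints the two upgrade commands in dry-run mode. Stopping at the first failure leaves room for the manual check of each service described above.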

The two services that we were most concerned about were those pertaining to data storage, naturally: mariadb and ceph. We were quite confident that the other processes should not cause significant problems as they do not retain much important state.

Before we started…

We had some initial problems with docker python libraries installed on all of our nodes. The variant of the docker python library available via standard CentOS repos is too old. We had to resort to pip to install a new docker python library which worked with newer versions of Kolla-Ansible.

Ocata-Pike Upgrade

Deploying all the services for the Ocata-Pike upgrade was straightforward: we just ran through each of the services in turn and there were no specific issues. When performing some final testing, however, the compute nodes were unable to schedule new VMs as neutron was unable to attach a VIF to the OVS bridge. We had seen this issue before and we knew that putting the compute nodes through a boot cycle solves it – not a very clean approach, but it worked.

Pike-Queens Upgrade

The Pike-Queens upgrade was more complex and we encountered issues that we had not specifically seen documented anywhere. The issues were the following:

    • the mariadb upgrade failed – when the slave instances were restarted, they did not join the mariadb cluster and we ended up with a cluster with 0 nodes in the ‘JOINED’ state. The master node also ended up in an inoperable state.
      • We solved this using the well documented approach to bootstrapping a mariadb cluster – we have our own variant of it for the kolla mariadb containers, which is essentially a replica of the mariadb_recovery functionality provided by kolla
      • This did involve a syncing process of replicating all data from the bootstrap node on each of the slave nodes; in our case, this took 10 minutes
    • when the mariadb database sync’d and reached quorum, we noticed many errors associated with record field types in the logs – for this upgrade, it was necessary to perform a mysql_upgrade, which we had not seen documented anywhere
    • the ceph upgrade process was remarkably painless, especially given that this involved a transition from Ceph Jewel to Ceph Luminous. We did have the following small issues to deal with
      • We had to modify the configuration of the ceph cluster using ceph osd require-osd-release luminous
      • We had one small issue that the cluster was in the HEALTH_WARN status as one application did not have an appropriate tag – this was easily fixed using ceph osd pool application enable {pool-name} {application-name}
      • for reasons that are not clear to us, Luminous considered the status of the cluster to be somewhat suboptimal and moved over 50% of the objects in the cluster; Jewel had given no indication that a large amount of the cluster data needed to be moved
    • Upgrading the object store rendered it unusable: in this upgrade, the user which authenticates against keystone with privilege to manage user data for the object store changed from admin to ceph_rgw. However, this user was not added to keystone and all requests to the object store failed. Adding this user to keystone and giving it appropriate access to the service project fixed the issue.
      • This was due to a change that was introduced in the Ocata release after we had performed our deployment and it only became visible to us after we performed the upgrade.
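
The manual fixes listed above can be summarised as the following sketch. It is hedged in several places: the container names mariadb and ceph_mon, the use of the admin role for the ceph_rgw user and the function names are assumptions for illustration, not a record of the exact commands we typed.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Repair table metadata after the MariaDB upgrade.
# Assumption: the Kolla container is called "mariadb"; -p prompts for the
# database root password.
fix_mariadb_tables() {
    run docker exec -ti mariadb mysql_upgrade -u root -p
}

# Finish the Jewel -> Luminous transition.
# Assumption: ceph commands are run inside a container called "ceph_mon".
# $1 = pool name, $2 = application name (for example rbd, rgw or cephfs).
finish_luminous_upgrade() {
    run docker exec ceph_mon ceph osd require-osd-release luminous
    run docker exec ceph_mon ceph osd pool application enable "$1" "$2"
}

# Create the missing object store user in keystone.
# Assumptions: admin credentials are already loaded in the environment and
# the "admin" role on the "service" project is the appropriate access.
# $1 = password for the ceph_rgw user.
create_rgw_user() {
    run openstack user create --domain default --password "$1" ceph_rgw
    run openstack role add --project service --user ceph_rgw admin
}
```

With the default DRY_RUN=1 the functions only print what they would run, which makes it easy to review the commands before touching a live cluster.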

Apart from those issues, everything worked fine; we did note that the nova database upgrade/migration in the Pike-Queens cycle did take quite a long time (about 10 minutes) for our small cluster – for a very large configuration, it may be necessary to monitor this more closely.

Final remarks…

The Kolla-Ansible upgrade process worked well for our modest deployment and we are happy to recommend it as an Openstack management tool for environments of such scale with quite standard configurations, although even with an advanced tool such as Kolla-Ansible, it is essential to have a good understanding of both Openstack and Kolla before depending on it in a production system.

Setting up container based Openstack with OVN networking

OVN is a relatively new networking technology which provides a powerful and flexible software implementation of standard networking functionalities such as switches, routers, firewalls, etc. Importantly, OVN is distributed in the sense that the aforementioned network entities can be realized over a distributed set of compute/networking resources. OVN is tightly coupled with OVS, essentially being a layer of abstraction which sits above a set of OVS switches and realizes the above networking components across these switches in a distributed manner.

A number of cloud computing platforms and more general compute resource management frameworks are working on OVN support, including oVirt, Openstack, Kubernetes and Openshift – progress on this front is quite advanced. Interestingly and importantly, one dimension of the OVN vision is that it can act as a common networking substrate which could facilitate integration of more than one of the above systems, although the realization of that vision remains future work.

In the context of our work on developing an edge computing testbed, we set up a modest Openstack cluster, to emulate functionality deployed within an Enterprise Data Centre with OVN providing network capabilities to the cluster. This blog post provides a brief overview of the system architecture and notes some issues we had getting it up and running.

As our system is not a production system, providing High Availability (HA) support was not one of the requirements; consequently, it was not necessary to consider HA OVN mode. As such, it was natural to host the OVN control services, including the Northbound and Southbound DBs and the Northbound daemon (ovn-northd) on the Openstack controller node. As this is the node through which external traffic goes, we also needed to run an external facing OVS on this node which required its own OVN controller and local OVS database. Further, as this OVS chassis is intended for external traffic, it needed to be configured with ‘enable-chassis-as-gw‘.

We configured our system to use DHCP provided by OVN; consequently, the neutron DHCP agent was no longer necessary and we removed this process from our controller node. Similarly, L3 routing was done within OVN, meaning that the neutron L3 agent was no longer necessary. Openstack metadata support is implemented differently when OVN is used: instead of having a single metadata process running on a controller serving all metadata requests, the metadata service is deployed on each node and the OVS switch on each node routes requests to 169.254.169.254 to the local metadata agent; this then queries the nova metadata service to obtain the metadata for the specific VM.

The services deployed on the controller and compute nodes are shown in Figure 1 below.

Figure 1: Neutron containers with and without OVN

We used Kolla to deploy the system. Kolla does not currently have full support for OVN; however specific Kolla containers for OVN have been created (e.g. kolla/ubuntu-binary-ovn-controller:queens, kolla/ubuntu-binary-neutron-server-ovn:queens). Hence, we used an approach which augments the standard Kolla-ansible deployment with manual configuration of the extra containers necessary to get the system running on OVN.

As always, many smaller issues were encountered while getting the system working – we will not detail all these issues here, but rather focus on the more substantive issues. We divide these into three specific categories: OVN parameters which need to be configured, configuration specifics for the Kolla OVN containers and finally a point which arose due to assumptions made within Kolla that do not necessarily hold for OVN.

To enable OVN, it was necessary to modify the configuration of the OVS switches operating on all the nodes; the existing OVS containers and OVSDB could be used for this – the OVS version shipped with Kolla/Queens is v2.9.0 – but it was necessary to modify some settings. First, it was necessary to configure system-ids for all of the OVS chassis – we chose to select fixed UUIDs a priori and use these for each deployment so that we had a more systematic process for setting up the system, but it’s possible to use a randomly generated UUID.

docker exec -ti openvswitch_vswitchd ovs-vsctl set open_vswitch . external-ids:system-id="$SYSTEM_ID"

On the controller node, it was also necessary to set the following parameters:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP \
    external_ids:ovn-encap-type="geneve" \
    external-ids:ovn-cms-options="enable-chassis-as-gw"

docker exec openvswitch_vswitchd ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-ex

On the compute nodes this was necessary:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$OVN_SB_HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$OVN_NB_HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP \
    external_ids:ovn-encap-type="geneve"
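
To apply the same settings consistently across several compute nodes, the command above can be wrapped in a function. This is a minimal sketch: the function names and the DRY_RUN switch are hypothetical, and the IP addresses in the usage note are placeholders.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Apply the OVN settings to the local OVS instance on a compute node.
# $1 = IP of the host running the OVN southbound DB
# $2 = IP of the host running the OVN northbound DB
# $3 = IP of this node, used as the geneve tunnel endpoint
configure_compute_ovs() {
    run docker exec openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
        external_ids:ovn-remote="tcp:$1:6642" \
        external_ids:ovn-nb="tcp:$2:6641" \
        external_ids:ovn-encap-ip="$3" \
        external_ids:ovn-encap-type="geneve"
}

# Read the settings back to confirm that they were stored.
show_ovs_external_ids() {
    run docker exec openvswitch_vswitchd ovs-vsctl get Open_vSwitch . external_ids
}
```

For instance, configure_compute_ovs 172.30.0.101 172.30.0.101 172.30.0.102 prints the command for a compute node with the placeholder address 172.30.0.102 whose OVN databases live on 172.30.0.101.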

Having changed the OVS configuration on all the nodes, it was then necessary to get the services operational on the nodes. There are two specific aspects to this: modifying the service configuration files as necessary and starting the new services in the correct way.

Not many changes to the service configurations were required. The primary changes related to ensuring that the OVN mechanism driver was used and letting neutron know how to communicate with OVN. We also used the geneve tunnelling protocol in our deployment and this required the following configuration settings:

  • For the neutron server OVN container
    • ml2_conf.ini
        mechanism_drivers = ovn
        type_drivers = local,flat,vlan,geneve
        tenant_network_types = geneve

        [ml2_type_geneve]
        vni_ranges = 1:65536
        max_header_size = 38

        [ovn]
        ovn_nb_connection = tcp:172.30.0.101:6641
        ovn_sb_connection = tcp:172.30.0.101:6642
        ovn_l3_scheduler = leastloaded
        ovn_metadata_enabled = true

    • neutron.conf
        core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
        service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin

  • For the metadata agent container (running on the compute nodes), it was necessary to point it at the nova metadata service with the appropriate shared key and to tell it how to communicate with the OVS instance running on each compute node
        nova_metadata_host = 172.30.0.101
        metadata_proxy_shared_secret = <SECRET>
        bridge_mappings = physnet1:br-ex
        datapath_type = system
        ovsdb_connection = tcp:127.0.0.1:6640
        local_ip = 172.30.0.101

For the OVN specific containers – ovn-northd, ovn-sb and ovn-nb databases, it was necessary to ensure that they had the correct configuration at startup; specifically, that they knew how to communicate with the relevant dbs. Hence, start commands such as

/usr/sbin/ovsdb-server /var/lib/openvswitch/ovnnb.db \
    -vconsole:emer -vsyslog:err -vfile:info \
    --remote=punix:/run/openvswitch/ovnnb_db.sock \
    --remote=ptcp:$ovnnb_port:$ovsdb_ip \
    --unixctl=/run/openvswitch/ovnnb_db.ctl \
    --log-file=/var/log/kolla/openvswitch/ovsdb-server-nb.log

were necessary (for the ovn northbound database) and we had to modify the container start process accordingly.

It was also necessary to update the neutron database to support OVN specific versioning information: this was straightforward using the following command:

docker exec -ti neutron-server-ovn_neutron_server_ovn_1 neutron-db-manage upgrade heads

The last issue which we had to overcome was that Kolla and neutron OVN had slightly different views regarding the naming of the external bridges. Kolla-ansible configured a connection between the br-ex and br-int OVS bridges on the controller node with port names phy-br-ex and int-br-ex respectively. OVN also created ports with the same purpose but with different names, patch-provnet-<UUID>-to-br-int and patch-br-int-to-provnet-<UUID>; as these ports had the same purpose, our somewhat hacky solution was to manually remove the ports created in the first instance by Kolla-ansible.
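
Removing the duplicate ports can be sketched as follows. This is a hedged sketch: it assumes the commands are issued through the openvswitch_vswitchd container used earlier in this post, and the function names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Remove the patch ports that Kolla-ansible created between br-ex and
# br-int, leaving the equivalent ports created by OVN in place.
# --if-exists makes the call harmless if a port is already gone.
remove_kolla_patch_ports() {
    run docker exec openvswitch_vswitchd ovs-vsctl --if-exists del-port br-ex phy-br-ex
    run docker exec openvswitch_vswitchd ovs-vsctl --if-exists del-port br-int int-br-ex
}

# List the remaining ports on a bridge ($1) to check the result.
list_bridge_ports() {
    run docker exec openvswitch_vswitchd ovs-vsctl list-ports "$1"
}
```

Running list_bridge_ports br-int afterwards should show only the patch port created by OVN alongside the VM ports.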

Having overcome all these steps, it was possible to launch a VM which had external network connectivity and to which a floating IP address could be assigned.

Clearly, this approach is not realistic for supporting a production environment, but it’s an appropriate level of hackery for a testbed.

Other noteworthy issues which arose during this work include the following:

  • Standard docker apparmor configuration in ubuntu is such that mount cannot be run inside containers, even if they have the appropriate privileges. This has to be disabled or else it is necessary to ensure that the containers do not use the default docker apparmor profile.
  • A specific issue with mounts inside a container which resulted in the mount table filling up with 65536 mounts and rendering the host quite unusable (thanks to Stefan for providing a bit more detail on this) – the workaround was to ensure that /run/netns was bind mounted into the container.
  • As we used geneve encapsulation, geneve kernel modules had to be loaded
  • Full datapath NAT support is only available for linux kernel 4.6 and up. We had to upgrade the 4.4 kernel which came with our standard ubuntu 16.04 environment.
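
The last two points lend themselves to a quick host check before deployment. The following is a minimal sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, and the version comparison relies on the version sort (sort -V) provided by GNU coreutils.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check that the running kernel meets the 4.6 minimum, then load the
# geneve module (loading a module requires root privileges).
check_host_prereqs() {
    kernel=$(uname -r)
    # Strip the distribution suffix, e.g. 4.4.0-116-generic -> 4.4.0
    if ! version_ge "${kernel%%-*}" "4.6"; then
        echo "kernel $kernel is older than 4.6" >&2
        return 1
    fi
    run modprobe geneve
}
```

On a host still running the stock 4.4 kernel, check_host_prereqs reports the problem and returns a non-zero status instead of attempting to load the module.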

This is certainly not a complete guide to how to get Openstack up and running with OVN, but may be useful to some folks who are toying with this. In future, we’re going to experiment with extending OVN to an edge networking context and will provide more details as this work evolves.

 

Brief report on the ICDCS’18 conference

The 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18) took place from July 2 – 5, 2018, in Vienna, Austria. This blog post briefly summarises, from our view as participating researchers from the Service Prototyping Lab, some key aspects of distributed applications and general take-away inspirations from this well-established conference.


Open Cloud Day 2018

This year we had the pleasure of organizing and hosting one of Switzerland’s most prestigious cloud events, the OpenCloudDay. On the 30th of May, we welcomed around 80 participants at the ZHAW School of Engineering in Winterthur for a day rich with technical talks, demos and networking opportunities for Cloud Computing practitioners and experts in Switzerland.

Welcome and introduction to Open Cloud Day 2018

The program of the day started with two opening talks covering very timely topics in the field of Cloud Computing. The first talk, given by Thomas Michael Bohnert from the ICCLab, was a critical view on what many consider the next evolutionary direction of Cloud Computing, namely Edge Computing. We got the speaker’s perspective on the motivations, the potential obstacles and the open issues for this paradigm to definitively break through (or maybe not) as the next Cloud Computing frontier. The second opening talk was given by Sacha Dubois from Red Hat and focused on the potential of Ansible Tower for the automation and management of Hybrid Clouds. After a general discussion on the possibilities offered by Ansible Tower for managing both on-premise and public cloud workloads, a live demo showed how this would work in practice.

Presentation on Ansible Tower by Red Hat

During the second part of the morning and the first part of the afternoon, two technical sessions were run in parallel. Topics covered included Continuous Delivery, Continuous Deployment and Continuous Integration in the Cloud, the CNCF activities during the last year, the challenges with the adoption of Web Application Firewalls in the DevOps methodology, and much more. An insightful presentation was given on current cloudware technologies and what to expect from future post-cloud systems. Practical experiences were also presented, for instance in setting up a Kubernetes cluster and in using Ansible for cloud solutions. A workshop on setting up an oVirt infrastructure for open source Cloud Management Software was also organized in the morning. For the complete program of the technical talks, please visit the webpage of the OpenCloudDay.

Attendees during one of the technical presentations

The two final technical talks of the day were given by Niklaus Hofer from Stepping Stone and Jens-Christian Fischer from SWITCH. In the first of the two talks, a presentation was given on the analysis of storage performance for a Ceph cluster. More specifically, the focus was on the comparison between the new backend solution for the Luminous Ceph release, i.e. BlueStore, and the FileStore solution for storing data to disk. Open challenges and further open points of investigation were also given.
The last talk brought a different point of view to all the technical solutions for running a cloud. Based on the experience of SWITCH in running an OpenStack/Ceph based cloud for the Swiss academic community, the importance of the users’ role in using the technology was put in focus. The users’ perspective is not to be overlooked, as it places additional challenges and requirements on the solutions to be deployed, as the experience of SWITCH clearly highlighted.

The program of the day also offered a total of seven demo presentations on the following topics: Cloud Robotics, Edge Computing, CAB, CNA, Service Tooling, ElasTest, T-Systems solutions.

One of the demos presented by the ICCLab

The role of FaaS in mixed-technology cloud and scientific computing applications

The computer science department of AGH University of Science and Technology in Kraków has produced substantial analytical research contributions to assess the suitability of cloud functions as a basis for scientific workflows and computing platforms. As we pursue similar research interests in the Service Prototyping Lab at Zurich University of Applied Sciences, we arranged an intensive two-day exchange including a research seminar, some live experiments and many inspiring discussions. This blog post summarises the talks and experimental results and provides an overview of evident trends and possibilities for future research in this area.


Storage & Data Analytics – Swiss 2018

On the 24th of May we attended the “Storage & Data Analytics – Swiss 2018” day which was organized at the Seedamm Plaza in Pfäffikon SZ.
Our interest and expertise at the ICCLab for innovative solutions in the area of Cloud Storage motivated us to join the event with the aim to exchange expertise with colleagues from both the industrial and the academic realms.

Welcome and introduction to the day

The program for the event offered a well-balanced mix of keynote speeches from top-experts in the field of storage and data analytics, presentations from specialists and companies actively working in the continuously evolving market, workshops, round-tables, and live demos on specific aspects of interest, and important moments for networking and knowledge exchange with the participants.
Besides the keynotes, the program was organized into four sessions running in parallel. The large number of people attending the sessions and the stands set up by the industrial partners bears witness to the high interest in the topics in focus. Five major areas of interest were covered: Data Management, Data Analytics, Cloud Storage, Technology and Security. You can find the complete program at the following link: https://www.storage-day.ch/

Harald Seipp (IBM) presents Storage in Container-based Cloud Infrastructure

The research and development interests at the ICCLab naturally drew us towards presentations in the area of Cloud Storage and Technology. The first keynote of the day, by Prof. Brinkmann from the University of Mainz, guided us through a classification of Storage with a view on the future of Storage. In the subsequent presentation by IBM, Storage in container-based Cloud Infrastructures was discussed, underlining the importance of persistent storage and multi-cloud environments. Of particular interest to us was the presentation given by the company SUSE. Software Defined Storage was discussed as the de-facto standard for storage in the Cloud, and the importance of open source based solutions was highlighted when they presented their Enterprise Storage solution based on Openstack and Ceph. A further interesting analysis of Cloud Storage was later presented by the company Nutanix, which introduced their full-stack solution for Storage in the Cloud.

As the icing on the cake, the day was concluded by the insightful keynote given by Moshe Rappaport, Executive Technologist at IBM Research, who guided the audience into the future, shedding light on the new disruptive technologies ahead of us. Predictions were also made about the future of Storage, which is rapidly evolving towards high density data storage applications requiring innovative research and development solutions.

Moshe Rappaport’s insightful keynote on the future of Business and IT from an IBM research perspective

In conclusion, our participation in the “Storage & Data Analytics – Swiss 2018” was well worth the time investment. The event clearly fulfilled our expectations as an important source of inspiration for our research activities and as an opportunity for networking with experts in the field. We are already looking forward to the next event of this kind!

Call for Contributions: IEEE/ACM UCC and BDCAT 2018, Zurich, Switzerland

Block the dates in your calendar: December 17 to 21 is high cloud time in Switzerland!

Two computer science research laboratories at Zurich University of Applied Sciences, the Service Prototyping Lab and the ICCLab, are jointly going to host the 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2018) and the 5th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2018) along with a number of satellite and co-located events from December 17 to 21 in Zurich, Switzerland. This pre-Christmas week of prestigious conferences is a unique opportunity to bring together international researchers and practitioners in central Europe. Please consider supporting the event with corporate donations, tutorials, cloud challenge entries and other contributions. This is your chance to demonstrate convincing cloud technology to the world! Contact the conference organisers for any details.

Technical paper submissions are furthermore open to a number of collocated workshops. Among them we would like to point out the 1st Workshop on Quality Assurance in the Context of Cloud Computing (QA3C 2018) and the 1st Workshop on Cloud-Native Applications Design and Experience (CNAX 2018) in which our research staff proudly serves as co-chairs. In total, 9 workshops are accepting papers now, a doctoral forum accepts research proposals, and a cloud challenge supports practical (demo-able) contributions with emphasis on reproducible impactful results.

Finally, we would like to mention specifically the subsequent European Symposium on Serverless Computing and Applications (ESSCA 2018) on December 21st which as a mixed industry-academic-community event acknowledges that FaaS-based applications have become mainstream but challenges remain. Got a talk on that topic? Just propose it informally to enrich the technical meeting with different perspectives. Along with ESSCA, on December 20 there will be the 4th edition of the International Workshop on Serverless Computing as part of UCC.

KubeCon’18 – Cloud, containers, edge, nets, robots, and philosophy of science

KubeCon / CloudNativeCon Europe 2018 took place at the shiny Bella Center of Copenhagen on May 2 – 4, 2018.
Here at ICCLab/SPLab we make extensive use of Kubernetes / CNCF technologies in both teaching and research, but we had one extra reason for being there this year: our friends and colleagues from Rapyuta Robotics (RR) were scheduled to give a talk on Cloud Robotics PaaS.

Bella Center – Copenhagen


SPLab Colloquium on Serverless Continuum

The third invited talk in our colloquium series in 2018 was given by Martin Garriga, at that time finishing his time as post-doctoral fellow at Politecnico di Milano’s Deep SE group, and now continuing as lecturer at the Informatics Faculty at National University of Comahue (UNComa) in Patagonia, Argentina. Martin, like several people at the Service Prototyping Lab, has been interested for quite some time in serverless computing, as evidenced by his ESOCC 2017 article on empowering low-latency applications with OpenWhisk and related tools (see details). In his colloquium talk, entitled «Towards the Serverless Continuum», he reflected on this work and proposed a wider view on a spectrum from mobile applications over edge nodes to, eventually, powerful cloud platforms.
