Setting up container-based Openstack with OVN networking

OVN is a relatively new networking technology which provides a powerful and flexible software implementation of standard networking functionalities such as switches, routers, firewalls, etc. Importantly, OVN is distributed in the sense that the aforementioned network entities can be realized over a distributed set of compute/networking resources. OVN is tightly coupled with OVS, essentially being a layer of abstraction which sits above a set of OVS switches and realizes the above networking components across these switches in a distributed manner.

A number of cloud computing platforms and more general compute resource management frameworks are working on OVN support, including oVirt, Openstack, Kubernetes and Openshift – progress on this front is quite advanced. Interestingly and importantly, one dimension of the OVN vision is that it can act as a common networking substrate which could facilitate integration of more than one of the above systems, although the realization of that vision remains future work.

In the context of our work on developing an edge computing testbed, we set up a modest Openstack cluster, to emulate functionality deployed within an Enterprise Data Centre with OVN providing network capabilities to the cluster. This blog post provides a brief overview of the system architecture and notes some issues we had getting it up and running.

As our system is not a production system, providing High Availability (HA) support was not one of the requirements; consequently, it was not necessary to consider HA OVN mode. As such, it was natural to host the OVN control services, including the Northbound and Southbound DBs and the Northbound daemon (ovn-northd), on the Openstack controller node. As this is the node through which external traffic passes, we also needed to run an external-facing OVS on this node, which required its own OVN controller and local OVS database. Further, as this OVS chassis is intended for external traffic, it needed to be configured with 'enable-chassis-as-gw'.

We configured our system to use DHCP provided by OVN; consequently, the neutron DHCP agent was no longer necessary and we removed this process from our controller node. Similarly, L3 routing was done within OVN, meaning that the neutron L3 agent was no longer necessary. Openstack metadata support is implemented differently when OVN is used: instead of having a single metadata process running on a controller serving all metadata requests, the metadata service is deployed on each node and the OVS switch on each node routes requests to 169.254.169.254 to the local metadata agent; this agent then queries the nova metadata service to obtain the metadata for the specific VM.
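
As a quick sanity check of this distributed metadata path, a request to the well-known metadata address can be issued from inside a booted VM; this is a minimal sketch assuming a Linux guest with curl installed, and the reply should come from the metadata agent on the local compute node rather than from a central process:

curl http://169.254.169.254/openstack/latest/meta_data.json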

The services deployed on the controller and compute nodes are shown in Figure 1 below.

Figure 1: Neutron containers with and without OVN

We used Kolla to deploy the system. Kolla does not currently have full support for OVN; however, specific Kolla containers for OVN have been created (e.g. kolla/ubuntu-binary-ovn-controller:queens, kolla/ubuntu-binary-neutron-server-ovn:queens). Hence, we used an approach which augments the standard Kolla-ansible deployment with manual configuration of the extra containers necessary to get the system running on OVN.
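
For reference, the prebuilt OVN images named above can simply be pulled from Docker Hub before being wired into the deployment; a minimal sketch, assuming the queens tag matches the rest of the deployment:

docker pull kolla/ubuntu-binary-ovn-controller:queens
docker pull kolla/ubuntu-binary-neutron-server-ovn:queens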

As always, many smaller issues were encountered while getting the system working – we will not detail all of these here, but rather focus on the more substantive ones. We divide them into three categories: OVN parameters which need to be configured, configuration specifics for the Kolla OVN containers, and finally a point which arose due to assumptions made within Kolla that do not necessarily hold for OVN.

To enable OVN, it was necessary to modify the configuration of the OVS switches operating on all the nodes; the existing OVS containers and OVSDB could be used for this – the OVS version shipped with Kolla/Queens is v2.9.0 – but it was necessary to modify some settings. First, it was necessary to configure system-ids for all of the OVS chassis – we chose to select fixed UUIDs a priori and use these for each deployment so that we had a more systematic process for setting up the system, but it is also possible to use a randomly generated UUID.

docker exec -ti openvswitch_vswitchd ovs-vsctl set open_vswitch . external-ids:system-id="$SYSTEM_ID"
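
A fixed identifier can be generated once per node and then reused for every redeployment; the sketch below shows one way to do this (uuidgen and the variable name are our choices, not part of the Kolla tooling), together with a read-back to confirm the setting was applied:

SYSTEM_ID=$(uuidgen)   # generate once, store and reuse for subsequent deployments
docker exec openvswitch_vswitchd ovs-vsctl get open_vswitch . external-ids:system-id   # read back to confirm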

On the controller node, it was also necessary to set the following parameters:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP \
    external_ids:ovn-encap-type="geneve" \
    external_ids:ovn-cms-options="enable-chassis-as-gw"

docker exec openvswitch_vswitchd ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-ex
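
With these options in place, it is worth confirming that the controller chassis registered itself in the OVN southbound database with the gateway option set; a minimal check, assuming ovn-sbctl is available in the southbound DB container (the container name below is only a placeholder for whatever name was chosen when creating it) – the controller chassis should appear in the output, with the gateway option visible in its record:

docker exec -ti ovn_sb_db ovn-sbctl --db=tcp:$HOST_IP:6642 list chassis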

On the compute nodes, the following was necessary:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$OVN_SB_HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$OVN_NB_HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP \
    external_ids:ovn-encap-type="geneve"
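
Once the ovn-controller instances on the various nodes have connected to the southbound database, geneve tunnel ports between the chassis should appear automatically on the integration bridge (br-int); a quick way to verify this from any node is to inspect the local OVS configuration:

docker exec openvswitch_vswitchd ovs-vsctl show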

Having changed the OVS configuration on all the nodes, it was then necessary to get the services operational on the nodes. There are two specific aspects to this: modifying the service configuration files as necessary and starting the new services in the correct way.

Not many changes to the service configurations were required. The primary changes related to ensuring that the OVN mechanism driver was used and letting neutron know how to communicate with OVN. We also used the geneve tunnelling protocol in our deployment, and this required the following configuration settings (a sketch of how such changes are typically applied to the Kolla containers follows the list below):

  • For the neutron server OVN container
    • ml2_conf.ini

      [ml2]
      mechanism_drivers = ovn
      type_drivers = local,flat,vlan,geneve
      tenant_network_types = geneve

      [ml2_type_geneve]
      vni_ranges = 1:65536
      max_header_size = 38

      [ovn]
      ovn_nb_connection = tcp:172.30.0.101:6641
      ovn_sb_connection = tcp:172.30.0.101:6642
      ovn_l3_scheduler = leastloaded
      ovn_metadata_enabled = true

    • neutron.conf

      core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
      service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin

  • For the metadata agent container (running on the compute nodes), it was necessary to configure it to point at the nova metadata service with the appropriate shared key, as well as to tell it how to communicate with the OVS instance running on each of the compute nodes:

      nova_metadata_host = 172.30.0.101
      metadata_proxy_shared_secret = <SECRET>
      bridge_mappings = physnet1:br-ex
      datapath_type = system
      ovsdb_connection = tcp:127.0.0.1:6640
      local_ip = 172.30.0.101

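How these configuration files are edited depends on how the containers were created; with the kolla-ansible convention, the merged configuration for each service lives under /etc/kolla/<container name>/ on the host and is bind-mounted into the container, so one workable approach is to adjust the file there and restart the container. A minimal sketch under that assumption (the directory name is hypothetical and needs to match the local setup; the container name is the one used further below for the database upgrade):

vi /etc/kolla/neutron-server-ovn/ml2_conf.ini
docker restart neutron-server-ovn_neutron_server_ovn_1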

For the OVN-specific containers – ovn-northd and the ovn-sb and ovn-nb databases – it was necessary to ensure that they had the correct configuration at startup; specifically, that they knew how to communicate with the relevant DBs. Hence, start commands such as

/usr/sbin/ovsdb-server /var/lib/openvswitch/ovnnb.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/ovnnb_db.sock --remote=ptcp:$ovnnb_port:$ovsdb_ip --unixctl=/run/openvswitch/ovnnb_db.ctl --log-file=/var/log/kolla/openvswitch/ovsdb-server-nb.log

were necessary (for the ovn northbound database) and we had to modify the container start process accordingly.
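
The southbound database and ovn-northd needed analogous start commands; the following is a sketch by analogy with the northbound command above, with file names, ports, sockets and log paths following the same pattern (they may need adjusting to the local container layout):

/usr/sbin/ovsdb-server /var/lib/openvswitch/ovnsb.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/ovnsb_db.sock --remote=ptcp:$ovnsb_port:$ovsdb_ip --unixctl=/run/openvswitch/ovnsb_db.ctl --log-file=/var/log/kolla/openvswitch/ovsdb-server-sb.log

/usr/sbin/ovn-northd --ovnnb-db=unix:/run/openvswitch/ovnnb_db.sock --ovnsb-db=unix:/run/openvswitch/ovnsb_db.sock --log-file=/var/log/kolla/openvswitch/ovn-northd.log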

It was also necessary to update the neutron database to support OVN-specific versioning information; this was straightforward using the following command:

docker exec -ti neutron-server-ovn_neutron_server_ovn_1 neutron-db-manage upgrade heads

The last issue which we had to overcome was that Kolla and neutron OVN had slightly different views regarding the naming of the external bridges. Kolla-ansible configured a connection between the br-ex and br-int OVS bridges on the controller node with port names phy-br-ex and int-br-ex respectively. OVN also created ports for the same purpose but with different names, patch-provnet-<UUID>-to-br-int and patch-br-int-to-provnet-<UUID>; as these ports had the same purpose, our somewhat hacky solution was to manually remove the ports created in the first instance by Kolla-ansible.
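
Concretely, cleaning this up amounted to deleting the Kolla-created ports from the two bridges via the OVS container; a minimal sketch using the port names mentioned above:

docker exec openvswitch_vswitchd ovs-vsctl del-port br-ex phy-br-ex
docker exec openvswitch_vswitchd ovs-vsctl del-port br-int int-br-ex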

Having worked through all of these steps, it was possible to launch a VM which had external network connectivity and to which a floating IP address could be assigned.

Clearly, this approach is not realistic for supporting a production environment, but it’s an appropriate level of hackery for a testbed.

Other noteworthy issues which arose during this work include the following:

  • The standard Docker AppArmor configuration on Ubuntu is such that mount cannot be run inside containers, even if they have the appropriate privileges. This has to be disabled, or else it is necessary to ensure that the containers do not use the default Docker AppArmor profile.
  • A specific issue with mounts inside a container resulted in the mount table filling up with 65536 mounts and rendered the host quite unusable (thanks to Stefan for providing a bit more detail on this); the workaround was to ensure that /run/netns was bind-mounted into the container.
  • As we used geneve encapsulation, the geneve kernel module had to be loaded (see the sketch after this list).
  • Full datapath NAT support is only available for Linux kernel 4.6 and up; we had to upgrade the 4.4 kernel which came with our standard Ubuntu 16.04 environment.
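
For the last two points, the corresponding check and module load are quick; a minimal sketch, assuming a systemd-based Ubuntu host (persisting the module via /etc/modules-load.d is our choice, not something mandated by OVN):

uname -r                                        # should report 4.6 or newer for full datapath NAT support
modprobe geneve                                 # load the geneve module for the current boot
echo geneve > /etc/modules-load.d/geneve.conf   # load it automatically on subsequent boots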

This is certainly not a complete guide to how to get Openstack up and running with OVN, but may be useful to some folks who are toying with this. In future, we’re going to experiment with extending OVN to an edge networking context and will provide more details as this work evolves.


Brief report on the ICDCS’18 conference

The 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18) took place from July 2 – 5, 2018, in Vienna, Austria. This blog post briefly summarises, from our view as participating researchers from the Service Prototyping Lab, some key aspects of distributed applications and general take-away inspirations from the well-established conference.

Continue reading

The role of FaaS in mixed-technology cloud and scientific computing applications

The computer science department of AGH University of Science and Technology in Kraków has produced substantial analytical research contributions assessing the suitability of cloud functions as a basis for scientific workflows and computing platforms. As this matches our own research interests in the Service Prototyping Lab at Zurich University of Applied Sciences, we arranged an intensive two-day exchange including a research seminar, some live experiments and many inspiring discussions. This blog post summarises the talks and experimental results and provides an overview of evident trends and possibilities for future research in this area.

Continue reading

Call for Contributions: IEEE/ACM UCC and BDCAT 2018, Zurich, Switzerland

Block the dates in your calendar: December 17 to 21 is high cloud time in Switzerland!

Two computer science research laboratories at Zurich University of Applied Sciences, the Service Prototyping Lab and the ICCLab, are jointly going to host the 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2018) and the 5th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2018), along with a number of satellite and co-located events, from December 17 to 21 in Zurich, Switzerland. This pre-Christmas week of prestigious conferences is a unique opportunity to bring together international researchers and practitioners in central Europe. Please consider supporting the event with corporate donations, tutorials, cloud challenge entries and other contributions – it is your chance to demonstrate convincing cloud technology to the world! Contact the conference organisers for any details.

Technical paper submissions are furthermore open to a number of co-located workshops. Among them we would like to point out the 1st Workshop on Quality Assurance in the Context of Cloud Computing (QA3C 2018) and the 1st Workshop on Cloud-Native Applications Design and Experience (CNAX 2018), in which our research staff proudly serve as co-chairs. In total, 9 workshops are accepting papers now, a doctoral forum accepts research proposals, and a cloud challenge supports practical (demo-able) contributions with an emphasis on reproducible, impactful results.

Finally, we would like to mention specifically the subsequent European Symposium on Serverless Computing and Applications (ESSCA 2018) on December 21st, which, as a mixed industry-academic-community event, acknowledges that FaaS-based applications have become mainstream but that challenges remain. Got a talk on that topic? Just propose it informally to enrich the technical meeting with different perspectives. Along with ESSCA, the 4th edition of the International Workshop on Serverless Computing will take place on December 20 as part of UCC.

SPLab Colloquium on Serverless Continuum

The third invited talk in our colloquium series in 2018 was given by Martin Garriga, at that time finishing his post-doctoral fellowship at Politecnico di Milano’s Deep SE group and now continuing as a lecturer at the Informatics Faculty of the National University of Comahue (UNComa) in Patagonia, Argentina. Martin, like several people at the Service Prototyping Lab, has been interested in serverless computing for quite some time, as evidenced by his ESOCC 2017 article on empowering low-latency applications with OpenWhisk and related tools (see details). In his colloquium talk, entitled «Towards the Serverless Continuum», he reflected on this work and proposed a wider view on a spectrum from mobile applications over edge nodes to, eventually, powerful cloud platforms.

Continue reading

CLOSER’18 conference report

In the European services and cloud computing research community, the International Conference on Cloud Computing and Services Science (CLOSER) has been a meeting point for academics and applied researchers for almost a decade. This year, CLOSER 2018 took place in Santa Cruz on the Portuguese island of Madeira. As for any commercially organised conference series, there are certain expectations for how well the conference is run, and there is a lot for us to learn both for driving community-organised conferences and for gauging participation in cloud conferences in general. On the technical side, we presented an international collaboration work at this conference and dived into the respective works of others. This blog post reports our interpretation of both the organisational and technical aspects of CLOSER 2018.

Continue reading

SPLab Colloquium on Robust Modern API Design

In the second invited talk in our colloquium series in 2018, Alan Sill from Texas Tech University’s Cloud and Autonomic Computing Center shared his views on how to manage data centres the right way. In the talk «Topics in robust modern API design for data center control and scientific applications», many issues were pointed out whose proper solution will affect the whole cloud stack, up to the way cloud-native applications are designed and equipped with deep self-management capabilities. Both the talk and the mixed-in debates are captured in this blog post.

Continue reading

Impressions from Swiss Python Summit 2018

The third Swiss Python Summit took place in Rapperswil, Switzerland, today. Conveniently located about an hour’s drive from the Service Prototyping Lab at Zurich University of Applied Sciences, the event reserved a spot on our conference shortlist this year. In this post, we will briefly summarise major impressions from the well-organised summit.

Continue reading

SPLab Colloquium on Serverless Scientific Computing

Maciej Malawski from AGH University of Science and Technology in Kraków, Poland, visited us today to give a colloquium talk titled «Can we use Serverless Architectures and Highly-elastic Cloud Infrastructures for Scientific Computing?» and to discuss research around the wider topics of workflows and cloud function compositions. This post summarises the talk and the subsequent discussion mixed with further general reflections on the state of serverless applications.

Continue reading

Visitors from Itaipu Technology Park

Following up on the previous visit from the Service Prototyping Lab (SPLab) at Zurich University of Applied Sciences in Switzerland to Itaipu Technology Park (PTI) in Paraguay, two young investigators from PTI’s Centre of Information and Communication Technology (CTIC), with support from CONACYT, are now visiting us at the SPLab and more generally in Switzerland.

Yessica Bogado and Walter Benítez will spend some weeks getting to know the local research and development landscape, learning about our ongoing research initiatives, diving into some hard questions, and discussing ideas for future collaboration. Furthermore, they will explore novel research methods and prototypes, specifically in emerging technology areas such as cloud-native applications and serverless applications, as well as upcoming hybrid container/cloud function applications.

Walter Benítez, Yessica Bogado and the host Josef Spillner

Continue reading