Setting up container based Openstack with OVN networking

OVN is a relatively new networking technology which provides a powerful and flexible software implementation of standard networking functionalities such as switches, routers, firewalls, etc. Importantly, OVN is distributed in the sense that the aforementioned network entities can be realized over a distributed set of compute/networking resources. OVN is tightly coupled with OVS, essentially being a layer of abstraction which sits above a set of OVS switches and realizes the above networking components across these switches in a distributed manner.

A number of cloud computing platforms and more general compute resource management frameworks are working on OVN support, including oVirt, Openstack, Kubernetes and Openshift – progress on this front is quite advanced. Interestingly and importantly, one dimension of the OVN vision is that it can act as a common networking substrate which could facilitate integration of more than one of the above systems, although the realization of that vision remains future work.

In the context of our work on developing an edge computing testbed, we set up a modest Openstack cluster, to emulate functionality deployed within an Enterprise Data Centre with OVN providing network capabilities to the cluster. This blog post provides a brief overview of the system architecture and notes some issues we had getting it up and running.

As our system is not a production system, providing High Availability (HA) support was not one of the requirements; consequently, it was not necessary to consider HA OVN mode. As such, it was natural to host the OVN control services, including the Northbound and Southbound DBs and the Northbound daemon (ovn-northd) on the Openstack controller node. As this is the node through which external traffic goes, we also needed to run an external facing OVS on this node which required its own OVN controller and local OVS database. Further, as this OVS chassis is intended for external traffic, it needed to be configured with ‘enable-chassis-as-gw‘.

We configured our system to use DHCP provided by OVN; consequently the neutron DHCP agent was no longer necessary, we removed this process from our controller node. Similarly, L3 routing was done within OVN meaning that the neutron L3 agent was no longer necessary. Openstack metadata support is implemented differently when OVN is used: instead of having a single metadata process running on a controller serving all metadata requests, the metadata service is deployed on each node and the OVS switch on each node routes requests to 169.254.169.254 to the local metadata agent; this then queries the nova metadata service to obtain the metadata for the specific VM.

The services deployed on the controller and compute nodes are shown in Figure 1 below.

Figure 1: Neutron containers with and without OVN

We used Kolla to deploy the system. Kolla does not currently have full support for OVN; however specific Kolla containers for OVN have been created (e.g. kolla/ubuntu-binary-ovn-controller:queens, kolla/ubuntu-binary-neutron-server-ovn:queens). Hence, we used an approach which augments the standard Kolla-ansible deployment with manual configuration of the extra containers necessary to get the system running on OVN.

As always, many smaller issues were encountered while getting the system working – we will not detail all these issues here, but rather focus on the more substantive issues. We divide these into three specific categories: OVN parameters which need to be configured, configuration specifics for the Kolla OVN containers and finally a point which arose due to assumptions made within Kolla that do not necessarily hold for OVN.

To enable OVN, it was necessary to modify the configuration of the OVS switches operating on all the nodes; the existing OVS containers and OVSDB could be used for this – the OVS version shipped with Kolla/Queens is v2.9.0 – but it was necessary to modify some settings. First, it was necessary to configure system-ids for all of the OVS chassis’ – we chose to select fixed UUIDs a priori and use these for each deployment such that we had a more systematic process for setting up the system but it’s possible to use a randomly generated UUID.

docker exec -ti openvswitch_vswitchd ovs-vsctl set open_vswitch . external-ids:system-id="$SYSTEM_ID"

On the controller node, it was also necessary to set the following parameters:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP external_ids:ovn-encap type="geneve" \
    external-ids:ovn-cms-options="enable-chassis-as-gw"

docker exec openvswitch_vswitchd ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-ex

On the compute nodes this was necessary:

docker exec -ti openvswitch_vswitchd ovs-vsctl set Open_vSwitch . \
    external_ids:ovn-remote="tcp:$OVN_SB_HOST_IP:6642" \
    external_ids:ovn-nb="tcp:$OVN_NB_HOST_IP:6641" \
    external_ids:ovn-encap-ip=$HOST_IP \
    external_ids:ovn-encap-type="geneve"

Having changed the OVS configuration on all the nodes, it was then necessary to get the services operational on the nodes. There are two specific aspects to this: modifying the service configuration files as necessary and starting the new services in the correct way.

Not many changes to the service configurations were required. The primary changes related to ensuring the the OVN mechanism driver was used and letting neutron know how to communicate with OVN. We also used the geneve tunnelling protocol in our deployment and this required the following configuration settings:

  • For the neutron server OVN container
    • ml2_conf.ini
              mechanism_drivers = ovn
       	type_drivers = local,flat,vlan,geneve
       	tenant_network_types = geneve
      
       	[ml2_type_geneve]
       	vni_ranges = 1:65536
       	max_header_size = 38
      
       	[ovn]
       	ovn_nb_connection = tcp:172.30.0.101:6641
       	ovn_sb_connection = tcp:172.30.0.101:6642
       	ovn_l3_scheduler = leastloaded
       	ovn_metadata_enabled = true
      
    • neutron.conf
              core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
       	service_plugins = networking_ovn.l3.l3_ovn.OVNL3RouterPlugin
      
  • For the metadata agent container (running on the compute nodes) it was necessary to configure it to point at the nova metadata service with the appropriate shared key as well as how to communicate with OVS running on each of the compute nodes
            nova_metadata_host = 172.30.0.101
     	metadata_proxy_shared_secret = <SECRET>
     	bridge_mappings = physnet1:br-ex
     	datapath_type = system
     	ovsdb_connection = tcp:127.0.0.1:6640
     	local_ip = 172.30.0.101
    

For the OVN specific containers – ovn-northd, ovn-sb and ovn-nb databases, it was necessary to ensure that they had the correct configuration at startup; specifically, that they knew how to communicate with the relevant dbs. Hence, start commands such as

/usr/sbin/ovsdb-server /var/lib/openvswitch/ovnnb.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/run/openvswitch/ovnnb_db.sock --remote=ptcp:$ovnnb_port:$ovsdb_ip --unixctl=/run/openvswitch/ovnnb_db.ctl --log-file=/var/log/kolla/openvswitch/ovsdb-server-nb.log

were necessary (for the ovn northbound database) and we had to modify the container start process accordingly.

It was also necessary to update the neutron database to support OVN specific versioning information: this was straightforward using the following command:

docker exec -ti neutron-server-ovn_neutron_server_ovn_1 neutron-db-manage upgrade heads

The last issue which we had to overcome was that Kolla and neutron OVN had slightly different views regarding the naming of the external bridges. Kolla-ansible configured a connection between the br-ex and br-int OVS bridges on the controller node with port names phy-br-ex and int-br-ex respectively. OVN also created ports with the same purpose but with different names patch-provnet-<UUID>-to-br-int and patch-br-int-to-provonet-<UUID>; as these ports had the same purpose, our somewhat hacky solution was to manually remove the the ports created in the first instance by Kolla-ansible.

Having overcome all these steps, it was possible to launch a VM which had external network connectivity and to which a floating IP address could be assigned.

Clearly, this approach is not realistic for supporting a production environment, but it’s an appropriate level of hackery for a testbed.

Other noteworthy issues which arose during this work include the following:

  • Standard docker apparmor configuration in ubuntu is such that mount cannot be run inside containers, even if they have the appropriate privileges. This has to be disabled or else it is necessary to ensure that the containers do not use the default docker apparmor profile.
  • A specific issue with mounts inside a container which resulted in the mount table filling up with 65536 mounts and rendering the host quite unusable (thanks to Stefan for providing a bit more detail on this) – the workaround was to ensure that /run/netns was bind mounted into the container.
  • As we used geneve encapsulation, geneve kernel modules had to be loaded
  • Full datapath NAT support is only available for linux kernel 4.6 and up. We had to upgrade the 4.4 kernel which came with our standard ubuntu 16.04 environment.

This is certainly not a complete guide to how to get Openstack up and running with OVN, but may be useful to some folks who are toying with this. In future, we’re going to experiment with extending OVN to an edge networking context and will provide more details as this work evolves.

 

The 1st International Workshop on Heterogeneous Distributed Cloud Computing

As we look to the future of cloud computing, there are good reasons to think that the cloud of the future will differ significantly from that which we know today. Although nobody knows exactly how it will evolve, it is likely we will see significant changes in two important dimensions – heterogeneity and decentralization. Let’s consider each of these in turn.

The earlier cloud systems were characterized by homogeneity to the point that they were considered analogous to commodities: however, as these systems have evolved, they had to increasingly cater for the general complexity of IT systems and hence more and more options became available. For example, AWS currently provides 56 different instance types. Storage has also become differentiated with different types of physical storage – primarily spinning disks and SSD storage at present, but this will be augmented in future with newer storage technologies such as Intel Optane which can be considered as something between memory and classical secondary storage – but also in terms of types of storage with object storage clearly in the ascendency, block storage being around for some time and also a need for longer term archival solutions. Further, there is increasing heterogeneity relating to the basic compute units that are being used in Data Centres: GPUs are catering for many large and complex workloads, ARM processors are being increasingly seen as credible within the Data Centre, customized ASICs such as the TPU are on offer and there is important innovation coming from the open source hardware movement – specifically the open source ISA of RISC-V.

As well as increased heterogeneity, there are good reasons to believe that the highly centralized systems that characterized the first wave of cloud computing will give way to much more decentralized systems in which the large data centres will be augmented by smaller scale resources. Hybrid cloud is one aspect of this trend which is well established and poised for rapid growth. One particularly interesting example which fits clearly in the hybrid cloud arena is Microsoft’s Azure Stack which is intended to enable Azure to operate within the enterprise DC as well as inside Microsoft’s large DCs: while this can have benefits for the enterprise, from the cloud operator’s perspective, it’s a way of realizing a much more decentralized cloud. The telecoms sector is also investigating more decentralized approaches with initiatives such as Central Office Rearchitected as Data Centres.

The combination of these two fundamental trends in the evolution of cloud computing will give rise to many new and interesting problems which are interesting both from an industry perspective as well as an academic perspective. For this reason, we decided to organize a workshop co-located with the Utility and Cloud Computing Conference 2017 which focuses on these issues: the 1st International Workshop on Heterogeneous Distributed Cloud Computing which will take place in December 2017.

We’re looking forward to an exciting, interactive workshop with interesting contributions covering diverse topics: if these are topics that interest you, we invite you to make a submission to the workshop before the deadline of July 30. Just click here to submit.

 

An overview of networking in Rancher using Cattle

As noted elsewhere, we’re looking at Rancher in the context of one of our projects. We’ve been doing some work on enabling it to work over heterogeneous compute infrastructures – one of which could be an ARM based edge device and one a standard x86_64 cloud execution environment. Some of our colleagues were asking how the networking works – we had not looked into this in much detail, so we decided to find – turns out it’s pretty complex.

Continue reading

Rancher – initial experience report

In the context of the FINEXT project, we have been reviewing Rancher as a tool to support easy deployment of FIWARE components. (Our colleagues in the project have more experience with this tool – we’re still climbing the learning curve). Here are a few observations relating to Rancher.

The primary problem that Rancher solves is management of potentially disparate (sets of) IaaS resources to provide support for deploying containerized applications. Another important aspect of the Rancher vision is the application catalog – a well defined set of containerized applications that can be deployed to a container platform.

This is, of course, a very noisy area with much technology competition: the Rancher team developed their own orchestration framework – Cattle – but it was clear from some time ago that there would be many different orchestration frameworks and they intelligently decided to integrate with other platforms which were gaining traction. Specifically, they provide support for Kubernetes, Swarm and Mesos.

While playing with Rancher to understand how it works, we looked at how it supports three use cases:

  • Deployment of applications on IaaS with Cattle based container management
  • Deployment of applications on IaaS with Kubernetes container management
  • Deployment of applications on IaaS with Swarm container management

Although the Mesos case is also interesting (and we like Mesos!), we decided not to consider it as Mesos does not currently have as much momentum as the other technologies.

The basic Rancher approach

Before discussing our initial observations, it is appropriate to give some details on key concepts in Rancher.

Rancher supports so-called Environments which are defined by a specific orchestration framework (eg Cattle, Kubernetes, Swarm) and comprise of a set of Hosts. Applications can be deployed to Environments from the Application Catalog; obviously, Applications need to be defined in a manner that is compatible with the Environment’s orchestration mechanisms – for Cattle and Swarm, applications can be defined using docker-compose format; Kubernetes environments require a different pod-based format.

Hosts are typically VMs that run in cloud platforms – although it is possible to configure these manually, the intended use case is that these are created by docker-machine. docker-machine contains drivers for many different hosting providers and Rancher leverages these to enable Hosts to be provisioned on a wide range of different providers. Rancher provisioning is quite complex, but generally it comprises of deploying rancher-agent which enables Rancher to monitor and control the host and deployment of an overlay network which enables the Host to network with the other Hosts in the Environment.

The workflow then is one in which rancher-server is first deployed (typically in a VM). In rancher-server an Environment is created, Hosts are added to the environment and then Applications can be deployed from the catalog. Note that rancher-server can – typically should – manage multiple different Environments. Rancher provides good support for monitoring the state of the system: for example, it is straightforward to see all containers running on the Hosts, their logs and if they are in an error state.

Standard Cattle Management

Standard Cattle Management is the most developed Rancher capability. In this mode, Cattle – running in rancher-server – is responsible for orchestrating the application on the host. rancher-agent runs on each of the Hosts in privileged mode and hence it has the power to create and destroy containers on each Host. rancher-server communicates with rancher-agent over a websockets connection to obtain the state of the host.

The Application Catalog for this mode is well developed. Rancher comes with a default Application Catalog and supports import of applications to the catalog. Further, it supports the use of private docker registries as it is clear that many applications would not be public. In the Cattle Environment, the Application Catalog is evident (a menu item at the top of the screen). Applications comprise of three files:

  • docker-compose.yml: contains the standard information for launching and managing a multi-container service
  • rancher-compose.yml: which contains the descriptor for the application in the service catalog, what parameters it requires as well as information pertaining to deploying the application over multiple VMs, scaling, health checks etc
  • Answers.txt: contains the default values of the parameters required to launch the application

We did not experiment much with Cattle orchestration, but documentation indicates that it is a sensible orchestration framework which deploys applications in a balanced manner across the Hosts in the Environment.

Rancher with Kubernetes

Rancher also supports Environments based on Kubernetes. As such, Rancher supports rapid and easy deployment of a Kubernetes cluster across disparate hosts: the Environment is created in Rancher and Hosts are added via docker-machine.

As with the Cattle deployment, rancher-agent is deployed on all nodes in the cluster: this enables Rancher to have full visibility of each of the nodes in the Environment – what containers are running etc. It is also used in the process of deploying the Kubernetes environment.

It took us some time to understand the role of the Application Catalog in the Kubernetes context. Although Rancher has some support for a Kubernetes Application Catalog and the Catalog differs from that available for standard Cattle Environments – applications are described in terms of pods – we found that deploying these applications did not work.

The Kubernetes cluster was deployed successfully and usable. Rancher offers a web-based CLI by which applications can be deployed to the cluster (both with kubectl and helm); applications can also be deployed outside of the Kubernetes interface of course with Rancher making the Kubernetes credentials available which can be used by kubectl.

Rancher with Swarm

Rancher support for Swarm is similar to that of Kubernetes in the sense that the primary focus is on managing the Hosts in the Swarm Environment. Rancher provides support for bringing up a Swarm and enabling it to be controlled via the standard Swarm toolset.

It is worth noting that we did have some confusion working with Applications in Swarm. The Swarm deployment mode has the capability to deploy applications from a catalog, although it is not so prominent in the interface. It took us some time to realize that this was not the intended deployment mode – this mechanism uses Cattle for the application deployment rather than Swarm. This was non-obvious – the applications were docker-compose applications and, as such, we assumed that they could be deployed via Swarm. Deploying the applications appeared to work in the sense that they were visible in Rancher, but on closer inspection we found irregularities. Specifically, the application was not deployed in ‘managed’ mode, even though this was stipulated in the Application Catalog; also, docker service ls did not show the application.

The limitations noted above most probably arise because Swarm support is still experimental and will be resolved as the solution matures.

Another noteworthy point relating to Swarm usage is that Rancher provides a very useful interface to both the containers and nodes within the Swarm: this can be used to understand current state and perform troubleshooting. Unlike the Kubernetes environment, the Swarm environment has no such standard tool and Rancher provides significant value add here.

Final Comments

Rancher is a useful and interesting evolving platform. It focuses primarily on the important problem of bridging between the classical VM/IaaS world to the newer container ecosystems; another important aspect of the Rancher vision is application management. As the world of container ecosystems is evolving rapidly – with some technologies offering key parts of Rancher’s vision – it will be challenging for Rancher to span all aspects of application management from VM management to container management to application deployment and management, but the technology has obtained some momentum and solves a real problem and hence it’s likely to be around for a while.

(Thanks to Bruno and Martin for reviewing this!)

Openstack Summit Barcelona 2016 – Day 2

As with the first day of the summit (see recap), the second day started with a keynote. In this case, the focus was on multicloud solutions and how Openstack can perform in this context. A few interesting points stood out from the keynote for us. First up – to emphasize how Openstack is moving – there was an announcement that China Telecom is to deploy an enormous 2000000 (2 million) square meters of Openstack in data centres (see pic). There were a couple of interesting demos, the first of which focused on the system that is used for CI/CD of Openstack itself – this system has quite high requirements and is distributed over a set of heterogeneous resources provided by disparate entities who wish to support Openstack. They demonstrated how easy it is to add a new set of Openstack resources to their platform and how quickly new test workload appears on the new resources. The second interesting demo was of the Openstack Omni project which used the horizon dashboard and openstack APIs to control AWS – it was somehow pitched as one API to rule them all, which is perhaps a bit optimistic, but it reflects the fact that the Openstack API is maturing and more and more applications are being developed against it; EC2 is no longer the only important API in town! Finally, there was a presentation by Crowdstar which highlighted the benefits of Ironic for certain workloads – 60% cost reduction and 40ms reduction in latency – and particularly how it can be used very effectively in conjunction with containers.

Jpeg

Jpeg

There was quite some interest in big data and HPC type of applications – the talks on GPU virtualization and Tensorflow were very well attended, but there is still a lot of work to be done in both these realms. The GPU virtualization work was described in the context of the Nomad project which is attempting to manage heterogeneous compute resources in Openstack; however, the vision they offer is still only at the initial stages. The Tensorflow work compared Magnum and Sahara for deploying a Tensorflow workload – Magnum was selected as the better option, somewhat due to its greater support, but there are still issues with using this as a framework for this type of work.

On a related note there was an interesting talk on unikernels and how they relate to Openstack. The guys from the MIKELANGELO project have developed solutions which enable applications to be packaged into unikernels and executed from image stores. Such solutions can be much more efficient than VM or even container based solutions – they gave an example of a VM image consuming 2GB while the equivalent unikernel consumed 56MB. However, their solution was not really integrated with Openstack and there is still a lot of work to do to make this happen.

At another session, we learnt of the developing ARM Openstack ecosystem: there are ARM Openstack distributions already available and key issues relating to ARM Openstack compute functions have been solved (mostly relating to UEFI and ACPI): the Linaro team is working on expanding the ARM Guest OS support for different Linux distributions. This is a very interesting area which will surely grow as some organizations want to reduce their dependence on Intel and perhaps have some gains in energy efficiency.

We did spend some of the day going around talking to people, so it was not all spent sitting in the sessions – we had great fun with the cloudbase guys who showed us their very cool holoens demo.

And now off to the Rackspace party!

 

Openstack Summit Barcelona 2016 – Day 1

We were lucky to have the opportunity to attend the Openstack Summit in Barcelona this year. The event has become a large event with a few thousand attendees and the scope is getting broader as Openstack evolves and matures.

openstack-summit-pic

The schedule is very dense and a little bit of homework is necessary to maximize the value from the event – the sessions we chose to attend probably capture quite a wide subset of the different conversations that went on.

Continue reading

Reflections on ORConf 2016

We had the chance again this year to attend the really excellent ORConf 2016 (see here for a write up on ORConf 2015). The focus of the conference is on Open Source silicon in general, comprising of aspects of open source hardware design tooling, open source processor designs and open source SoC designs. This area (and community) is very interesting and it has the potential to have a signficant impact on future cloud systems – here are some reflections on the event.

While the conference addressed many different aspects of the digital design space, there was a significant emphasis on the embedded space and/or IoT type use cases: this can probably be attributed to the fact that these systems are somehow easier to design and produce in small quantities as well as the fact that there is a large opportunity in this area. It was noteworthy how quickly the community is evolving with designs presented at ORConf from last year very likely to manifest in working silicon at next year’s conference. It was also noteworthy how the community is squeezing more and more compute performance out of each Watt in their designs. Continue reading

Experience with Neutron High Availability (HA) in Openstack

For the Zurich FIWARE node, we’re setting up a Kilo High Availability (HA) deployment – we’re transitioning from our current Icehouse (non-HA) deployment.

Kilo HA is recommended as there is a general understanding within the project that the HA capabilities are now ready for production use. However, there is no single Kilo HA – there are many different configurations which can be called HA – and in this post, we describe some of the points we encountered while setting up our HA node.

We deployed Mirantis Openstack v7.0 using the Fuel deployment tool, as is used in the project and as we have used before; requiring a HA deployment, we selected the HA configuration in Fuel and we have 3 controller nodes to provide HA. We did have some issues that the deployment did not terminate cleanly, failing in some astute-based post-deployment tests – however, these issues were minor and the system behaved in a sane manner.

Continue reading

The first ICCLab hackathon – a fun and productive few days

Here in ICCLab, we’ve always been interested in the hackathon approach as a way to develop small but practical ideas and make demonstrable prototypes rapidly. With all of our other commitments, it has been difficult for us to set aside the time for such an event, but we finally managed to do this last week.

The scope was loose, with the objective being to develop something which can be demonstrated; we had a small preference for work which was related to our core business – cloud technologies – but we were quite flexible on this point. Continue reading

Introducing the official ICCLab linux distro optimized for simple hypervisor contexts

We needed to deploy a simple hypervisor on one of our internal systems. This system was disconnected from the Internet for some particular reasons and it was surprisingly difficult to find a suitable bundled linux distribution which provided the features we needed. Ubuntu desktop does not come with KVM, ubuntu server does not have X, so it’s not easy to run virt-manager on the machine. There were no CentOS versions that provided the right mix of capabilities; oVirt-node was one reasonable candidate, but it only exposed a management interface which needed to be run remotely or else somehow packaged with the system. One of our colleagues is a big fan of proxmox, but this is overkill for the simple single hypervisor case we wanted.

Continue reading