An overview of networking in Rancher using Cattle

As noted elsewhere, we’re looking at Rancher in the context of one of our projects. We’ve been doing some work on enabling it to work over heterogeneous compute infrastructures – one of which could be an ARM-based edge device and another a standard x86_64 cloud execution environment. Some of our colleagues were asking how the networking works – we had not looked into this in much detail, so we decided to find out – and it turns out to be pretty complex.

Rancher – initial experience report

In the context of the FINEXT project, we have been reviewing Rancher as a tool to support easy deployment of FIWARE components. (Our colleagues in the project have more experience with this tool – we’re still climbing the learning curve.) Here are a few observations relating to Rancher.

The primary problem that Rancher solves is management of potentially disparate (sets of) IaaS resources to provide support for deploying containerized applications. Another important aspect of the Rancher vision is the application catalog – a well defined set of containerized applications that can be deployed to a container platform.

This is, of course, a very noisy area with much technology competition. The Rancher team developed their own orchestration framework – Cattle – but it has been clear for some time that there would be many different orchestration frameworks, and they intelligently decided to integrate with other platforms which were gaining traction. Specifically, they provide support for Kubernetes, Swarm and Mesos.

While playing with Rancher to understand how it works, we looked at how it supports three use cases:

  • Deployment of applications on IaaS with Cattle based container management
  • Deployment of applications on IaaS with Kubernetes container management
  • Deployment of applications on IaaS with Swarm container management

Although the Mesos case is also interesting (and we like Mesos!), we decided not to consider it as Mesos does not currently have as much momentum as the other technologies.

The basic Rancher approach

Before discussing our initial observations, it is appropriate to give some details on key concepts in Rancher.

Rancher supports so-called Environments, which are defined by a specific orchestration framework (e.g. Cattle, Kubernetes, Swarm) and comprise a set of Hosts. Applications can be deployed to Environments from the Application Catalog; obviously, Applications need to be defined in a manner that is compatible with the Environment’s orchestration mechanisms – for Cattle and Swarm, applications can be defined using docker-compose format; Kubernetes environments require a different pod-based format.
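The relationship between Environments, Hosts and application formats can be sketched in a few lines of Python. This is only an illustration of the concepts described above, not Rancher's data model: the class name, the format labels and the host names are all our own assumptions.

```python
from dataclasses import dataclass, field

# Mapping taken from the description above: Cattle and Swarm accept
# docker-compose definitions, Kubernetes needs a pod-based definition.
# The string labels are illustrative, not Rancher identifiers.
ACCEPTED_FORMAT = {
    "cattle": "docker-compose",
    "swarm": "docker-compose",
    "kubernetes": "pod-based",
}


@dataclass
class Environment:
    """Hypothetical model: one orchestrator plus a set of Hosts."""
    name: str
    orchestrator: str
    hosts: list = field(default_factory=list)

    def can_deploy(self, app_format: str) -> bool:
        # An application is deployable only if its definition format
        # matches what the Environment's orchestrator understands.
        return ACCEPTED_FORMAT.get(self.orchestrator) == app_format


# Hypothetical environments; the host names are made up.
edge = Environment("edge", "cattle", ["arm-host-1", "x86-host-1"])
k8s = Environment("k8s", "kubernetes", ["x86-host-2"])
```

With these definitions, `edge.can_deploy("docker-compose")` is true, while `k8s.can_deploy("docker-compose")` is false.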

Hosts are typically VMs that run in cloud platforms – although it is possible to configure these manually, the intended use case is that these are created by docker-machine. docker-machine contains drivers for many different hosting providers and Rancher leverages these to enable Hosts to be provisioned on a wide range of providers. Rancher provisioning is quite complex, but generally it comprises two steps: deploying rancher-agent, which enables Rancher to monitor and control the Host, and deploying an overlay network, which enables the Host to network with the other Hosts in the Environment.

The workflow then is one in which rancher-server is first deployed (typically in a VM). In rancher-server an Environment is created, Hosts are added to the Environment and then Applications can be deployed from the catalog. Note that rancher-server can – typically should – manage multiple different Environments. Rancher provides good support for monitoring the state of the system: for example, it is straightforward to see all containers running on the Hosts, their logs and whether they are in an error state.

Standard Cattle Management

Standard Cattle Management is the most developed Rancher capability. In this mode, Cattle – running in rancher-server – is responsible for orchestrating the application on the Hosts. rancher-agent runs on each of the Hosts in privileged mode and hence has the power to create and destroy containers on each Host. rancher-server communicates with rancher-agent over a WebSocket connection to obtain the state of the Host.

The Application Catalog for this mode is well developed. Rancher comes with a default Application Catalog and supports import of applications to the catalog. Further, it supports the use of private docker registries, as it is clear that many applications would not be public. In the Cattle Environment, the Application Catalog is evident (a menu item at the top of the screen). An application comprises three files:

  • docker-compose.yml: contains the standard information for launching and managing a multi-container service
  • rancher-compose.yml: contains the descriptor for the application in the service catalog, the parameters it requires, and information pertaining to deploying the application over multiple VMs, scaling, health checks etc
  • Answers.txt: contains the default values of the parameters required to launch the application
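
To make the role of the answers file concrete, here is a minimal sketch of how default parameter values could be combined with values supplied at launch time. It assumes the answers file holds simple KEY=VALUE lines; that format, the helper names and the sample parameters are our assumptions for illustration, not something we verified against Rancher.

```python
def parse_answers(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    The KEY=VALUE layout is an assumption made for this sketch.
    """
    answers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {raw!r}")
        answers[key.strip()] = value.strip()
    return answers


def resolve_parameters(defaults: dict, overrides: dict) -> dict:
    """Start from the defaults and let user-supplied values win."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical defaults for an example application.
SAMPLE_ANSWERS = """\
# defaults shipped with the catalog entry
PUBLIC_PORT=8080
REPLICAS=2
"""

defaults = parse_answers(SAMPLE_ANSWERS)
params = resolve_parameters(defaults, {"REPLICAS": "4"})
```

Here `params` keeps the default `PUBLIC_PORT` and takes the overridden `REPLICAS`; values stay as strings because the sketch does no type conversion.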

We did not experiment much with Cattle orchestration, but documentation indicates that it is a sensible orchestration framework which deploys applications in a balanced manner across the Hosts in the Environment.
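
As a rough illustration of what deploying "in a balanced manner" can mean, the sketch below places each new container on the Host that currently runs the fewest containers. This is a generic least-loaded (spread) strategy chosen by us for illustration; we have not checked how Cattle actually makes its placement decisions, and the function and host names are hypothetical.

```python
def place_containers(hosts: dict, count: int) -> list:
    """Assign `count` containers one at a time to the least-loaded host.

    `hosts` maps a host name to the number of containers already running.
    Returns the chosen host names in placement order.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    load = dict(hosts)  # copy so the caller's mapping is left untouched
    placement = []
    for _ in range(count):
        # Iterating over sorted names makes ties break alphabetically,
        # so the result is deterministic.
        target = min(sorted(load), key=lambda name: load[name])
        load[target] += 1
        placement.append(target)
    return placement


# Hypothetical starting state: three hosts with uneven load.
placement = place_containers({"host-a": 2, "host-b": 0, "host-c": 1}, 4)
```

In this run the first two containers go to `host-b`, the third to `host-c` and the fourth to `host-a`, leaving the three hosts with 3, 2 and 2 containers respectively.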

Rancher with Kubernetes

Rancher also supports Environments based on Kubernetes. As such, Rancher supports rapid and easy deployment of a Kubernetes cluster across disparate hosts: the Environment is created in Rancher and Hosts are added via docker-machine.

As with the Cattle deployment, rancher-agent is deployed on all nodes in the cluster: this enables Rancher to have full visibility of each of the nodes in the Environment – what containers are running etc. It is also used in the process of deploying the Kubernetes environment.

It took us some time to understand the role of the Application Catalog in the Kubernetes context. Rancher has some support for a Kubernetes Application Catalog, which differs from the one available for standard Cattle Environments in that applications are described in terms of pods. However, we found that deploying these applications did not work.

The Kubernetes cluster was deployed successfully and was usable. Rancher offers a web-based CLI by which applications can be deployed to the cluster (both with kubectl and helm); applications can, of course, also be deployed from outside Rancher’s Kubernetes interface, as Rancher makes the Kubernetes credentials available for use with kubectl.

Rancher with Swarm

Rancher support for Swarm is similar to that of Kubernetes in the sense that the primary focus is on managing the Hosts in the Swarm Environment. Rancher provides support for bringing up a Swarm and enabling it to be controlled via the standard Swarm toolset.

It is worth noting that we did have some confusion working with Applications in Swarm. The Swarm deployment mode has the capability to deploy applications from a catalog, although it is not so prominent in the interface. It took us some time to realize that this was not the intended deployment mode – this mechanism uses Cattle for the application deployment rather than Swarm. This was non-obvious – the applications were docker-compose applications and, as such, we assumed that they could be deployed via Swarm. Deploying the applications appeared to work in the sense that they were visible in Rancher, but on closer inspection we found irregularities. Specifically, the application was not deployed in ‘managed’ mode, even though this was stipulated in the Application Catalog; also, docker service ls did not show the application.
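
The check we did by hand can be sketched as a small script: list the services Swarm knows about and compare them with the applications we expected to see. The sketch assumes the docker CLI is available on a Swarm manager node and uses `docker service ls --format "{{.Name}}"` to print one service name per line; the helper names and the sample service names are hypothetical.

```python
import subprocess


def parse_service_names(output: str) -> set:
    """Turn one-name-per-line output into a set of service names."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def list_swarm_services() -> set:
    """Ask the local docker CLI for the Swarm service names.

    Must be run on a Swarm manager node; raises if the command fails.
    """
    result = subprocess.run(
        ["docker", "service", "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_service_names(result.stdout)


def missing_from_swarm(expected, services) -> list:
    """Return the expected names that Swarm does not list, sorted."""
    return sorted(set(expected) - set(services))


# Offline example with made-up output, so no docker installation is needed.
sample_output = "web\nredis\n"
missing = missing_from_swarm(["web", "db"], parse_service_names(sample_output))
```

On a real manager node, `missing_from_swarm(expected, list_swarm_services())` would report the applications that are visible in Rancher but are not Swarm services.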

The limitations noted above most probably arise because Swarm support is still experimental; we expect them to be resolved as the solution matures.

Another noteworthy point relating to Swarm usage is that Rancher provides a very useful interface to both the containers and nodes within the Swarm: this can be used to understand current state and perform troubleshooting. Unlike the Kubernetes environment, the Swarm environment has no such standard tool and Rancher provides significant value add here.

Final Comments

Rancher is a useful and interesting evolving platform. It focuses primarily on the important problem of bridging the classical VM/IaaS world and the newer container ecosystems; another important aspect of the Rancher vision is application management. The world of container ecosystems is evolving rapidly, with some technologies offering key parts of Rancher’s vision, so it will be challenging for Rancher to span all aspects of application management, from VM management to container management to application deployment and management. However, the technology has obtained some momentum and solves a real problem, and hence it’s likely to be around for a while.

(Thanks to Bruno and Martin for reviewing this!)

Openstack Summit Barcelona 2016 – Day 2

As with the first day of the summit (see recap), the second day started with a keynote. In this case, the focus was on multicloud solutions and how Openstack can perform in this context. A few interesting points stood out from the keynote for us. First up – to emphasize how Openstack is moving – there was an announcement that China Telecom is to deploy an enormous 2000000 (2 million) square meters of Openstack in data centres (see pic). There were a couple of interesting demos, the first of which focused on the system that is used for CI/CD of Openstack itself – this system has quite high requirements and is distributed over a set of heterogeneous resources provided by disparate entities who wish to support Openstack. They demonstrated how easy it is to add a new set of Openstack resources to their platform and how quickly new test workload appears on the new resources. The second interesting demo was of the Openstack Omni project which used the horizon dashboard and openstack APIs to control AWS – it was somehow pitched as one API to rule them all, which is perhaps a bit optimistic, but it reflects the fact that the Openstack API is maturing and more and more applications are being developed against it; EC2 is no longer the only important API in town! Finally, there was a presentation by Crowdstar which highlighted the benefits of Ironic for certain workloads – 60% cost reduction and 40ms reduction in latency – and particularly how it can be used very effectively in conjunction with containers.

There was quite some interest in big data and HPC type applications – the talks on GPU virtualization and Tensorflow were very well attended, but there is still a lot of work to be done in both these realms. The GPU virtualization work was described in the context of the Nomad project, which is attempting to manage heterogeneous compute resources in Openstack; however, the vision they offer is still only at the initial stages. The Tensorflow work compared Magnum and Sahara for deploying a Tensorflow workload – Magnum was selected as the better option, partly due to its greater support, but there are still issues with using this as a framework for this type of work.

On a related note there was an interesting talk on unikernels and how they relate to Openstack. The guys from the MIKELANGELO project have developed solutions which enable applications to be packaged into unikernels and executed from image stores. Such solutions can be much more efficient than VM or even container based solutions – they gave an example of a VM image consuming 2GB while the equivalent unikernel consumed 56MB. However, their solution was not really integrated with Openstack and there is still a lot of work to do to make this happen.

At another session, we learnt of the developing ARM Openstack ecosystem: there are ARM Openstack distributions already available and key issues relating to ARM Openstack compute functions have been solved (mostly relating to UEFI and ACPI). The Linaro team is working on expanding the ARM Guest OS support for different Linux distributions. This is a very interesting area which will surely grow, as some organizations want to reduce their dependence on Intel and perhaps make some gains in energy efficiency.

We did spend some of the day going around talking to people, so it was not all spent sitting in the sessions – we had great fun with the Cloudbase guys who showed us their very cool HoloLens demo.

And now off to the Rackspace party!

 

Openstack Summit Barcelona 2016 – Day 1

We were lucky to have the opportunity to attend the Openstack Summit in Barcelona this year. The summit has become a large event with a few thousand attendees and the scope is getting broader as Openstack evolves and matures.

The schedule is very dense and a little bit of homework is necessary to maximize the value from the event – the sessions we chose to attend probably capture quite a wide subset of the different conversations that went on.

Reflections on ORConf 2016

We had the chance again this year to attend the really excellent ORConf 2016 (see here for a write up on ORConf 2015). The focus of the conference is on Open Source silicon in general, comprising open source hardware design tooling, open source processor designs and open source SoC designs. This area (and community) is very interesting and it has the potential to have a significant impact on future cloud systems – here are some reflections on the event.

While the conference addressed many different aspects of the digital design space, there was a significant emphasis on the embedded space and/or IoT type use cases: this can probably be attributed to the fact that these systems are somewhat easier to design and produce in small quantities, as well as the fact that there is a large opportunity in this area. It was noteworthy how quickly the community is evolving, with designs presented at ORConf last year very likely to manifest in working silicon at next year’s conference. It was also noteworthy how the community is squeezing more and more compute performance out of each watt in their designs.

Experience with Neutron High Availability (HA) in Openstack

For the Zurich FIWARE node, we’re setting up a Kilo High Availability (HA) deployment – we’re transitioning from our current Icehouse (non-HA) deployment.

Kilo HA is recommended as there is a general understanding within the project that the HA capabilities are now ready for production use. However, there is no single Kilo HA – there are many different configurations which can be called HA – and in this post, we describe some of the points we encountered while setting up our HA node.

We deployed Mirantis Openstack v7.0 using the Fuel deployment tool, as is used in the project and as we have used before; requiring an HA deployment, we selected the HA configuration in Fuel and we have 3 controller nodes to provide HA. We did have some issues in that the deployment did not terminate cleanly, failing in some astute-based post-deployment tests – however, these issues were minor and the system behaved in a sane manner.

The first ICCLab hackathon – a fun and productive few days

Here in ICCLab, we’ve always been interested in the hackathon approach as a way to develop small but practical ideas and make demonstrable prototypes rapidly. With all of our other commitments, it has been difficult for us to set aside the time for such an event, but we finally managed to do this last week.

The scope was loose, with the objective being to develop something which can be demonstrated; we had a small preference for work which was related to our core business – cloud technologies – but we were quite flexible on this point.

Introducing the official ICCLab linux distro optimized for simple hypervisor contexts

We needed to deploy a simple hypervisor on one of our internal systems. This system was disconnected from the Internet for some particular reasons and it was surprisingly difficult to find a suitable bundled Linux distribution which provided the features we needed. Ubuntu Desktop does not come with KVM and Ubuntu Server does not have X, so it’s not easy to run virt-manager on the machine. There were no CentOS versions that provided the right mix of capabilities; oVirt-node was one reasonable candidate, but it only exposed a management interface which needed to be run remotely or else somehow packaged with the system. One of our colleagues is a big fan of Proxmox, but this is overkill for the simple single hypervisor case we wanted.

Networking and Security in an Openstack Compute Node: a complex combination of iptables and (linux and OVS) bridging…

We had to investigate the operation of one of our Openstack compute nodes as it was exhibiting some unusual behaviour. We quickly determined that there was some unexpected packet loss and we had reason to believe that this could have been due to the packet processing in the node. Investigating this problem necessitated some deeper exploration of how packets are processed in the node, particularly relating to the mix of ovs bridges, linux bridges and iptables. It turns out that this is rather complex and clear information describing how all this fits together in detail is not readily available. Here, we note what we learnt from this exploration.

Making Fog computing real – Research challenges in integrating localized computing nodes into the cloud

Carlo Vallati was a visiting researcher during Aug/Sept 2015. (See here for a short note on his experience visiting). In this post he outlines how cloud computing needs to evolve to meet future requirements.

Despite the increasing usage of cloud computing as an enabler for a wide number of applications, the next wave of technological evolution – the Internet of Things and Robotics – will require the extension of the classical centralized cloud computing architecture towards a more distributed architecture that includes computing and storage nodes installed close to users and physical systems. Edge computing will also require greater flexibility, necessary to handle the huge increase in the number of devices – a distributed architecture will guarantee scalability – and to deal with privacy concerns that are arising among end users – edge computing will limit exposure of private data[1].