The 1st International Workshop on Heterogeneous Distributed Cloud Computing

As we look to the future of cloud computing, there are good reasons to think that the cloud of the future will differ significantly from that which we know today. Although nobody knows exactly how it will evolve, it is likely we will see significant changes in two important dimensions – heterogeneity and decentralization. Let’s consider each of these in turn.

The earlier cloud systems were characterized by homogeneity to the point that they were considered analogous to commodities; however, as these systems have evolved, they have increasingly had to cater for the general complexity of IT systems and hence more and more options have become available. For example, AWS currently provides 56 different instance types. Storage has also become differentiated, both in terms of physical storage media and in terms of storage types. On the media side, spinning disks and SSDs dominate at present, but these will be augmented in future by newer technologies such as Intel Optane, which can be considered as something between memory and classical secondary storage. On the storage type side, object storage is clearly in the ascendancy, block storage has been around for some time and there is also a need for longer term archival solutions. Further, there is increasing heterogeneity in the basic compute units used in Data Centres: GPUs are catering for many large and complex workloads, ARM processors are increasingly seen as credible within the Data Centre, customized ASICs such as the TPU are on offer and there is important innovation coming from the open source hardware movement, specifically the open source RISC-V ISA.

As well as increased heterogeneity, there are good reasons to believe that the highly centralized systems that characterized the first wave of cloud computing will give way to much more decentralized systems in which the large data centres will be augmented by smaller scale resources. Hybrid cloud is one aspect of this trend which is well established and poised for rapid growth. One particularly interesting example which fits clearly in the hybrid cloud arena is Microsoft’s Azure Stack, which is intended to enable Azure to operate within the enterprise data centre as well as inside Microsoft’s large data centres: while this can have benefits for the enterprise, from the cloud operator’s perspective it is a way of realizing a much more decentralized cloud. The telecoms sector is also investigating more decentralized approaches with initiatives such as Central Office Re-architected as a Datacenter (CORD).

The combination of these two fundamental trends in the evolution of cloud computing will give rise to many new problems which are interesting from both an industry and an academic perspective. For this reason, we decided to organize a workshop co-located with the Utility and Cloud Computing Conference 2017 which focuses on these issues: the 1st International Workshop on Heterogeneous Distributed Cloud Computing, which will take place in December 2017.

We’re looking forward to an exciting, interactive workshop with interesting contributions covering diverse topics: if these are topics that interest you, we invite you to make a submission to the workshop before the deadline of July 30. Just click here to submit.

 

An overview of networking in Rancher using Cattle

As noted elsewhere, we’re looking at Rancher in the context of one of our projects. We’ve been doing some work on enabling it to work over heterogeneous compute infrastructures – one of which could be an ARM based edge device and one a standard x86_64 cloud execution environment. Some of our colleagues were asking how the networking works. We had not looked into this in much detail, so we decided to find out – it turns out it’s pretty complex.

Rancher – initial experience report

In the context of the FINEXT project, we have been reviewing Rancher as a tool to support easy deployment of FIWARE components. (Our colleagues in the project have more experience with this tool – we’re still climbing the learning curve.) Here are a few observations relating to Rancher.

The primary problem that Rancher solves is management of potentially disparate (sets of) IaaS resources to provide support for deploying containerized applications. Another important aspect of the Rancher vision is the application catalog – a well defined set of containerized applications that can be deployed to a container platform.

This is, of course, a very noisy area with much technology competition. The Rancher team developed their own orchestration framework – Cattle – but it became clear some time ago that there would be many different orchestration frameworks, and they intelligently decided to integrate with other platforms which were gaining traction. Specifically, they provide support for Kubernetes, Swarm and Mesos.

While playing with Rancher to understand how it works, we looked at how it supports three use cases:

  • Deployment of applications on IaaS with Cattle based container management
  • Deployment of applications on IaaS with Kubernetes container management
  • Deployment of applications on IaaS with Swarm container management

Although the Mesos case is also interesting (and we like Mesos!), we decided not to consider it as Mesos does not currently have as much momentum as the other technologies.

The basic Rancher approach

Before discussing our initial observations, it is appropriate to give some details on key concepts in Rancher.

Rancher supports so-called Environments, which are defined by a specific orchestration framework (e.g. Cattle, Kubernetes, Swarm) and comprise a set of Hosts. Applications can be deployed to Environments from the Application Catalog; obviously, Applications need to be defined in a manner that is compatible with the Environment’s orchestration mechanisms: for Cattle and Swarm, applications can be defined using the docker-compose format, whereas Kubernetes environments require a different, pod-based format.

Hosts are typically VMs that run in cloud platforms. Although it is possible to configure these manually, the intended use case is that they are created by docker-machine. docker-machine contains drivers for many different hosting providers and Rancher leverages these to enable Hosts to be provisioned on a wide range of providers. Rancher provisioning is quite complex, but in general it consists of deploying rancher-agent, which enables Rancher to monitor and control the Host, and deploying an overlay network, which enables the Host to communicate with the other Hosts in the Environment.

The workflow then is one in which rancher-server is first deployed (typically in a VM). In rancher-server an Environment is created, Hosts are added to the environment and then Applications can be deployed from the catalog. Note that rancher-server can – typically should – manage multiple different Environments. Rancher provides good support for monitoring the state of the system: for example, it is straightforward to see all containers running on the Hosts, their logs and if they are in an error state.
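To make the relationships between these concepts concrete, here is a minimal sketch in Python. It is purely a hypothetical model for illustration: the class and method names are our own and do not correspond to any Rancher API. It only captures the workflow described above, in which a server manages several Environments, each Environment has one orchestrator and a set of Hosts, and applications are deployed once Hosts exist.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the concepts described above; these classes are
# illustrative only and are not part of any Rancher API.
ORCHESTRATORS = {"cattle", "kubernetes", "swarm"}


@dataclass
class Host:
    name: str
    provider: str  # e.g. the docker-machine driver used to create the host


@dataclass
class Environment:
    name: str
    orchestrator: str
    hosts: List[Host] = field(default_factory=list)
    applications: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.orchestrator not in ORCHESTRATORS:
            raise ValueError(f"unknown orchestrator: {self.orchestrator}")

    def add_host(self, host: Host) -> None:
        self.hosts.append(host)

    def deploy(self, application: str) -> None:
        # In this model an application needs at least one Host to run on.
        if not self.hosts:
            raise RuntimeError("add at least one Host before deploying")
        self.applications.append(application)


@dataclass
class RancherServer:
    # A single rancher-server can manage multiple Environments.
    environments: List[Environment] = field(default_factory=list)

    def create_environment(self, name: str, orchestrator: str) -> Environment:
        env = Environment(name, orchestrator)
        self.environments.append(env)
        return env


if __name__ == "__main__":
    server = RancherServer()
    env = server.create_environment("edge", "cattle")
    env.add_host(Host("arm-edge-1", "generic"))
    env.deploy("demo-app")
    print(env)
```

The ordering constraint in deploy() mirrors the workflow: create the Environment, add Hosts, then deploy from the catalog.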

Standard Cattle Management

Standard Cattle Management is the most developed Rancher capability. In this mode, Cattle – running in rancher-server – is responsible for orchestrating the application on the host. rancher-agent runs on each of the Hosts in privileged mode and hence it has the power to create and destroy containers on each Host. rancher-server communicates with rancher-agent over a websockets connection to obtain the state of the host.

The Application Catalog for this mode is well developed. Rancher comes with a default Application Catalog and supports import of applications to the catalog. Further, it supports the use of private docker registries, as it is clear that many applications would not be public. In the Cattle Environment, the Application Catalog is prominent (a menu item at the top of the screen). Applications consist of three files:

  • docker-compose.yml: contains the standard information for launching and managing a multi-container service
  • rancher-compose.yml: contains the descriptor for the application in the service catalog, the parameters it requires, and information pertaining to deploying the application over multiple VMs, scaling, health checks etc
  • Answers.txt: contains the default values of the parameters required to launch the application
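The role of the answers file can be illustrated with a small Python sketch. This is not Rancher code: we assume here that the answers file holds simple KEY=value lines and that the compose file refers to parameters as ${VAR} placeholders, and the service name, image and variable names below are invented for the example.

```python
from string import Template
from typing import Dict

# Hypothetical compose template; the service name, image and variable
# names are invented for illustration.
COMPOSE_TEMPLATE = """\
web:
  image: ${IMAGE}
  ports:
    - "${PUBLIC_PORT}:80"
"""

# Hypothetical answers file content, assumed to be KEY=value lines.
ANSWERS_TXT = """\
# default parameter values
IMAGE=nginx:latest
PUBLIC_PORT=8080
"""


def parse_answers(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    answers: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        answers[key.strip()] = value.strip()
    return answers


def render(template: str, answers: Dict[str, str]) -> str:
    """Fill ${VAR} placeholders; raises KeyError if a value is missing."""
    return Template(template).substitute(answers)


if __name__ == "__main__":
    print(render(COMPOSE_TEMPLATE, parse_answers(ANSWERS_TXT)))
```

Using substitute() rather than safe_substitute() means a missing parameter fails loudly instead of leaving a placeholder in the rendered file.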

We did not experiment much with Cattle orchestration, but documentation indicates that it is a sensible orchestration framework which deploys applications in a balanced manner across the Hosts in the Environment.
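As an illustration of what balanced deployment can mean, the following Python sketch places each container on the host that currently runs the fewest containers. This is a generic least-loaded strategy that we chose for the example; we have not checked how Cattle's scheduler actually works, and the function name is hypothetical.

```python
from typing import Dict, List


def place_balanced(containers: List[str], hosts: List[str]) -> Dict[str, List[str]]:
    """Assign each container to the host that currently has the fewest."""
    if not hosts:
        raise ValueError("at least one host is required")
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for container in containers:
        # min() returns the first host with the smallest load, so ties are
        # broken by the order in which the hosts were listed.
        target = min(hosts, key=lambda h: len(placement[h]))
        placement[target].append(container)
    return placement


if __name__ == "__main__":
    print(place_balanced(["a", "b", "c", "d", "e"], ["h1", "h2"]))
```

With five containers and two hosts, no host ends up with more than one container above the other.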

Rancher with Kubernetes

Rancher also supports Environments based on Kubernetes. As such, Rancher supports rapid and easy deployment of a Kubernetes cluster across disparate hosts: the Environment is created in Rancher and Hosts are added via docker-machine.

As with the Cattle deployment, rancher-agent is deployed on all nodes in the cluster: this enables Rancher to have full visibility of each of the nodes in the Environment – what containers are running etc. It is also used in the process of deploying the Kubernetes environment.

It took us some time to understand the role of the Application Catalog in the Kubernetes context. Rancher has some support for a Kubernetes Application Catalog, which differs from that available for standard Cattle Environments in that applications are described in terms of pods; however, we found that deploying these applications did not work.

The Kubernetes cluster was deployed successfully and was usable. Rancher offers a web-based CLI by which applications can be deployed to the cluster (both with kubectl and helm); applications can also, of course, be deployed outside of the Kubernetes interface, as Rancher makes the Kubernetes credentials available for use with kubectl.

Rancher with Swarm

Rancher support for Swarm is similar to that of Kubernetes in the sense that the primary focus is on managing the Hosts in the Swarm Environment. Rancher provides support for bringing up a Swarm and enabling it to be controlled via the standard Swarm toolset.

It is worth noting that we did have some confusion working with Applications in Swarm. The Swarm deployment mode has the capability to deploy applications from a catalog, although it is not so prominent in the interface. It took us some time to realize that this was not the intended deployment mode – this mechanism uses Cattle for the application deployment rather than Swarm. This was non-obvious – the applications were docker-compose applications and, as such, we assumed that they could be deployed via Swarm. Deploying the applications appeared to work in the sense that they were visible in Rancher, but on closer inspection we found irregularities. Specifically, the application was not deployed in ‘managed’ mode, even though this was stipulated in the Application Catalog; also, docker service ls did not show the application.

The limitations noted above most probably arise because Swarm support is still experimental; we expect them to be resolved as the solution matures.
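The docker service ls check mentioned above can be scripted. The Python sketch below is an illustration, assuming the docker CLI is installed and pointed at a Swarm manager; the helper names are our own. It lists service names using the CLI's --format option and tests whether a given application is among them.

```python
import subprocess
from typing import List


def parse_service_names(output: str) -> List[str]:
    """Return the non-empty lines of `docker service ls --format '{{.Name}}'`."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def swarm_service_names() -> List[str]:
    """Ask the local docker CLI for the names of all Swarm services."""
    result = subprocess.run(
        ["docker", "service", "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker reports an error
    )
    return parse_service_names(result.stdout)


def is_swarm_service(name: str, service_names: List[str]) -> bool:
    """True if the application is known to Swarm as a service."""
    return name in service_names


if __name__ == "__main__":
    # "my-app" is a placeholder application name.
    print(is_swarm_service("my-app", swarm_service_names()))
```

An application that is visible in Rancher but yields False here was most likely deployed through Cattle rather than Swarm.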

Another noteworthy point relating to Swarm usage is that Rancher provides a very useful interface to both the containers and nodes within the Swarm: this can be used to understand current state and perform troubleshooting. Unlike the Kubernetes environment, the Swarm environment has no such standard tool and Rancher provides significant value add here.

Final Comments

Rancher is a useful and interesting evolving platform. It focuses primarily on the important problem of bridging between the classical VM/IaaS world and the newer container ecosystems; another important aspect of the Rancher vision is application management. As the world of container ecosystems is evolving rapidly, with some technologies offering key parts of Rancher’s vision, it will be challenging for Rancher to span all aspects of application management from VM management to container management to application deployment and management. However, the technology has gained some momentum and solves a real problem, and hence it’s likely to be around for a while.

(Thanks to Bruno and Martin for reviewing this!)

Openstack Summit Barcelona 2016 – Day 2

As with the first day of the summit (see recap), the second day started with a keynote. In this case, the focus was on multicloud solutions and how Openstack can perform in this context. A few interesting points stood out from the keynote for us. First up, to emphasize how Openstack is moving, there was an announcement that China Telecom is to deploy Openstack in an enormous 2 million square meters of data centres (see pic). There were a couple of interesting demos. The first focused on the system that is used for CI/CD of Openstack itself: this system has quite demanding requirements and is distributed over a set of heterogeneous resources provided by disparate entities who wish to support Openstack. They demonstrated how easy it is to add a new set of Openstack resources to their platform and how quickly new test workload appears on the new resources. The second demo was of the Openstack Omni project, which used the Horizon dashboard and Openstack APIs to control AWS. It was pitched somewhat as one API to rule them all, which is perhaps a bit optimistic, but it reflects the fact that the Openstack API is maturing and more and more applications are being developed against it; EC2 is no longer the only important API in town! Finally, there was a presentation by Crowdstar which highlighted the benefits of Ironic for certain workloads – a 60% cost reduction and a 40ms reduction in latency – and particularly how it can be used very effectively in conjunction with containers.

Jpeg

Jpeg

There was quite some interest in big data and HPC type applications – the talks on GPU virtualization and Tensorflow were very well attended, but there is still a lot of work to be done in both these realms. The GPU virtualization work was described in the context of the Nomad project, which is attempting to manage heterogeneous compute resources in Openstack; however, the vision they offer is still only at the initial stages. The Tensorflow work compared Magnum and Sahara for deploying a Tensorflow workload – Magnum was selected as the better option, partly due to its greater support, but there are still issues with using it as a framework for this type of work.

On a related note there was an interesting talk on unikernels and how they relate to Openstack. The guys from the MIKELANGELO project have developed solutions which enable applications to be packaged into unikernels and executed from image stores. Such solutions can be much more efficient than VM or even container based solutions – they gave an example of a VM image consuming 2GB while the equivalent unikernel consumed 56MB. However, their solution was not really integrated with Openstack and there is still a lot of work to do to make this happen.
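To put the image sizes quoted above in perspective, here is a quick calculation in Python. It assumes 1GB = 1024MB and uses only the two figures from the talk; the variable names are our own.

```python
vm_image_mb = 2 * 1024  # 2GB VM image, assuming 1GB = 1024MB
unikernel_mb = 56       # equivalent unikernel image

ratio = vm_image_mb / unikernel_mb
saving = 1 - unikernel_mb / vm_image_mb

# Prints a ratio of about 37x and a saving of about 97%.
print(f"unikernel is about {ratio:.0f}x smaller ({saving:.0%} less space)")
```

In other words, the unikernel in this example needs under 3% of the space of the VM image.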

At another session, we learnt of the developing ARM Openstack ecosystem: there are ARM Openstack distributions already available and key issues relating to ARM Openstack compute functions have been solved (mostly relating to UEFI and ACPI); the Linaro team is working on expanding the ARM Guest OS support for different Linux distributions. This is a very interesting area which will surely grow as some organizations want to reduce their dependence on Intel and perhaps make some gains in energy efficiency.

We did spend some of the day going around talking to people, so it was not all spent sitting in the sessions – we had great fun with the Cloudbase guys, who showed us their very cool HoloLens demo.

And now off to the Rackspace party!

 

Openstack Summit Barcelona 2016 – Day 1

We were lucky to have the opportunity to attend the Openstack Summit in Barcelona this year. The summit has grown into a large event with a few thousand attendees and the scope is getting broader as Openstack evolves and matures.

openstack-summit-pic

The schedule is very dense and a little bit of homework is necessary to maximize the value from the event – the sessions we chose to attend probably capture quite a wide subset of the different conversations that went on.

Reflections on ORConf 2016

We had the chance again this year to attend the really excellent ORConf 2016 (see here for a write up on ORConf 2015). The focus of the conference is on Open Source silicon in general, comprising aspects of open source hardware design tooling, open source processor designs and open source SoC designs. This area (and community) is very interesting and it has the potential to have a significant impact on future cloud systems – here are some reflections on the event.

While the conference addressed many different aspects of the digital design space, there was a significant emphasis on the embedded space and/or IoT type use cases: this can probably be attributed to the fact that these systems are somewhat easier to design and produce in small quantities, as well as the fact that there is a large opportunity in this area. It was noteworthy how quickly the community is evolving, with designs presented at ORConf last year very likely to appear as working silicon by next year’s conference. It was also noteworthy how the community is squeezing more and more compute performance out of each Watt in their designs.

Experience with Neutron High Availability (HA) in Openstack

For the Zurich FIWARE node, we’re setting up a Kilo High Availability (HA) deployment – we’re transitioning from our current Icehouse (non-HA) deployment.

Kilo HA is recommended as there is a general understanding within the project that the HA capabilities are now ready for production use. However, there is no single Kilo HA – there are many different configurations which can be called HA – and in this post, we describe some of the points we encountered while setting up our HA node.

We deployed Mirantis Openstack v7.0 using the Fuel deployment tool, as is used in the project and as we have used before; as we required an HA deployment, we selected the HA configuration in Fuel with 3 controller nodes to provide HA. We did have some issues in that the deployment did not terminate cleanly, failing in some astute-based post-deployment tests – however, these issues were minor and the system behaved in a sane manner.

The first ICCLab hackathon – a fun and productive few days

Here in ICCLab, we’ve always been interested in the hackathon approach as a way to develop small but practical ideas and make demonstrable prototypes rapidly. With all of our other commitments, it has been difficult for us to set aside the time for such an event, but we finally managed to do this last week.

The scope was loose, with the objective being to develop something which can be demonstrated; we had a small preference for work which was related to our core business – cloud technologies – but we were quite flexible on this point.

Introducing the official ICCLab linux distro optimized for simple hypervisor contexts

We needed to deploy a simple hypervisor on one of our internal systems. This system was disconnected from the Internet for some particular reasons and it was surprisingly difficult to find a suitable bundled Linux distribution which provided the features we needed. Ubuntu Desktop does not come with KVM and Ubuntu Server does not have X, so it’s not easy to run virt-manager on the machine. There were no CentOS versions that provided the right mix of capabilities; oVirt-node was one reasonable candidate, but it only exposed a management interface which needed to be run remotely or else somehow packaged with the system. One of our colleagues is a big fan of Proxmox, but this is overkill for the simple single hypervisor case we wanted.

Networking and Security in an Openstack Compute Node: a complex combination of iptables and (linux and OVS) bridging…

We had to investigate the operation of one of our Openstack compute nodes as it was exhibiting some unusual behaviour. We quickly determined that there was some unexpected packet loss and we had reason to believe that this could have been due to the packet processing in the node. Investigating this problem necessitated some deeper exploration of how packets are processed in the node, particularly relating to the mix of ovs bridges, linux bridges and iptables. It turns out that this is rather complex and clear information describing how all this fits together in detail is not readily available. Here, we note what we learnt from this exploration.
