It is widely recognized that GPUs can greatly outperform standard CPUs for certain types of work: typically, work that can be decomposed into many basic computations that run in parallel. Matrix operations are the classic example. However, GPUs have evolved primarily within the largely independent video subsystem, and even there the key driver has been support for advanced graphics and gaming. Consequently, they have not been architected to support diverse applications within the cloud. In this blog post we comment on the state of the art in GPU support in the cloud.

Public cloud service offerings with GPU support

Amazon was the first major provider to offer GPU capabilities in its cloud offering. It launched its cg1 instances in late 2010, which were subsequently upgraded to the g2 instances in late 2013; the g2 instances deliver an entire Nvidia GPU (based on Nvidia Kepler systems) to the user. However, even this most basic solution has exhibited significant and unexpected performance problems when not configured correctly (Netflix reported on the impact of libraries and operating system configuration parameters here; there is an interesting thread on Reddit highlighting performance issues here).

The other large cloud provider with a GPU-based offering is IBM, through SoftLayer, which started to offer GPU support very recently. As with Amazon, IBM’s offering is one in which the whole GPU is dedicated to the user.

There are, of course, other smaller providers who offer cloud-based GPU access (Nvidia has a list of partners here), but we do not consider them further. Other cloud providers are certainly working on GPU capabilities too, as demand grows on the back of larger trends such as big data analysis and energy efficiency.

GPUs and the cloud model

The current public cloud GPU offerings focus on delivering dedicated GPUs. While this is a pragmatic solution and meets a real demand, it is not entirely consistent with the cloud model, which focuses on flexibility, elasticity and resource sharing. Delivering a more flexible, shared GPU model requires more sophisticated thinking about how applications interact with GPUs and about the role of the Operating System and virtualization layers in these interactions. Fundamentally, this requires an appropriate model of a GPU which can be managed by a hypervisor, together with the hypervisor intelligence to ensure that the GPUs get the right data at the right time and that the resource is shared appropriately amongst the system users.

This, of course, is not a new idea. There have been a number of interesting contributions to the research literature which have focused on aspects of this problem.

One interesting contribution is GDev (link to code, link to paper), which focuses on offering an Operating System interface to GPU resources that can be used both by applications and by the OS itself. In this contribution, access to the GPU is controlled via the Operating System, and the GPU's dynamic and static details appear in /proc. The Operating System also has support for basic memory synchronization between CPU and GPU as well as GPU job scheduling. Lastly, it introduces an interesting initial notion of a virtual GPU.

A later contribution, GDM (link to paper), builds on GDev and focuses more specifically on the problem of memory management on the GPU itself, in particular on enabling the Operating System to manage this critical resource. GPU memory is so important, yet sits behind a bottleneck (typically a PCIe bus), that applications often want to consume it all, which results in poor performance in a shared context. The basic mechanism defined in GDM is a so-called GPU staging area: memory which lives in the application's address space but is specifically flagged to the OS as memory that needs to be shared with the GPU. The OS copies this memory to the GPU without requiring any application intervention.

Another very interesting contribution, which focuses on another key capability of Operating Systems, enables GPUs to have access to the filesystem (link to paper). This employs similar ideas to the GDM work to give GPU-based applications access to the filesystem; a fundamental component is a filesystem buffer which is partially replicated across the CPU and GPU memories. The authors provide a library for the GPU application developer which can co-ordinate with the host (CPU-based) OS to provide file access.
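To make the staging-area idea described for GDM a little more concrete, here is a minimal Python sketch. It is a toy simulation only, not GDM's actual interface: the classes `ToyGpu` and `ToyOs`, the `alloc_staging` and `launch` methods and the `sum_kernel` function are all hypothetical names invented for illustration, and "device memory" is just a dictionary. The one point it tries to show is that the application only ever writes to ordinary memory, while the OS layer decides when a copy across the bus is needed.

```python
"""Toy model of a staging area shared with a GPU (all names are hypothetical)."""


class ToyGpu:
    """Stands in for device memory sitting behind a PCIe-like bus."""

    def __init__(self):
        self.memory = {}        # buffer name -> bytes held on the "device"
        self.bytes_copied = 0   # total traffic across the simulated bus

    def upload(self, name, data):
        self.memory[name] = bytes(data)
        self.bytes_copied += len(data)


class ToyOs:
    """Stands in for the OS layer that owns host-to-device copies."""

    def __init__(self, gpu):
        self.gpu = gpu
        self.staging = {}       # buffer name -> bytearray in application memory
        self.last_sent = {}     # buffer name -> snapshot of the last upload

    def alloc_staging(self, name, size):
        # The application asks for memory flagged as shared with the GPU.
        self.staging[name] = bytearray(size)
        return self.staging[name]

    def launch(self, kernel, *names):
        # Before the kernel runs, copy any staging buffer whose contents
        # changed since the last upload. Comparing against a snapshot is a
        # simplification of this toy, not a claim about how a real OS
        # would detect writes.
        for name in names:
            current = bytes(self.staging[name])
            if self.last_sent.get(name) != current:
                self.gpu.upload(name, current)
                self.last_sent[name] = current
        return kernel(*(self.gpu.memory[n] for n in names))


def sum_kernel(buf):
    """Placeholder for GPU work: it only sees the device copy."""
    return sum(buf)


gpu = ToyGpu()
os_layer = ToyOs(gpu)

data = os_layer.alloc_staging("input", 4)
data[:] = bytes([1, 2, 3, 4])                   # plain write to application memory

first = os_layer.launch(sum_kernel, "input")    # OS uploads, then runs
second = os_layer.launch(sum_kernel, "input")   # unchanged buffer: no new upload

data[0] = 10                                    # another plain write
third = os_layer.launch(sum_kernel, "input")    # OS notices and uploads again

print(first, second, third, gpu.bytes_copied)
```

Because the OS layer owns the copy, it is also the natural place to arbitrate between several applications competing for device memory, which is the shared-context problem described above. The toy leaves that arbitration out.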

The research literature contains many more contributions focusing on different aspects of these problems. This is not intended to be a comprehensive survey but rather a short comment on the current state of the art; the above have been chosen as examples of important building blocks (coming from the academic and open source communities) necessary to make the GPU accessible in a cloud computing context.

The industry is not oblivious to these issues, of course. Nvidia has done considerable work on promoting its vision of virtualized GPUs. To date, this has focused more on supporting virtualized desktops and users of high-performance enterprise applications in-house: to this end, Nvidia is partnering with VMware and Xen to deliver solutions that provide DirectX and OpenGL video support. More general support for applications requiring programmable access to virtualized GPU resources is not yet available.

What else needs to be done?

Although the above work points in a compelling direction, there is much work to be done before the full power of GPUs can be leveraged in the cloud. In the open source world, changes to Linux kernels and hypervisors are required, which can take considerable time. Further, the above proposals, though interesting and useful contributions per se, do not offer a full vision of the interactions between GPUs, CPUs and Operating Systems, particularly in a cloud context. They are more akin to the libraries used to support applications prior to proper OS support for segregated memory (yucky things like this!) and process management. The closed source world, being closer to the vendors, will be in a better position to develop specific solutions thanks to better access to tools and information; however, this does not preclude interesting contributions from the open source world.

Hardware systems will continue to evolve, and the diversity of systems that need to be supported will increase: for example, GPUs with more sophisticated memory and ‘process’ management support, alternative memory architectures which reduce bottlenecks between CPU and GPU memories, support for GPUs outside the Nvidia/CUDA ecosystem, and the increased importance of container solutions. This leads to greater challenges in maximizing application performance over heterogeneous resources, not to mention the whole issue of how to write or adapt applications for such heterogeneous platforms.

Fun times ahead!