On-storage computation for a serverless environment

Serverless architecture is getting a lot of attention (Forbes, Gigaom, TechBeacon). It is especially attractive for developers because they no longer need to worry about deployment or about interactions between different servers. The developer only has to think about the code, which is written as a function. Functions are the unit in which applications are written in this architecture, a model also known as Function as a Service (FaaS).

Object storage fits the serverless architecture well and offers a possible solution to the problem of stateful functions. Object storage makes no distinction between file types (e.g. mp3, txt, jpeg) and stores the data and the metadata of an object separately, which is a very important property. In OpenStack Swift, objects are stored in containers whose storage can grow as needed. This matters for object management and therefore also for computation on stored objects.

But how can we bring object storage and FaaS closer so we can begin to explore solving the stateful function problem?

A few years ago Rackspace sponsored a project that lets you run code directly on your storage node. This project, ZeroVM, is a promising concept and very relevant for serverless architectures.

So what is ZeroVM? ZeroVM is a platform that brings the computation to the data instead of creating traffic and bottlenecks by bringing the data to the compute. It is deployed together with OpenStack Swift on a storage node. With ZeroVM, a Python script can be written and uploaded to the storage node. This script closely resembles a function in the serverless pattern: it sits there and only runs when it is invoked. It is invoked together with a job description, a JSON file that specifies the parameters and the storage container the function should use when it is executed.

This is exactly what the serverless world is looking for: on-storage computation that is executed directly where the data is located, without the overhead of downloading the data to compute nodes. A function defines what is to be done with the storage objects given as input, and it is executed only when it is invoked.

Here is a very basic ZeroVM example. Let’s assume there is a huge logfile stored as an object in a container. We want to find out how many times the error code “banana” has shown up in this file, so we write a short Python script, demo.py:

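# Count the lines read from /dev/input that contain "banana".
# (A line that mentions it twice is counted once.)
count = 0
with open('/dev/input') as fp:
    for line in fp:
        if "banana" in line:
            count += 1
# Parenthesised so it runs under Python 2.7 (used by the job) and Python 3.
print(count)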

This code needs to be packed into an archive and uploaded to the container on which it should run. In this case the container is called demo.
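
The packing step can be sketched in Python as follows. This is only an illustration under assumptions: the helper name pack_script is hypothetical, the placeholder source stands in for the real contents of demo.py, and an uncompressed tar archive is assumed because the job description refers to a file named demo.tar.

```python
import io
import tarfile


def pack_script(script_name, script_source):
    """Return the bytes of an uncompressed tar archive holding one script."""
    payload = script_source.encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=script_name)
        info.size = len(payload)  # the size must be set before adding the file
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Placeholder source; in practice this would be the contents of demo.py.
archive_bytes = pack_script("demo.py", "print('placeholder')\n")
```

Writing archive_bytes to a file named demo.tar and uploading that file to the demo container, like any other Swift object, would complete the step.
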
What is /dev/input? It is a device that is specified in the job description, so let’s have a look at the job.json file.

[{
    "name": "zerovmdemo",
    "exec": {
        "path": "file://python2.7:python",
        "args": "demo.py"
    },
    "devices": [
        {"name": "python2.7"},
        {"name": "stdout"},
        {"name": "input", "path": "swift://~/demo/logfile.txt"},
        {"name": "image", "path": "swift://~/demo/demo.tar"}
    ]
}]
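
If the same script should run against several objects, the job description can be generated instead of edited by hand. The following is a minimal sketch that simply mirrors the structure of job.json above; the helper build_job and the object name errors.log are hypothetical and only serve as an illustration.

```python
import json


def build_job(container, log_object, script="demo.py", image="demo.tar"):
    """Return a job description shaped like job.json for one log object."""
    base = "swift://~/" + container + "/"
    return [{
        "name": "zerovmdemo",
        "exec": {"path": "file://python2.7:python", "args": script},
        "devices": [
            {"name": "python2.7"},
            {"name": "stdout"},
            # The object that the script reads through /dev/input.
            {"name": "input", "path": base + log_object},
            # The archive that contains the script.
            {"name": "image", "path": base + image},
        ],
    }]


# errors.log is a made-up object name in the demo container.
job_json = json.dumps(build_job("demo", "errors.log"), indent=4)
```

The resulting string has the same layout as job.json, with only the input device pointing at a different object.
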

The function can now be executed with a curl command, which returns the number of times “banana” appears in the object.

curl -i -X POST -H "Content-Type: application/json" \
-H "X-Auth-Token: $OS_AUTH_TOKEN" -H "X-Zerovm-Execute: 1.0" \
--data-binary @job.json $OS_STORAGE_URL
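
The same request can also be built from client-side Python, which may be handy when the call is part of a larger program. This is a sketch rather than an official client: it assumes Python 3 on the client, the function names build_execute_request and execute are hypothetical, and the headers are copied from the curl command above.

```python
import urllib.request


def build_execute_request(storage_url, auth_token, job_json):
    """Build (but do not send) the POST request that the curl command issues."""
    return urllib.request.Request(
        storage_url,
        data=job_json.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": auth_token,
            "X-Zerovm-Execute": "1.0",
        },
        method="POST",
    )


def execute(request):
    """Send the request; needs a reachable Swift endpoint with ZeroVM enabled."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling build_execute_request with the values of OS_STORAGE_URL and OS_AUTH_TOKEN and the contents of job.json, and passing the result to execute, should behave like the curl call.
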

That is all it takes to deploy and run a function with ZeroVM: quick and easy.

Stay tuned for more posts on serverless architectures here!
