Processing IoT Sensor Data with Semi-Stateful Cloud Functions

At Zurich University of Applied Sciences, we are currently building a test track to link applied teaching with research and innovation. Such a facility allows for covering a whole range of topics: programming, autonomous driving, robotics, cloud, serverless, continuums, sensing, open data, data science, and various computing paradigms. We expect a video to be available around November that explains the facility and especially the teaching element. In this research blog post, we already report on interesting observations around the uplink between sensors and FaaS. We expect these insights to bring benefits to companies building IoT-cloud integrations.

The (simplified) architecture of the system is shown below. Various types of sensors report events to cloud functions. These functions are typically considered stateless, although in practice they never are completely so (see the older related post on building probabilistically stateful applications). To manage state over a longer period of time, an in-memory database is attached to the function; for security reasons, access to it is restricted through a virtual network. As we are still covered by GCP Research Credits for Serverless Data Integration, we are basing this work on the respective GCP offerings: GCF, VPC Connector, and Memorystore for Redis.

The ‘hit’ function, for instance, increments a counter whenever a car passes the corresponding sensor. It reads the Redis credentials from environment variables. It also reads a secret from the environment, falling back to a variable defined in a secrets file. This is a pragmatic way to protect against damaging unauthorised function calls: instead of requiring authentication at the invocation level, the function is open for public invocation, but any actual processing, for example writing values to Redis, is protected. For prototypical research and teaching, we consider it important that technologies are easy to get started with. For this reason we also disregarded GCP’s IoT Core, whose heavy double authentication requirements (keys + JWT) redirect too much effort into security engineering rather than freeing it up for advances in technology. (Of course, we assume and demand that anybody who operates such solutions in production takes care of appropriate security.)
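The protection scheme described above can be sketched roughly as follows. This is not the deployed function: the environment variable name `HIT_SECRET`, the fallback constant, the parameter names and the `InMemoryStore` class are all assumptions made for illustration. The store only mimics the `incr()` call that a Redis client would provide, so that the sketch runs without a Redis server.

```python
import hmac
import os

# Hypothetical fallback; stands in for a value loaded from a secrets file.
FALLBACK_SECRET = "change-me"


class InMemoryStore:
    """Illustrative stand-in offering the incr() call of a Redis client."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


def expected_secret():
    # Environment first, secrets-file value as fallback (HIT_SECRET is an assumed name).
    return os.environ.get("HIT_SECRET", FALLBACK_SECRET)


def hit(params, store):
    """Count one passing car, but only if the caller presents the shared secret."""
    supplied = params.get("secret", "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret().encode()):
        return ("forbidden", 403)
    sensor = params.get("sensor", "unknown")
    count = store.incr(f"hits:{sensor}")
    return (str(count), 200)
```

The function itself stays publicly invocable; only the write to the store sits behind the secret check, which mirrors the trade-off described above.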

Once the described system was up and working, the research question emerged: Is Redis really needed? Could we not just store the counter as a global variable of the cloud function, letting it survive across requests?

There are two potential problems that may interfere with that idea. Function instances (the containers or equivalent isolation units beneath FaaS) are kept around for some time, but not forever. Specifically, an invocation will get a new instance if:

  • Parallelism occurs. In this case, a second instance is temporarily opened. Once the parallelism stops, the older instance takes precedence and the second instance is no longer used. Hence, a typical counting sequence looks like 124, 125, 1, 126, 127.
  • Recycling occurs. In this case, the original instance is permanently replaced with a new one after an idle period. This problem is potentially even more complex (a whole host VM with multiple containers may migrate or be replaced); for simplicity we are not distinguishing the reasons why a function instance goes away, but we want to know how likely it is. A typical counting sequence looks like 130, 131, 132, 1, 2, 3, 4.
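Both effects can be reproduced in a small simulation, sketched below under the assumption that each instance simply holds its own copy of the counter. The `Instance` class and the `classify_resets` heuristic are illustrative and not part of our measurement tooling. The heuristic labels a drop as "parallel" if the sequence later climbs above the value seen just before the drop, and as "recycled" otherwise.

```python
class Instance:
    """One function instance with its own copy of the global counter."""

    def __init__(self, start=0):
        self.count = start

    def invoke(self):
        self.count += 1
        return self.count


def classify_resets(sequence):
    """Return (index, label) for every drop in an observed counter sequence."""
    findings = []
    for i in range(1, len(sequence)):
        if sequence[i] < sequence[i - 1]:
            # If the old count is exceeded again later, the old instance survived.
            resumed = any(v > sequence[i - 1] for v in sequence[i + 1:])
            findings.append((i, "parallel" if resumed else "recycled"))
    return findings


# Parallelism: a second instance briefly serves one request.
primary, second = Instance(start=123), Instance()
parallel_seq = [primary.invoke(), primary.invoke(), second.invoke(),
                primary.invoke(), primary.invoke()]
print(parallel_seq, classify_resets(parallel_seq))
# [124, 125, 1, 126, 127] [(2, 'parallel')]

# Recycling: the old instance is gone for good.
old, new = Instance(start=129), Instance()
recycled_seq = [old.invoke() for _ in range(3)] + [new.invoke() for _ in range(4)]
print(recycled_seq, classify_resets(recycled_seq))
# [130, 131, 132, 1, 2, 3, 4] [(3, 'recycled')]
```

Note that the heuristic needs later observations to tell the two cases apart: a parallel instance seen at the very end of a sequence would be labelled as recycling.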

Research on function instance recycling has led to interesting insights. In SLD#97, Manner et al. from University of Bamberg say that “containers on most platforms were shut down after 20 minutes of idling”. In SLD#37, Lloyd et al. from University of Washington state that all AWS containers were recycled after 45′. Numbers for GCF were not specifically reported by either group, nor in our own previous work, where we observed that IBM Cloud Functions get recycled after about 10′. In another work (citation currently missing), van Eyk et al. presented a rough estimate of about 6 hours for the recycling time on GCF, which would be long enough to start codifying it into optimised processing logic.

Our own preliminary observations are consistent with that estimate. Several invocations with long intervals of 60-90′ (and even 240′) in between led to no reset of the counter. This means that in controlled single-tenant environments, such as a function served exclusively by a single sensor, or by multiple sensors when concurrency can be avoided (e.g. through an ordered queue in front of the function), using global state variables, from simple counters to complex aggregated statistics, is a viable and highly economical alternative to using external databases. We anticipate that advanced FaaSification frameworks will allow for simply marking variables as stateful, with a code generator then taking care of either safely using a backend store or juggling the data between global variables and occasional backend snapshots.
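What such generated code might do for the second option can be sketched as follows. This is a speculative illustration, not an existing framework: `SnapshotCounter` is a hypothetical name, and a plain dictionary stands in for the backend store. The counter lives in memory, is written to the backend every few increments, and is restored from the last snapshot when a new instance starts.

```python
class SnapshotCounter:
    """In-memory counter with occasional snapshots to a dict-like backend."""

    def __init__(self, backend, key, every=10):
        self.backend = backend  # hypothetical stand-in for a backend store
        self.key = key
        self.every = every
        # Cold start: resume from the last snapshot, if there is one.
        self.value = int(backend.get(key, 0))
        self.unsaved = 0

    def increment(self):
        self.value += 1
        self.unsaved += 1
        if self.unsaved >= self.every:
            # Only every n-th call touches the backend.
            self.backend[self.key] = self.value
            self.unsaved = 0
        return self.value


backend = {}
counter = SnapshotCounter(backend, "hits", every=3)
for _ in range(7):
    counter.increment()

# Simulated recycling: a fresh instance resumes from the snapshot.
restarted = SnapshotCounter(backend, "hits", every=3)
print(counter.value, restarted.value)  # 7 6
```

With a snapshot interval of n, a recycled instance loses at most n - 1 increments, while the number of backend writes drops by roughly a factor of n. The interval is thus a tuning knob between accuracy and cost.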

