The discourse on cloud functions focuses heavily on diverse use cases: standalone functions performing a specific task, compositions of functions into complete applications, and functions as plumbing between separate application parts. This blog post explores the use of cloud functions as an extensibility mechanism for existing applications. It exemplifies the interaction between a function, a website and a login-protected web application, and furthermore discusses implementation aspects and the notion of caching data in function instances.
To demonstrate the functionality, OpenFlights is used. It is a free-of-charge web application for managing and visualising past flights between airports. Flying raises environmental concerns, which can most likely only be resolved by more research into alternative flight techniques such as the Swiss Solar Impulse project. Notwithstanding those concerns, flying is fun, and tracking one’s flights is a great digital companion to it. The screenshot below gives a first visual impression of OpenFlights.
While OpenFlights does calculate the total distance flown, this result shall be validated by a cloud-hosted add-on function which repeats the calculation step by step. The function would be executed on demand, with high elastic scalability if needed. It would be billed at a resource-proportional per-call cost, which for low-volume applications offers clear cost benefits over constantly running services. Furthermore, it would be terminated in case of misbehaviour, including deadlocks and endless loops, which helps in providing zero-administration, zero-headache service offerings.
The challenges in constructing the function include combining several datasets, keeping the function execution duration low, handling authorisation workflows, and finally getting the math right. The following diagram visualises the intended function. Upon receiving user credentials, it shall authenticate against OpenFlights to download the personal list of flights, and additionally download a public airport database from another website. Subsequently, it correlates the data, filters out unsuitable entries as necessary, and returns the resulting distance along with a confidence metric. This metric is the fraction of flights for which both airports have valid database entries.
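The correlation step can be illustrated with a short Python sketch. It makes two assumptions for illustration, neither of which reflects the actual implementation's data layout. First, the airport database has already been reduced to a dictionary mapping airport codes to valid (latitude, longitude) pairs. Second, flights are given as (origin, destination) code pairs. The distance uses the haversine great-circle formula with a mean Earth radius of 6371 km. This is a common choice, but not necessarily the one used by OpenFlights or by the function itself.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Mean Earth radius in km (assumption; the post does not state which value is used).
EARTH_RADIUS_KM = 6371.0


def great_circle_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Haversine distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp against rounding errors slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def total_distance(flights: Iterable[Tuple[str, str]],
                   airports: Dict[str, Tuple[float, float]]) -> List[float]:
    """Return [distance_km, confidence] for (origin, destination) code pairs.

    Flights referencing an airport absent from `airports` (unknown, or filtered
    out as tainted) are skipped; confidence is the share of flights used.
    """
    total, used, seen = 0.0, 0, 0
    for origin, destination in flights:
        seen += 1
        if origin in airports and destination in airports:
            total += great_circle_km(airports[origin], airports[destination])
            used += 1
    confidence = used / seen if seen else 0.0
    return [total, confidence]
```

The result is a two-element list, matching the return value visible in the Lambda console. If only one of two flights can be resolved, for instance, the confidence is 0.5.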
- Dataset combination: To obtain precise airport positions, the function consults the Global Airport Database maintained by Arash Partow, which offers a downloadable zipped CSV file. The ZIP format requires random access (seeks), and Python’s urllib module does not support seeking on a download stream. The implementation therefore downloads the file to a temporary location, although an in-memory buffer (StringIO, or io.BytesIO in Python 3) might be an alternative. Although the database is large, several entries are missing, some are duplicated, and some carry default latitude and longitude coordinates (0°0’0”). Hence, all personal flights which reference a missing or tainted airport need to be discarded. As the database accepts improvements, we will consider sending in additions.
- Function execution time: Cloud functions are notorious for prolonged warm-up (cold start) periods. To reduce at least the execution time of warm invocations, datasets should be cached according to their estimated change frequency. In the given implementation, the airport database is cached for 5 minutes while the personal flights are not cached. The maximum cache time is the lifetime of the function instance, which is often around ten minutes and is followed by another cold start. Another path towards lower execution times is multithreading during the I/O-bound phases (e.g. network transfers, disk reads). This is not implemented yet, however.
- Getting the math right: We believe we got it right. If you find any issues, please report them.
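The in-memory alternative mentioned under dataset combination can be sketched as follows. `io.BytesIO` provides the seekable file object that `zipfile` needs, so the archive never touches the disk. The trade-off is that the whole archive must fit within the function's memory setting. The column layout (`delimiter`, `code_col`, `lat_col`, `lon_col`) is a hypothetical placeholder and must be adapted to the actual file. The filter mirrors the taint rules described above: it drops duplicated codes entirely and discards entries positioned at 0°0’0”.

```python
import csv
import io
import urllib.request
import zipfile
from collections import Counter
from typing import Dict, Tuple


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a resource completely into memory instead of a temporary file."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_airports(zip_bytes: bytes, member: str, delimiter: str = ",",
                  code_col: int = 0, lat_col: int = 1,
                  lon_col: int = 2) -> Dict[str, Tuple[float, float]]:
    """Parse airport positions from a ZIP archive held in memory.

    The column layout is a hypothetical placeholder. Rows that cannot be parsed,
    codes that occur more than once and (0, 0) positions are discarded.
    """
    rows = []
    # BytesIO is seekable, which is all zipfile needs for random access.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with io.TextIOWrapper(archive.open(member), encoding="utf-8",
                              errors="replace", newline="") as text:
            for fields in csv.reader(text, delimiter=delimiter):
                try:
                    code = fields[code_col].strip().upper()
                    lat, lon = float(fields[lat_col]), float(fields[lon_col])
                except (IndexError, ValueError):
                    continue  # header line or malformed row
                if code:
                    rows.append((code, lat, lon))
    counts = Counter(code for code, _, _ in rows)
    return {
        code: (lat, lon)
        for code, lat, lon in rows
        if counts[code] == 1 and (lat, lon) != (0.0, 0.0)
    }
```

Used together, `load_airports(fetch_bytes(url), member)` would replace the temporary file. Here `url` and `member` stand for the database's download address and the file name inside the archive.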
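The five-minute airport cache relies on module-level state surviving between warm invocations of the same function instance. The following sketch shows a generic time-based cache of that kind. It is an assumption for illustration, not the implementation's actual mechanism.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TimedCache(Generic[T]):
    """Keep a loaded value for `ttl` seconds.

    Created at module level, an instance survives between warm invocations of
    the same function instance and is lost on a cold start.
    """

    def __init__(self, loader: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Reload on first use or once the value is at least `ttl` seconds old.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]
```

At module level, `AIRPORTS = TimedCache(download_airports, ttl=5 * 60)` would create the cache, with `download_airports` standing in for a hypothetical loader. The handler would then call `AIRPORTS.get()`. The injectable `clock` exists only to make expiry testable.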
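The multithreading idea, which is not implemented yet, might look like the following sketch. It runs independent I/O-bound downloads in a thread pool. Python threads release the GIL while waiting on network I/O, so the transfers overlap. The task names and fetch functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def run_concurrently(tasks: Dict[str, Callable[[], T]],
                     max_workers: int = 4) -> Dict[str, T]:
    """Run zero-argument, I/O-bound callables in parallel threads.

    Results are returned under the same names; an exception raised by a task
    is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```

A call such as `run_concurrently({"airports": fetch_airport_zip, "flights": fetch_flights})` would use two hypothetical fetchers. The OpenFlights login would have to happen inside `fetch_flights`, because the flight download depends on it.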
The next screenshot shows an excerpt of the function execution in the AWS Lambda web console. Clearly recognisable are the return value, represented as a list of two values, and the log output, which is mainly used for debugging. The execution time is not visible in the screenshot. It is around 1.2-1.4 seconds with a warm 128 MB Lambda instance, but up to 2.0 seconds with a cold instance. The difference comes from the added container start-up time and the time needed to download and decompress the dataset.
The execution time behaviour of the function is shown in the diagram below. Sequences of ten invocations, spaced ten seconds apart, were run with increasing pauses between the sequences (multiples of 50 s). The higher execution time of the first invocation in each sequence is apparent, but it is not pronounced compared to the general variability in execution times. Hence, for practical purposes, the cold start issue is not dominant for this specific application.
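The measurement schedule can be expressed as a small harness. This sketch reflects one reading of the setup, in which the pause before sequence k equals k × 50 s. `invoke` is a hypothetical callable that triggers the deployed function. Its recorded durations are client-side round-trip times, which include network latency, unlike the execution times reported by the Lambda console.

```python
import time
from typing import Callable, List


def benchmark(invoke: Callable[[], object], sequences: int,
              runs_per_sequence: int = 10, interval_s: float = 10.0,
              pause_step_s: float = 50.0,
              sleep: Callable[[float], None] = time.sleep,
              clock: Callable[[], float] = time.perf_counter) -> List[List[float]]:
    """Time `invoke` in sequences of runs.

    Runs within a sequence are `interval_s` apart; before sequence k (k >= 1)
    the harness pauses k * pause_step_s so that instances may be recycled.
    """
    results: List[List[float]] = []
    for k in range(sequences):
        if k > 0:
            sleep(k * pause_step_s)
        durations = []
        for i in range(runs_per_sequence):
            if i > 0:
                sleep(interval_s)
            start = clock()
            invoke()
            durations.append(clock() - start)
        results.append(durations)
    return results
```

The first value of each inner list corresponds to a potential cold start. The remaining values show the warm-instance variability.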
The function implementation can be retrieved from the Service Prototyping Lab Git repository. It can be deployed as a portable function with multiple entry points into AWS Lambda and IBM Cloud Functions. Additionally, it can be executed locally with the Python interpreter on the command line for testing purposes.
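The multi-entry-point layout can be sketched as follows. The handler signatures follow the platform conventions: `handler(event, context)` for AWS Lambda, and `main(params)` returning a dictionary for IBM Cloud Functions (Apache OpenWhisk). The names `calculate`, `lambda_handler` and `cli`, as well as the parameter and result keys, are assumptions rather than the repository's actual code. `calculate` is only a placeholder.

```python
import json
import sys
from typing import Any, Dict, List


def calculate(username: str, password: str) -> List[float]:
    """Hypothetical core routine. The real one would log in to OpenFlights,
    download both datasets and return [distance_km, confidence]."""
    return [0.0, 0.0]  # placeholder values only


def lambda_handler(event: Dict[str, Any], context: Any) -> List[float]:
    """AWS Lambda entry point; credentials are assumed to arrive as event keys."""
    return calculate(event["username"], event["password"])


def main(params: Dict[str, Any]) -> Dict[str, Any]:
    """IBM Cloud Functions (OpenWhisk) entry point; actions must return a dict."""
    distance, confidence = calculate(params["username"], params["password"])
    return {"distance": distance, "confidence": confidence}


def cli(argv: List[str]) -> int:
    """Command-line entry point for local testing: USERNAME PASSWORD."""
    if len(argv) != 2:
        print("usage: python flights.py USERNAME PASSWORD", file=sys.stderr)
        return 2
    print(json.dumps(calculate(argv[0], argv[1])))
    return 0
```

For command-line use, the module (here a hypothetical `flights.py`) would end with `if __name__ == "__main__": sys.exit(cli(sys.argv[1:]))`. OpenWhisk names the action entry point `main` by default, which is why the command-line function is called `cli` here.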