The discourse on cloud functions focuses heavily on diverse use cases: standalone functions that perform a specific task, compositions of functions into complete applications, and functions as plumbing between separate application parts. This blog post explores the use of cloud functions as an extensibility mechanism for existing applications. It exemplifies the interaction between a function, a website and a login-protected web application, and furthermore discusses implementation aspects and the notion of caching data in function instances.
To demonstrate the functionality, OpenFlights is used as a free-of-charge web application which allows for managing and visualising past flights between airports. Notwithstanding concerns about environmental impact, which most likely can only be resolved by increased research into alternative flight techniques such as in the Swiss Solar Impulse case, flying is fun and tracking one’s flights is a great digital companion to that. The screenshot below gives a first visual impression of OpenFlights.
While OpenFlights does calculate the total distance flown, this result shall be validated by a cloud-hosted add-on function which repeats the calculation step by step. The function would be executed on demand, with high elastic scalability if needed, and with a resource-proportional per-call cost which, for low-volume applications, offers clear cost benefits over constantly running services. Furthermore, it would be terminated in case of misbehaviour, including deadlocks and endless loops, which aids in providing zero-administration, zero-headache service offerings.
The challenges in constructing the function include combining several datasets, ensuring a low function execution duration, handling authorisation workflows, and finally getting the math right. The following diagram visualises the intended function. Upon receiving user credentials, it shall authenticate against OpenFlights and then download the personal list of flights, as well as a public airport database from another website. Subsequently, it should correlate the data, filtering out unsuitable entries as necessary, and return the resulting distance along with a confidence metric which equals the share of flights for which both airports have valid entries.
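To make the correlation step more concrete, here is a minimal sketch of how distance and confidence could be computed. It assumes the haversine great-circle formula with a mean Earth radius of 6371 km, which is an illustrative choice and not necessarily what the published function uses. The names `great_circle_km` and `total_distance`, the layout of the airport dictionary, and the order of the two return values are likewise assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def total_distance(flights, airports):
    """Sum the distances of all flights whose airports are both usable.

    flights:  list of (source_code, destination_code) tuples (hypothetical layout)
    airports: dict mapping airport code -> (latitude, longitude)
    Returns [distance_km, confidence], where confidence is the share of
    flights that could be included in the sum.
    """
    total = 0.0
    valid = 0
    for src, dst in flights:
        a, b = airports.get(src), airports.get(dst)
        # Skip missing airports and entries left at the default 0/0 position.
        if a is None or b is None or a == (0.0, 0.0) or b == (0.0, 0.0):
            continue
        total += great_circle_km(a[0], a[1], b[0], b[1])
        valid += 1
    confidence = valid / len(flights) if flights else 0.0
    return [total, confidence]
```

With three flights of which only one references two valid airports, the confidence would come out as one third, signalling that the reported distance covers only part of the flight list.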
Cloud functions can be implemented in multiple languages. While most providers support JavaScript, several also support Python, which was chosen as the implementation language in this case. To tackle the four challenges mentioned above, the following design and implementation decisions were taken.
- Dataset combination: In order to get precise airport positions, the function consults the Global Airport Database maintained by Arash Partow, which is offered as a downloadable zipped CSV file. Because the ZIP format requires random access (seeks) and the response objects of Python’s urllib module do not support seeks, a temporary download is necessary, although an in-memory buffer such as StringIO might be an alternative. Although the database is large, several entries are missing, some are duplicated, and some are given with default longitude and latitude coordinates (0°0’0”). Hence, all personal flights which reference such a tainted airport need to be discarded. As the database accepts improvements, we will consider sending in additions.
- Function execution time: Cloud functions are notoriously prone to prolonged warm-up periods. In order to cut down at least the execution time of warm invocations, datasets should be cached according to their estimated change frequency. In the given implementation, the airport database is cached for 5 minutes while the personal flights are not cached. The maximum cache time would be the survival time of the function instance, which is often around ten minutes, followed by another coldstart. Another path towards lower execution times is the use of multithreading, given that primarily I/O operations (i.e. network transfers, disk reads) take place. This is not implemented yet, however.
- Authorisation workflow: OpenFlights is implemented in PHP and uses the PHP session id as a cookie on the HTTP level to identify sessions. Within each session, a challenge-response authentication needs to be performed before the personal flights can be downloaded. The function implementation thus makes use of the Python HTTP Cookie Jar. The authentication logic is not documented and was determined by reverse engineering the HTTP interaction, that is, by inspecting JavaScript code (settings.js, openflights.js) and using the browser’s JavaScript console. It consists of a double-hashed credential involving username, password and challenge string, which is transmitted as a single line of HTTP POST parameters.
- Getting the math right: We just got it right. If you find any issues, please report them.
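The dataset and caching decisions above can be sketched as follows. This is a minimal illustration and not the published implementation: it assumes Python 3, where `io.BytesIO` provides the seekable in-memory buffer that `zipfile` needs, and it keeps cached values in a module-level dictionary, which survives between warm invocations of the same function instance. The names `cached`, `unzip_first_member` and `_cache` are hypothetical.

```python
import io
import time
import zipfile

# Module-level state persists while the function instance stays warm.
_cache = {}


def cached(key, ttl_seconds, loader, now=time.time):
    """Return the cached value for key, reloading it once it is older than the TTL."""
    entry = _cache.get(key)
    current = now()
    if entry is not None and current - entry[0] < ttl_seconds:
        return entry[1]
    value = loader()
    _cache[key] = (current, value)
    return value


def unzip_first_member(raw_bytes):
    """Decode the first file of a ZIP archive that is held entirely in memory."""
    # BytesIO supports seeking, which a plain HTTP response object does not.
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        first_name = archive.namelist()[0]
        return archive.read(first_name).decode("utf-8", errors="replace")
```

A call such as `cached("airports", 300, load_airports)` would then match the 5-minute policy described above, where `load_airports` stands for a hypothetical helper that downloads and unpacks the archive. Holding the archive in memory avoids the temporary file, at the price of a larger memory footprint in a small function instance.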
The next screenshot shows an excerpt of the function execution in the AWS Lambda web console. Clearly recognisable are the return value, represented as a list with two values, and the log output, which is mainly used for debugging. Not visible in the screenshot is the execution time, which is around 1.2 to 1.4 seconds with a warm 128 MB Lambda instance but up to 2.0 seconds with a cold instance, due to the combined container instance start-up time and the dataset download and decompression time.
For us as an applied university laboratory with strong ties to and transfer into education, the resulting function implementation offers didactic value by combining knowledge on programming, on Internet protocols and on service prototyping. It is indeed used as a showcase example towards the end of a first-semester lecture on Python programming. Further into the future, the loop back into the web applications can be closed by hooking into the cloud functions with JavaScript or server-side page generation code, leading to more scalable websites and more fine-grained microservices bundled into web applications.
The execution time behaviour of the function is shown in the diagram below. Sequences of ten executions at intervals of ten seconds are run, with waiting times between the sequences that increase over time (multiples of 50 s). The higher execution time for the first invocation of each sequence becomes apparent, but it is modest compared to the general variability in execution times. Hence, for practical purposes, the coldstart issue is not dominant for this specific application.
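A measurement of this kind could be scripted roughly as shown below. The sketch assumes that the waiting time between sequences grows in multiples of 50 seconds, which is one reading of the setup described above. The function name `measure` and all parameter names are hypothetical, and `invoke` stands for whatever call triggers the deployed function.

```python
import time


def measure(invoke, sequences=3, per_sequence=10, interval_s=10,
            pause_step_s=50, sleep=time.sleep, clock=time.perf_counter):
    """Time repeated invocations, grouped into sequences with growing pauses.

    Returns one list of durations (in seconds) per sequence. The first value
    of each list is the one most likely to include a coldstart.
    """
    results = []
    for seq in range(sequences):
        durations = []
        for i in range(per_sequence):
            start = clock()
            invoke()
            durations.append(clock() - start)
            if i < per_sequence - 1:
                sleep(interval_s)  # fixed gap inside a sequence
        results.append(durations)
        if seq < sequences - 1:
            sleep((seq + 1) * pause_step_s)  # 50 s, 100 s, 150 s, ...
    return results
```

Comparing the first duration of each sequence with the spread of the remaining nine gives a quick impression of whether the coldstart penalty stands out from the noise.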
The function implementation can be retrieved from the Service Prototyping Lab Git repository. It can be deployed as a portable function with multiple entrypoints into AWS Lambda and IBM Cloud Functions, and additionally it can be executed locally through the Python interpreter on the command line for testing purposes.
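The idea of a portable function with multiple entrypoints can be illustrated with the following skeleton. It is a sketch under assumptions and not the code from the repository: `compute` is a hypothetical placeholder for the actual logic, and the parameter keys `username` and `password` are invented for illustration. What it relies on is that AWS Lambda calls a Python handler with `(event, context)`, while IBM Cloud Functions calls `main(args)` and expects a dictionary in return.

```python
import sys


def compute(username, password):
    """Hypothetical placeholder for the shared logic; returns [distance, confidence]."""
    return [0.0, 0.0]


def lambda_handler(event, context):
    """Entrypoint for AWS Lambda, which passes the request as a dict plus a context."""
    return compute(event["username"], event["password"])


def main(args):
    """Entrypoint for IBM Cloud Functions, which expects a dict as the result."""
    distance, confidence = compute(args["username"], args["password"])
    return {"distance": distance, "confidence": confidence}


if __name__ == "__main__":
    # Local execution through the Python interpreter for testing purposes.
    if len(sys.argv) == 3:
        print(compute(sys.argv[1], sys.argv[2]))
    else:
        print("usage: python function.py USERNAME PASSWORD", file=sys.stderr)
```

Keeping the provider-specific wrappers this thin means the shared logic can be tested locally without any cloud account.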