Investigation of Self-Management for Flask-based Services

Self-management is an important property of software services because it lets them make better use of the beneficial characteristics of the underlying runtime systems. Whether such services run in a managed cloud environment, on a device or somewhere else in the computing continuum, the managing runtime platform may always have limitations that complementary or overarching application-level management can help to overcome. Using a Python Flask-based web service as an example, this research blog post reports on our ongoing investigations into two specific self-management aspects: runtime resilience and feistiness.

We have previously (five years ago) reported on self-management in cloud-native applications concerning resilience and elastic scalability, and moreover (two years ago) on the need for adaptive self-management in clouds. Since then, computing paradigms have proliferated and evolved significantly. Multi- and cross-cloud setups have become common practice, serverless computing is now an indispensable ingredient of modern, cost-efficient application design, and cloud-fog continuums are becoming more widespread, while advanced concepts such as osmotic computing are emerging as well. These evolving paradigms have the potential to make application and service engineering easier across all domains, but to exploit this potential, the self-management aspects need to be considered by design.

This can be discussed on an abstract level, but for a better and more intuitive understanding, this text illustrates these aspects with a very ordinary Flask-based application. Flask is a typical Python web service framework that uses declarative routes (specified with Python decorators) to map local functions to HTTP endpoints. Flask comes with its own minimal HTTP server, but can also be placed behind more capable servers such as Apache httpd.

For rapid prototyping, however, a complex setup is prohibitively expensive in terms of effort. The alternatives are therefore either a managed (PaaS-based) solution or a reasonably capable zero-configuration self-hosting setup. Given the limited capabilities of PaaS even now, after ten years on the market, this presents interesting opportunities for self-management. To keep the effort in check, we define the requirement that the self-management aspects be achieved without any modification to the original Flask code.

Application and Service Model

The structure of a Flask application is shown below. It consists of a number of functions (or methods) that are exposed to the outside through an HTTP interface. RESTful service endpoints are defined by the port the service is running on, in conjunction with declarative route decorators that may contain lightweight validation information such as type specifications. When the application runs with the built-in web server (other WSGI-compliant servers are possible), it logs HTTP responses, as well as exceptions associated with HTTP status code 500, to the console. While this text is specific to Flask, many application service frameworks for many programming languages follow a quite similar model.

import json
from flask import Flask
app = Flask(__name__)

@app.route('/')
def overview():
    return json.dumps({"status": "ok"})

@app.route('/user/<username>')
def getuser(username):
    # An unhandled exception makes Flask answer with HTTP status code 500.
    raise Exception("user unknown")

@app.route('/<int:year>')
def getyear(year):
    return str(year)

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080)
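
A wrapper that observes the application from the outside can work with this log output. The following is a minimal sketch of how a single access-log line could be parsed. It assumes the line layout commonly emitted by the built-in development server, shown in the comment; the names LOG_LINE and parse_log_line are hypothetical and not taken from our scripts.

```python
import re

# Assumed layout of one access-log line from the built-in development server:
#   127.0.0.1 - - [01/Jan/2024 12:00:00] "GET /user/alice HTTP/1.1" 500 -
LOG_LINE = re.compile(
    r'^(?P<client>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) '
)


def parse_log_line(line):
    """Return the fields of an access-log line as a dict, or None.

    Lines that do not look like access-log entries (startup banners,
    tracebacks) yield None so that callers can simply skip them.
    """
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    return entry
```

With such a parser, a status of 500 points to an exception in a route function, while a status of 404 points to a request for a URL that no route covers.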

Resilience

A resilient application continues to run, or degrades gracefully and in a controlled manner, when the runtime environment is affected by failures. Most cloud platforms handle certain failure types well, especially crashes, which are mitigated by restarts. One of the trickier failure classes is slowness, including its extreme variant, a hanging process. Following the cattle-instead-of-pets design of microservices, resilience of an existing Flask application can be achieved by wrapping it into a supervisor script that detects slowness and crashes. For Flask applications, this is implemented in the “flasksupervisor” script. It can run either as root or as an unprivileged user, depending on whether its invocation can be incorporated into the system startup process (e.g. through a systemd unit file, a template of which accompanies the script).
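
The supervision idea can be sketched in a few lines. This is an illustration under stated assumptions and not the flasksupervisor code itself: the function names probe and supervise, the file name app.py, the probe URL and all timing values are hypothetical. The sketch treats an HTTP error status as a sign of life, since the process still answers, and treats a refused connection or a timeout as a crash or hang.

```python
import http.client
import subprocess
import sys
import time
import urllib.error
import urllib.request

# Hypothetical default: start the unmodified Flask application from app.py.
DEFAULT_COMMAND = [sys.executable, "app.py"]

# The probe targets a local service, so bypass any configured proxies.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=2.0):
    """Return True if the service answers within `timeout` seconds."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still proves that the process is responsive.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout or broken reply: crashed or hanging.
        return False


def supervise(command=DEFAULT_COMMAND, url="http://127.0.0.1:8080/",
              interval=5.0, grace=3.0, max_runs=None):
    """Run `command` and start it again whenever it exits or stops answering.

    Returns the number of times the command was started; with the default
    max_runs=None the loop never ends.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        proc = subprocess.Popen(command)
        runs += 1
        time.sleep(grace)  # give the service time to bind its port
        while proc.poll() is None and probe(url):
            time.sleep(interval)
        if proc.poll() is None:
            # Alive but unresponsive: a hang or extreme slowness.
            # A hanging process may ignore SIGTERM, so kill it outright.
            proc.kill()
        proc.wait()
    return runs
```

Calling supervise() without arguments would keep the example application from above running on port 8080. The probe timeout effectively defines what counts as too slow, so it would need tuning per service.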

Feistiness

Feistiness refers to the ability of an application to fend off malicious invocations. For a web service, specifically, it means that an application is able to impose connection limits by itself, by blocking IP addresses and host names, throttling connections or prioritising peers. Again, the detection and blocking of adversaries can be implemented without any modification to the application by wrapping it into a script. This script parses the Flask code, extracts knowledge about permissible HTTP endpoints, spawns the code, proxies all requests and registers unexpected URLs by tracking the HTTP log of the spawned code. This is implemented in the “flask2ban” script.

This is preliminary, code-driven research that comes without a quantitative evaluation at this point in time. We have, however, identified a number of use cases, especially for business applications, and will work with industry partners to outline the value of application self-management using empirical techniques. In the meantime, we hope that these code examples also inspire thinking about application management choices and the resulting self-management aspects.
