Evaluation of AWS “Private Cloud” options for serverless computing

Hybrid, multi- and cross-cloud applications are on the rise, but even for scenarios in which purely public cloud deployments are planned, having an equivalent private cloud stack available is useful in many ways. Thanks to the relative portability of popular open source cloud stacks, this is often straightforward to accomplish. For many large cloud providers, there are commercial solutions such as Microsoft’s Azure Stack, IBM’s Cloud Private, Oracle’s Cloud Native Framework, Google’s Anthos (née CSP), Alibaba’s Apsara Stack and Amazon’s AWS Outposts (as well as Greengrass for Lambda and other specialised offers). Yet sometimes these are not an option for technical or business reasons. This blog post discusses alternative options.

To run Lambdas and other cloud application logic on private installations, both the Function-as-a-Service (FaaS) and the Backend-as-a-Service (BaaS) side need to be properly established. The FaaS side includes the Lambda runtimes and the deployment of resources according to the Serverless Application Model. The BaaS side includes databases, message queues, API gateways and other popular Lambda-connected backends.

One of the first open source equivalents of the AWS cloud services was Eucalyptus. It is still a viable option for the “mammoth” services on the infrastructure level (EC2 virtual machines, S3/EBS storage, CloudWatch monitoring), but on the serverless side in particular it is not sufficient by itself. Lambda-API-compatible function runtimes such as Snafu could fill this gap, but again, cloud development has moved fast, and crucial features such as SAM support would still be lacking.

Instead, another combination appears more appropriate: SAM Local (based on docker-lambda), Localstack and some glue code. SAM Local runs Lambda functions locally in a Docker container and simulates event triggers. Localstack implements mock services across the AWS portfolio. The glue code is necessary to ensure that all tools and outbound network operations are redirected to the local deployment. There are two main strategies for the redirection: code-based and infrastructure-based. In the following, they are explained for simplified interactive (test) sessions; slightly different parameters apply when using them for permanent (production) cases.
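Before looking at the redirection strategies, a quick smoke test shows both tools side by side. A minimal sketch, assuming both are installed via pip and the SAM template defines a function named HelloWorldFunction (a placeholder name):

pip install localstack aws-sam-cli
localstack start &
sam local invoke HelloWorldFunction --event event.json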

Code-based redirections. These work on the programming language or library level and require endpoint modifications at the invocation of tools or functions. A general advantage of code modifications is the portability of the results, irrespective of specially crafted infrastructure and networking setups. Modifying a library in particular has the additional advantage of enforcing endpoints across applications without requiring many code changes, but this may not always be possible, requiring the modification of individual invocations instead. For example, an AWS CLI call in a shell script would be rewritten as follows, assuming a Lambda-compatible control plane API runs at port 10000:

aws --endpoint-url http://localhost:10000 lambda list-functions

For Python and JavaScript (Node.js) code, the two dominant programming languages for Lambdas, the respective binding libraries (AWS SDKs) provide a similar override, and likewise for other programming languages. In Boto, the Python library, the endpoint is specified as follows:

import boto3
my_client = boto3.client('SERVICE', endpoint_url='http://localhost:10000')

Precise patching would require loading the code files into an abstract syntax tree or JSON tree representation and roundtripping back into code after transforming the tree, e.g. using esprima/escodegen. A more pragmatic approach is the use of regular expressions for ‘monkey-patching’ any relevant code parts. To give an example, the following slightly heuristic sed expressions perform a non-destructive endpoint URL addition on Python code as shown above (the first strips any existing endpoint argument, the second appends the local one):

s/,?\s?endpoint_url\s?=\s?('|\")[^'\"]*('|\")//g
s/(boto3\.client\([^)]*)/\1, endpoint_url='http:\/\/localhost:10000'/g

And the following expression does the same for Node.js code:

s/(new AWS\.[^(]*\(\{)/\1endpoint: "http:\/\/localhost:10000",/g
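To avoid shell quoting pitfalls, the expressions are best stored in script files (rewrite-py.sed and rewrite-js.sed here, hypothetical names) and applied in place with GNU sed’s extended regular expressions:

sed -i -E -f rewrite-py.sed handler.py
sed -i -E -f rewrite-js.sed handler.js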

On the SDK level, the modifications are slightly easier. For Python, for instance, the Botocore library provides a DEFAULT_ENDPOINT defined as “{service}.{region}.amazonaws.com” which can be redefined as needed on the code level (in the botocore file client.py) or through dynamic programming.
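As a sketch of the dynamic variant, the following monkey-patch wraps boto3.client at process start so that every client created anywhere in the application targets the local endpoint (the port is the same placeholder as above), without touching any call sites:

import functools
import boto3

# Keep a reference to the original client factory and wrap it
_original_client = boto3.client

@functools.wraps(_original_client)
def _local_client(*args, **kwargs):
    # Inject the local endpoint unless the caller sets one explicitly
    kwargs.setdefault('endpoint_url', 'http://localhost:10000')
    return _original_client(*args, **kwargs)

boto3.client = _local_client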

A subtype of code-based redirections are those involving a proxy server whose endpoint (e.g. on localhost) is used in the code. Again, portability and unprivileged operation are the advantages. Many proxy server designs would be possible, e.g. protocol-agnostic ones similar to morebalance, or HTTP-level ones based on haproxy. A dedicated proxy server such as localstack-single-endpoint acts as an endpoint concentrator and multiplexes requests based on AWS-specific header content. A disadvantage of all proxy concepts is the loss of scalability, as all requests now need to traverse the proxy, as well as the need to operate one more piece of software as part of the stack.
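To illustrate the multiplexing idea, the following Python sketch parses the service name from the SigV4 Authorization header (whose credential scope ends in .../SERVICE/aws4_request) and forwards the request to the matching mock service; the PORTS map is a hypothetical subset of Localstack’s default assignments:

import re
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical subset of Localstack's default port assignments
PORTS = {'lambda': 4574, 's3': 4572, 'dynamodb': 4569, 'sqs': 4576}

class MuxHandler(BaseHTTPRequestHandler):
    def _forward(self):
        # Extract the AWS service from the SigV4 credential scope
        auth = self.headers.get('Authorization', '')
        match = re.search(r'/([a-z0-9-]+)/aws4_request', auth)
        port = PORTS.get(match.group(1) if match else '', 4572)
        length = int(self.headers.get('Content-Length') or 0)
        req = urllib.request.Request(
            'http://127.0.0.1:%d%s' % (port, self.path),
            data=self.rfile.read(length) if length else None,
            headers={k: v for k, v in self.headers.items()
                     if k.lower() != 'host'},
            method=self.command)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            resp = err  # pass error responses through unchanged
        body = resp.read()
        self.send_response(resp.getcode())
        for key, value in resp.headers.items():
            if key.lower() not in ('transfer-encoding', 'content-length'):
                self.send_header(key, value)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_POST = do_PUT = do_DELETE = do_HEAD = _forward

HTTPServer(('127.0.0.1', 10000), MuxHandler).serve_forever()

Pointing the endpoint overrides shown above at port 10000 then lets this single proxy dispatch to the individual mock services.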

Infrastructure-based redirections. These modify name resolution through DNS redirections and are therefore disadvantaged by requiring administrative privileges, which may not always be available. There are two subtypes: isolated and globally connected setups. In isolated setups, a local DNS server redirects all requests to a specified host, whereas in globally connected setups, only specific domains (e.g. *.amazonaws.com) are redirected and all others are passed through to an authoritative nameserver. The latter may be necessary, for instance, if a Lambda function retrieves information from an arbitrary Internet host or external web service. In both cases, the local cloud hosts require the new DNS server to be entered as the first or only entry in the /etc/resolv.conf file, as follows (here assuming a simplified single-host setup and no other local DNS, such as systemd-resolved, running concurrently):

echo "nameserver 127.0.0.1" > resolv.tmp$$
cat /etc/resolv.conf >> resolv.tmp$$
sudo mv resolv.tmp$$ /etc/resolv.conf

In the isolated setup, the netwox tool can be used to generate a constant DNS reply always pointing to the local cloud services (again assuming localhost for simplification, and requiring a multiplexing proxy server in addition).

sudo netwox 104 -h router.cloud -H 127.0.0.1 -a dns.cloud -A 127.0.0.1

In the globally connected setup, an (undocumented) feature of dnsmasq can be used. This tool allows using wildcards not only to return static addresses but also to forward requests to an explicitly specified upstream server.

sudo dnsmasq --no-hosts --no-daemon --address=/lambda.us-east-1.amazonaws.com/127.0.0.1 --server=/#/UPSTREAMDNSIP

Using dnsmasq allows for specifying alternative ports (IP#port) for --server, but obviously not for --address, which may however be required in single-host setups such as the one assumed by default by Localstack. To support different port assignments, three choices exist. First, a multiplexing proxy server could be used which differentiates based on HTTP headers, as described above. Second, CNAMEs could be used in conjunction with virtual host-based port forwarding (e.g. using httpd’s ProxyPass statements); however, the CNAME support in dnsmasq is rather limited. The third alternative would be virtual IP addresses in conjunction with static port forwarding via iptables, eliminating the need for user-space tools but requiring a more sophisticated system setup. The last alternative works today, does not introduce bottlenecks, and does not require further active system components.

Performance-wise, a quick back-of-the-envelope experiment shows that with both tools a host query takes 7.7 milliseconds on average, revealing no significant disadvantage of the substring-matching dnsmasq over the constant-reply netwox equivalent.
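Such a measurement can be reproduced, for instance, by timing a batch of lookups against the redirecting resolver (a sketch, assuming the host utility and the local DNS listening on 127.0.0.1):

time for i in $(seq 100); do host lambda.us-east-1.amazonaws.com 127.0.0.1 > /dev/null; done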

The key ingredient of the virtual IP address setup is to assign one address per Localstack service, with IP addresses roughly matching the Localstack mock service port numbers to avoid confusion during debugging. For each service, all regions (and the region-less endpoint) need to be set up. The main instructions to realise this setup are given here:

# e.g. port=4574 for Localstack's AWS Lambda mock service
ip=10.10.0.$(($port-4500))
sudo ip addr add $ip/32 dev lo
sudo iptables -t nat -A OUTPUT -o lo -p tcp -m tcp -d $ip --dport 443 -j DNAT --to-destination 127.0.0.1:$port
sudo iptables -t nat -A POSTROUTING -o lo -m addrtype --src-type LOCAL --dst-type UNICAST -j MASQUERADE
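To cover several services at once, the address and DNAT instructions can be wrapped in a loop; the port list below is a hypothetical subset of Localstack’s default assignments (4567 API Gateway, 4569 DynamoDB, 4572 S3, 4574 Lambda, 4575 SNS, 4576 SQS):

for port in 4567 4569 4572 4574 4575 4576; do
    ip=10.10.0.$(($port-4500))
    sudo ip addr add $ip/32 dev lo
    sudo iptables -t nat -A OUTPUT -o lo -p tcp -m tcp -d $ip --dport 443 -j DNAT --to-destination 127.0.0.1:$port
done
sudo iptables -t nat -A POSTROUTING -o lo -m addrtype --src-type LOCAL --dst-type UNICAST -j MASQUERADE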

In this setup, any request to the virtual IP address 10.10.0.74:443 is routed within the local network device (lo) to 127.0.0.1:4574. Together with dnsmasq in front and Localstack behind, commands such as ‘aws lambda list-functions’ will interact with the mock Lambda service without requiring further configuration.

In our research, we are now using one such emulation environment to evaluate the behaviour of cloud services and applications which build upon AWS services, and will report on the results in the future.

