Insights into AWS SAR

The Serverless Application Repository by Amazon Web Services (AWS SAR) is, in simplified terms, a marketplace for Lambda functions. You can speed up application development by building on the functions (or function compositions) it provides, and you can share your own functions with other cloud application developers. AWS SAR was launched over a year ago. In the Service Prototyping Lab at Zurich University of Applied Sciences, we are investigating better ways of building applications for cloud and post-cloud environments. Consequently, we observed AWS SAR for a full year to find out what is in it and what is going on. Read on for some interesting excerpts and findings, and for a pointer to the study document.

With more than 500 serverless applications, each defined as either a Lambda function or a combination of such functions and backend services, AWS SAR is big enough to justify an investigation and still small enough to notice all trends and any shift in how developers use it. In a nutshell, the figure below shows on the left scale how both the number of vendors present and the number of available cloud functions have grown over time. (Please ignore the sudden jump at the end of April 2019: it merely corrects our earlier omission of cloud functions with custom IAM requirements, and all historic data on them before that point is lost – nostra culpa!) The right scale shows the growing number of deployments. (Please also ignore that this graph, like some others, only starts in July instead of February 2018. The data before that were not precise enough – nostra culpa again! Research ain’t easy, sometimes.)

Interestingly, hardly any new functions have been added since around the end of January 2019 (growth <1%), while deployments keep growing linearly (growth ~7%). Is SAR already a legacy system, or are other changes ahead? As we continue to observe the development, we hope to give better answers to this question, in particular given that AWS is the only vendor operating such a marketplace right now (although we had a joint research prototype before them).

One of the interesting questions is: Which types of serverless application are published in the repository? The following table gives an overview of the share of each type with more than five representatives. It is clear that “pure” Lambda functions hold an absolute majority. This means that the Serverless Application Model (SAM), which defines complex deployments of cloud functions including backend resources, is not yet fully exploited by most repository entries. Two thirds of serverless applications could be expressed by a simple Lambda+S3+API-Gateway syntax, but the long tail of various combinations (totalling a considerable 21%) presumably justifies the more complex SAM syntax.

Type of serverless application	Percentage
Just Lambda function	53
Lambda + S3	8
Lambda + API-Gateway	5
Lambda + SNS	4
Lambda + SimpleTable (DynamoDB abstraction)	3
Lambda + DynamoDB	2
Lambda + permissions	2
Lambda + Kinesis	1
Lambda + IAM	1
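
The two aggregate figures quoted above can be re-derived from the table. The following is only a minimal sketch: it assumes that the “two thirds” refers to the first three rows taken together, and that the 21% long tail is the remainder not covered by any listed type. The function names are ours and purely illustrative.

```python
# Percentages copied from the table above (types with more than five representatives).
shares = {
    "Just Lambda function": 53,
    "Lambda + S3": 8,
    "Lambda + API-Gateway": 5,
    "Lambda + SNS": 4,
    "Lambda + SimpleTable": 3,
    "Lambda + DynamoDB": 2,
    "Lambda + permissions": 2,
    "Lambda + Kinesis": 1,
    "Lambda + IAM": 1,
}

# Assumption: these three rows are what a simple Lambda+S3+API-Gateway syntax would cover.
SIMPLE_TYPES = ("Just Lambda function", "Lambda + S3", "Lambda + API-Gateway")


def simple_share(table):
    """Share (in %) of applications covered by the three simplest types."""
    return sum(table[name] for name in SIMPLE_TYPES)


def long_tail_share(table):
    """Share (in %) of applications not covered by any listed type."""
    return 100 - sum(table.values())


print(simple_share(shares))     # 66, i.e. roughly two thirds
print(long_tail_share(shares))  # 21
```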

Most programmers want to see a statistic on their favourite language. The dominance of just two languages over all others is crystal-clear in the following table. As there are multiple statistics (differing, for example, in the number of source files counted and in the absence of code repository information for some functions), we give all the numbers but consider the last column, which is based on the function runtime, the most precise one.

Language	% according to GitHub metadata	% according to GitHub repos	% according to SAM (function runtime)
JavaScript (node.js)	30	50	52
Python (2 & 3)	56	44	37
Go	3	2	3
Java	3	2	2
all others	8	2	6
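
To illustrate how a runtime-based statistic like the last column could be computed, here is a minimal sketch rather than the procedure used in the study. It assumes that each function declares a Lambda runtime identifier such as `nodejs8.10` or `python3.6`, and groups identifiers by prefix into the language families of the table. The sample list and the function names are hypothetical.

```python
from collections import Counter

# Prefixes of Lambda runtime identifiers, mapped to the language rows of the table.
RUNTIME_PREFIXES = [
    ("nodejs", "JavaScript (node.js)"),
    ("python", "Python (2 & 3)"),
    ("go", "Go"),
    ("java", "Java"),
]


def language_of(runtime):
    """Map a runtime identifier to a language family, or to 'all others'."""
    for prefix, language in RUNTIME_PREFIXES:
        if runtime.startswith(prefix):
            return language
    return "all others"


def language_shares(runtimes):
    """Return the rounded percentage of functions per language family."""
    counts = Counter(language_of(r) for r in runtimes)
    total = sum(counts.values())
    return {language: round(100 * n / total) for language, n in counts.items()}


# Hypothetical sample, not data from the study.
sample = ["nodejs8.10", "nodejs6.10", "python3.6", "python2.7",
          "go1.x", "java8", "dotnetcore2.1", "ruby2.5"]
print(language_shares(sample))
```

Grouping by prefix keeps Python 2 and 3 in one row, as in the table, and sends every runtime without a listed prefix to “all others”.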

On the critical side, we noticed several quality issues with AWS SAR, most of which could be prevented by a more stringent publishing procedure and a tighter data (entity-relationship) model. We will get in touch with AWS and other cloud providers to verify our findings and to see what the plans are for expanding the serverless ecosystem through and beyond such repositories. With commodity players such as GitHub entering the artefact repository space, quality and trust are becoming the two important differentiators for developers (unconfirmed hypothesis).

The analysis contains further metrics on metadata (e.g. dominant or trending vendors, choice of licences), on the code repositories (e.g. duplication, forks), and on required runtime capabilities, among others.

The complete AWS SAR study document can be accessed from arXiv:1905.04800. If you enjoy reading it, we would like to point out a previous ecosystem study on Helm charts, and our ongoing work on Kubernetes Operators and similar software artefact ecosystems. Related to FaaS, we also point out the new, insightful runtime analysis by Epsagon and our previous (joint) empirical study on how serverless development works in practice.

Tags: aws, datascience, evolution, faas, maomao, marketplace, paper, serverless
