As we have recently been granted Google Cloud Research Credits for the investigation of Serverless Data Integration, we continue our exploration of open and public data. This HOWTO-style blog post presents the application domain of financial analytics and explains how to run a cloud function to achieve elastically scalable analytics. Although there are no research results to report yet, it raises a couple of interesting challenges that we or other computer scientists should work on in the future.
Self-management is an important property of software services to increase the degree of exploiting benefitial characteristics of underlying runtime systems. Whether such services run in a managed cloud environment, on a device or somewhere else in the computing continuum, there may always be limitations in the managing runtime platform that a complementary or overarching application-level management can help to overcome. Using a Python Flask-based web service as example, this research blog post informs about our ongoing investigations into two specific self-management aspects: runtime resilience and feistiness.
As presented in a prior post, Singer.io is a modern, open-source ETL (Extract, Transform and Load) framework for integrating data from various sources, including online datasets and services, with a focus on being simple and light-weight. The basics of the framework were explored in our last post on the topic, so we will refer you to that if you are unfamiliar.
This post is about our process for deploying Singer to the cloud, more specifically, to the Cloud Foundry open source cloud application platform. This was done in the context of researching the maturity of data transformation tools in a cloud-native environment. We will explore the options for deploying Singer taps and targets to a cloud provider and discuss our implementation and deployment process in detail.
Many providers of hosted services, including cloud applications, are subject to a contradiction in handling log data. On the one hand, storing logs consumes resources and should be minimised or avoided altogether to save resource cost. On the other hand, regulatory constraints such as keeping the data for the purpose of future audits exist. A smart solution to encode the data appropriately needs to be found. The coding encompasses both compression, to keep resource use low, and encryption, to prevent leaking information to unauthorised parties, for instance when logging for the purpose of intrusion detection. On an algorithmic level, the encoded data should still be usable for computation, in particular comparison and search. In this blog post, based on the didactic log example shown in the figure below, we present algorithms and architectures to handle cloud log files in a smart way.
In Switzerland, opendata.swiss is the go-to location for any open dataset resulting from federal, cantonal or municipal sources. From a societal and economics perspective, the portal is an important asset following the “protect private data, make use of public data” mantra, and has already led to digital innovation through the availability of many third-party applications. In this research blog post, we look at some numbers associated with the portal.
Singer.io is an open-source JSON-based data shifting (ETL: extract, transform, load) framework, designed to bring simplicity when moving data between a source and a destination service on the Internet. In this post, we present the framework as entry point into the world of SaaS-level data exchange and some associated research questions.
Docker images have become the valuta franca in the cloud and container platform world. Although on the path to vendor-neutral standardisation (e.g. with OCI also being in Docker Hub for a year now), developers for now have settled on plain Docker as de-facto standard due to the vast ecosystem of base images and dependency images which speed up the rapid prototyping of complex scalable applications. From a production-grade DevOps perspective, a key concern is then to be assured that the containers used are of high quality, not infected by security vulnerabilities, and still containing the latest features available. In this blog post, a novel approach to visualise the situation around a particular container image is presented.
The MAO-MAO research collaboration aims to provide metrics, analytics and quality control for microservice artefacts of all kinds, including but not limited to, Docker containers, Helm charts and AWS Lambda functions. As such, an integral part of prior research has been the various periodic data collection experiments, gathering metadata and conducting automatic code analysis.
However, the ambition of the project to collect data consistently, combined with the need for the collaborators to be able to use each other’s tools and access each other’s data, have created a need for a collaboration framework and distributed execution platform.
In response to this need, we present the first release of the MAO Orchestrator, a tool designed to run these experiments in a smart way and on a schedule, within a federated cluster across research sites. As a plus, there is nothing implementation-wise tying it to the existing assessment tools, so it is reusable for any use-case that requires collaboratively running periodic experiments.