Author: milt

Testing Alluxio for Memory Speed Computation on Ceph Objects

In a previous blog post, we showed how “bringing the code to the data” can greatly improve computation performance through the active storage (also known as computational storage) concept. Continuing our investigation into how computation and storage ecosystems can best interact, in this blog post we analyze a somewhat opposite approach: “bringing the data close to the code”. What the two approaches have in common is that both exploit data locality, moving away from the complete disaggregation of computation and storage.

The approach in focus for this blog post is at the basis of the Alluxio project, which in short is a memory-speed distributed storage system. Alluxio enables data analytics workloads to access various storage systems and accelerates data-intensive applications. It manages data in memory and optionally on secondary storage tiers, such as cheaper SSDs and HDDs, for additional capacity. It achieves high read and write throughput by unifying data access to multiple underlying storage systems, which also reduces data duplication among computation workloads. Alluxio sits between computation frameworks or jobs, such as Apache Spark, Apache MapReduce, or Apache Flink, and various kinds of storage systems, such as Amazon S3, OpenStack Swift, GlusterFS, HDFS or Ceph. Data is available locally for repeated accesses to all users of the compute cluster, regardless of the compute engine used. This avoids keeping redundant copies of the data in memory, driving down capacity requirements and thereby costs.
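The idea of keeping repeatedly accessed data in memory in front of a slower storage system can be illustrated with a minimal read-through cache. This is only a sketch of the general caching principle, not Alluxio's API: the class names `SlowStore` and `ReadThroughCache` and the object key are hypothetical and chosen for illustration.

```python
class SlowStore:
    """Hypothetical stand-in for a remote under-store such as an object store."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0  # number of reads that actually reached the store

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class ReadThroughCache:
    """Serves repeated reads from memory and falls back to the under-store on a miss."""

    def __init__(self, under_store):
        self._under_store = under_store
        self._memory = {}

    def get(self, key):
        if key not in self._memory:
            # Cache miss: fetch once from the slow store and keep the data in memory.
            self._memory[key] = self._under_store.get(key)
        return self._memory[key]


store = SlowStore({"dataset/part-0": b"a,b,c\n1,2,3\n"})
cache = ReadThroughCache(store)

first = cache.get("dataset/part-0")   # miss: goes to the under-store
second = cache.get("dataset/part-0")  # hit: served from memory
print(store.reads)
```

Only the first access reaches the under-store. Every later read of the same object, from any job sharing the cache, is served from memory.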

For more details on the components, the architecture, and other features, please visit the Alluxio homepage. In the rest of this blog post we present our experience integrating Alluxio with our Ceph cluster and use a Spark application to demonstrate the resulting performance improvement (the reference analysis and testing we aimed to reproduce can be found here).

The framework used for testing

Fig. 1: Alluxio testing set-up.

Experimenting on Ceph Object Classes for Active Storage

What is active storage about?

In most distributed storage systems, the data nodes are decoupled from the compute nodes. Disaggregating storage from the compute servers is motivated by more efficient storage utilization and by better, mutually independent scalability of computation and storage.

While the above consideration is indisputable, there are several situations where moving computation close to the data brings important benefits. In particular, whenever the stored data is to be processed for analytics purposes, all the data needs to be moved from the storage to the compute cluster (consuming network bandwidth). After the analytics have run, in most cases the results need to go back to the storage. Another important observation is that large amounts of resources (CPU and memory) are available in the storage infrastructure and usually remain underutilized. Active storage is a research area that studies the effects of moving computation close to data and analyzes the fields of application where data locality actually introduces benefits. In short, active storage allows computation tasks to run where the data is, leveraging the storage nodes’ underutilized resources and reducing data movement between storage and compute clusters.
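The difference in data movement can be made concrete with a toy model. The following is a minimal sketch, not the API of any real active storage framework: `StorageNode`, its `read` and `execute` methods, and the `count_errors` task are hypothetical names, and the "network" is just a byte counter.

```python
class StorageNode:
    """Hypothetical storage node that counts the bytes it sends over the network."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.bytes_sent = 0

    def read(self, key):
        # Classic approach: ship the whole object to the compute cluster.
        data = self._objects[key]
        self.bytes_sent += len(data)
        return data

    def execute(self, key, func):
        # Active storage approach: run the task next to the data, ship only the result.
        result = func(self._objects[key])
        self.bytes_sent += len(repr(result).encode())
        return result


def count_errors(data):
    """Example analytics task: count the log lines that contain ERROR."""
    return sum(1 for line in data.splitlines() if b"ERROR" in line)


log = b"INFO ok\nERROR disk\nINFO ok\nERROR net\n" * 1000

pull_node = StorageNode({"app.log": log})
pulled_count = count_errors(pull_node.read("app.log"))

push_node = StorageNode({"app.log": log})
pushed_count = push_node.execute("app.log", count_errors)

print(pulled_count, pushed_count)
print(pull_node.bytes_sent, push_node.bytes_sent)
```

Both approaches compute the same count, but in the second case only the small result crosses the network instead of the full object.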

There are many active storage frameworks in the research community. One example is the OpenStack Storlets framework, developed by IBM and integrated within OpenStack Swift deployments. IOStack is a European-funded project that builds around this concept for object storage. Another example is ZeroVM, which allows developers to push their application to their data instead of having to pull their data to their application.

So, what about Ceph?


Our recent paper on Cloud Native Storage presented at EuCNC 2019

In June we participated in the 28th edition of EuCNC, an international conference sponsored by the IEEE Communications Society and the European Association for Signal Processing, and supported by the European Commission. EuCNC is one of the most prominent communications and networking conferences in Europe, bringing together cutting-edge research and world-renowned industries and businesses.

Valencia Congress Center

Running the ICCLab ROS Kinetic environment on your own laptop

As we make progress on the development of robotic applications in our lab, we benefit from providing our developers with an easy-to-deploy, common ROS Kinetic environment, so that no initial setup time is needed before starting to work on the real code. At the same time, any interested users who would like to test and navigate our code implementations can do so with a few commands. One git clone command is now enough to download our up-to-date repository to your local computer and run our ROS Kinetic environment, including a workspace with the current ROS projects.

To reach this goal we created a container that includes the ROS Kinetic distribution and all the dependencies and software packages needed for our projects. No additional installation or configuration steps are needed before testing our applications. The reference git repository can be found at this link: https://github.com/icclab/rosdocked-irlab


SC2 2018 – The 8th IEEE International Symposium on Cloud and Services Computing

The 8th IEEE International Symposium on Cloud and Services Computing (IEEE SC2) 2018 took place in Paris, France, from the 19th to the 22nd of November. The conference was co-located with two more events, namely the 5th International Conference on Internet of Vehicles (IOV) 2018 and the 11th IEEE International Conference on Service-Oriented Computing and Applications (IEEE SOCA) 2018.

We from the ICCLab participated in the first three days of the SC2 conference, as its focus on Cloud-related topics matches our expertise and our research and development interests. The main themes in focus were Cloud Platforms and Services, Networking and Services, and Cloud and SOA Services.

As an important venue for researchers and industry practitioners, SC2 offered the opportunity to exchange information about recent advancements in IT-driven cloud computing technologies and services. The conference drew a good number of participants while keeping a familiar atmosphere where interacting with peers was easy, not least over the coffee and lunch breaks. In total, 68 oral presentations were planned in 15 sessions. Additionally, 10 posters were presented in a poster session and 3 keynotes were organized. The conference was organized over three days, with the first day dedicated to tutorials and the following days to parallel sessions for each of the co-located conferences.

Besides attending the event itself, the main motivation to visit the SC2 conference was to present our paper entitled “Hera Object Storage: A seamless, Automated Multi-Tiering Solution on Top of Openstack Swift”. In the paper we highlighted some of our recent research results in the field of Cloud storage. In particular, the contribution focuses on the fast-growing field of unstructured data storage in the distributed cloud. We proposed an object storage solution built on top of OpenStack Swift. This solution applies multi-tiering storage to unstructured data in a seamless and automatic manner. The storage decisions for each object are based on the data temperature, expressed in terms of the current access rate.
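A tiering decision driven by data temperature can be sketched as follows. This is a minimal illustration of the general idea and not the algorithm from the paper: the sliding window, the threshold, the tier names `fast` and `capacity`, and the `TemperatureTracker` class are all assumptions made for the example.

```python
from collections import defaultdict, deque


class TemperatureTracker:
    """Hypothetical tracker: temperature is the access rate over a sliding time window."""

    def __init__(self, window_s=60.0, hot_threshold=1.0):
        self.window_s = window_s            # length of the sliding window in seconds
        self.hot_threshold = hot_threshold  # accesses per second that make an object hot
        self._accesses = defaultdict(deque)

    def record_access(self, name, now):
        self._accesses[name].append(now)

    def access_rate(self, name, now):
        timestamps = self._accesses[name]
        # Drop accesses that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window_s:
            timestamps.popleft()
        return len(timestamps) / self.window_s

    def tier_for(self, name, now):
        # Hot objects go to the fast tier, everything else to the capacity tier.
        if self.access_rate(name, now) >= self.hot_threshold:
            return "fast"
        return "capacity"


tracker = TemperatureTracker(window_s=10.0, hot_threshold=1.0)
for step in range(20):  # 20 accesses between t = 0.0 s and t = 9.5 s
    tracker.record_access("video.mp4", now=step * 0.5)
tracker.record_access("archive.tar", now=1.0)

hot_tier = tracker.tier_for("video.mp4", now=9.5)
cold_tier = tracker.tier_for("archive.tar", now=9.5)
cooled_tier = tracker.tier_for("video.mp4", now=60.0)
print(hot_tier, cold_tier, cooled_tier)
```

A frequently read object is placed on the fast tier while it is hot and falls back to the capacity tier once its accesses age out of the window, without any action from the user.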

The first day of the conference was the day of my arrival. The first tutorial I could attend gave insights into NVIDIA and its activities in the automotive industry. Various interesting results were presented, supported by real-world test videos. We could see how NVIDIA, as a market leader, supports manufacturers in building self-driving cars, and how a full range of real-world conditions influencing the traffic could be handled. The amount of work behind these results was probably not completely clear to many of us, but the required hardware and software infrastructure was clearly huge!

The second talk presented an interesting business model from Qarnot Computing, France. The model promotes a solution where computing and heating are delivered from a cloud infrastructure. The solution is based on a geo-distributed cloud platform with server nodes named digital heaters. Each heater embeds processors or GPU cards and is connected to the heat diffusion system. With this solution, homes, offices and other buildings can be heated through the distributed data center, which is able to balance the requests for computation and heating.

The last tutorial of the day covered some basics of unsupervised machine learning algorithms. A review of the applications and of the challenges faced when dealing with data sets was also given.

The second day started with the official opening of the conferences and the presentation of the program. This was followed by a keynote on scheduling methods for elastic services, based on a project driven by the AlterWay company in France. In the rest of the day we had two conference sessions and a further keynote speech that focused on cybersecurity and its links to geopolitical issues.

In the first SC2 session I attended, we could follow presentations about the following topics: a comparison between unikernels and containers, user plane management for 5G networks, a cost analysis of virtual machine live migration, and two papers on automated tiered storage solutions, one of which I presented myself. The second session was dedicated to work-in-progress papers, covering topics like contextual information searching for encrypted data in cloud storage services, smart contracts with on- and off-blockchain components, and cloud native 5G virtual network functions.

In the evening we enjoyed a banquet on the Seine river with all the conference attendees. The cruise on the Seine brought us close to the main sights of beautiful Paris. A fish-based dinner was served, completing a perfect setting to exchange experiences with other conference participants.

Day 3 started with an enlightening keynote speech by Prof. Cesare Pautasso from the University of Lugano, Switzerland, who described a recent trend in software development. This trend is dictated by the current scenario where end-users have multiple devices to access their data and content and to manage their personal information. To best manage such a complex multi-device user environment, Liquid software is needed, whereby software can seamlessly flow and adapt to the different devices.

After the last session, with some interesting papers presenting, among others, solutions for multi-objective scheduling in cloud computing and for confidentiality and privacy issues in the Cloud, it was time to head back home. Our participation in the SC2 conference was definitely positive and we will surely consider next year’s edition as a possible venue to share our new research experience.

Open Cloud Day 2018

This year we had the pleasure to organize and host one of Switzerland’s most prestigious cloud events, the OpenCloudDay. On the 30th of May, we welcomed around 80 participants at the ZHAW School of Engineering in Winterthur for a day rich with technical talks, demos and networking opportunities for Cloud Computing practitioners and experts in Switzerland.

Welcome and introduction to Open Cloud Day 2018

The program of the day started with two opening talks covering very timely topics in the field of Cloud Computing. The first talk, given by Thomas Michael Bohnert from the ICCLab, was a critical view on what many consider the next evolutionary direction of Cloud Computing, namely Edge Computing. We got the speaker’s perspective on the motivations, the potential obstacles and the open issues for this paradigm to definitely break through (or maybe not) as the next Cloud Computing frontier. The second opening talk was given by Sacha Dubois from Red Hat and focused on the potential of Ansible Tower for the automation and management of Hybrid Clouds. After a general discussion on the possibilities offered by Ansible Tower for managing both on-premise and public cloud workloads, a live demo showed how this works in practice.

Presentation on Ansible Tower by Red Hat

During the second part of the morning and the first part of the afternoon, two technical sessions were run in parallel. Several topics were covered, for instance Continuous Delivery, Continuous Deployment and Continuous Integration in the Cloud, the CNCF activities during the last year, the challenges of adopting a Web Application Firewall in the DevOps methodology, and much more. An insightful presentation was given on the current cloudware technologies and what to expect from future post-cloud systems. Practical experiences were also presented, for instance in setting up a Kubernetes cluster and in using Ansible for cloud solutions. In addition, a workshop on setting up an oVirt infrastructure for an open source Cloud Management Software was organized in the morning. For the complete program of the technical talks please visit the webpage of the OpenCloudDay.

Attendees during one of the technical presentations

The two final technical talks of the day were given by Niklaus Hofer from Stepping Stone and Jens-Christian Fischer from SWITCH. In the first of the two talks, a presentation was given on the analysis of storage performance for a Ceph cluster. More specifically, the focus was on the comparison between the new backend solution for the Luminous Ceph release, i.e. BlueStore, and the FileStore solution for storing data to disk. Open challenges and further open points of investigation were also given.
The last talk brought up a different point of view on all the technical solutions needed to run a cloud. Based on the experience of SWITCH in running an OpenStack/Ceph based cloud for the Swiss academic community, the talk put the users’ role in using the technology in focus. The user’s perspective must not be overlooked, as it puts additional challenges and requirements on the solutions to be deployed, as the experience of SWITCH clearly highlighted.

The program of the day also offered a total of seven demo presentations on the following topics: Cloud Robotics, Edge Computing, CAB, CNA, Service Tooling, ElasTest, T-Systems solutions.

One of the demos presented by the ICCLab

Storage & Data Analytics – Swiss 2018

On the 24th of May we attended the “Storage & Data Analytics – Swiss 2018” day which was organized at the Seedamm Plaza in Pfäffikon SZ.
Our interest and expertise at the ICCLab for innovative solutions in the area of Cloud Storage motivated us to join the event with the aim to exchange expertise with colleagues from both the industrial and the academic realms.

Welcome and introduction to the day

The program for the event offered a well-balanced mix of keynote speeches from top experts in the field of storage and data analytics, presentations from specialists and companies actively working in the continuously evolving market, workshops, round-tables, and live demos on specific aspects of interest, and important moments for networking and knowledge exchange with the participants.
Besides the keynotes, the program was organized with four sessions running in parallel. The high number of people attending the sessions and the stands set up by the industrial partners attest to the high interest in the topics in focus. Five major areas of interest were covered: Data Management, Data Analytics, Cloud Storage, Technology and Security. You can find the complete program at the following link https://www.storage-day.ch/

Harald Seipp (IBM) presents Storage in Container-based Cloud Infrastructure

The research and development interests at the ICCLab naturally drew us towards presentations in the area of Cloud Storage and Technology. The first keynote of the day, by Prof. Brinkmann from the University of Mainz, guided us through a classification of storage with a view on its future. In the subsequent presentation by IBM, storage in container-based Cloud Infrastructures was discussed, underlining the importance of persistent storage and multi-cloud environments. Of particular interest to us was the presentation given by the company SUSE. Software Defined Storage was discussed as the de facto standard for storage in the Cloud, and the importance of open source based solutions was highlighted when they presented their Enterprise Storage solution based on OpenStack and Ceph. A further interesting analysis on Cloud Storage was later presented by the company Nutanix, which introduced their full-stack solution for storage in the Cloud.

As the icing on the cake, the day was concluded by the insightful keynote given by Moshe Rappaport, Executive Technologist at IBM Research, who guided the audience into the future, shedding light on the new disruptive technologies ahead of us. He also gave a forecast for the future of storage, which is rapidly evolving towards high density data storage applications that require innovative research and development solutions.

Moshe Rappaport’s insightful keynote on the future of Business and IT from an IBM research perspective

In conclusion, our participation in the “Storage & Data Analytics – Swiss 2018” was well worth the time investment. The event clearly fulfilled our expectations as an important source of inspiration for our research activities and as an opportunity for networking with experts in the field. We are already looking forward to the next event of this kind!