Wanted: Senior Researcher / Researcher for Cloud Storage Systems

Job Description

The Service Engineering (SE, blog.zhaw.ch/icclab) group at the Zurich University of Applied Sciences (ZHAW) / Institute of Applied Information Technology (InIT) in Switzerland is seeking applications for a full-time position at its Winterthur facility.

The successful candidate will work in the InIT Cloud Computing Lab (ICCLab) and will lead the research initiative on advanced cloud storage architectures (see the Cloud Storage Initiative).

R.I.P.

‘Labbers at the last Motörhead concert in Switzerland

Report from the H2020 SESAME kickoff meeting

The SESAME project, funded by the European Commission under the H2020 research program, kicked off last week in Athens with a three-day meeting attended by representatives of all 18 partners.

SESAME – Small cEllS coordinAtion for Multi-tenancy and Edge services

SESAME targets innovations around three central elements in 5G: the placement of network intelligence and applications in the network edge through Network Functions Virtualisation (NFV) and Edge Cloud Computing; the substantial evolution of the Small Cell concept, already mainstream in 4G but expected to deliver its full potential in the challenging, highly dense 5G scenarios; and the consolidation of multi-tenancy in communications infrastructures, allowing several operators/service providers to engage in new sharing models of both access capacity and edge computing capabilities.

SESAME proposes the Cloud-Enabled Small Cell (CESC) concept, a new multi-operator enabled Small Cell that integrates a virtualised execution platform (i.e., the Light DC) for deploying Virtual Network Functions (VNFs), supporting powerful self-x management and executing novel applications and services inside the access network infrastructure. The Light DC will feature low-power processors and hardware accelerators for time-critical operations and will form a highly manageable clustered edge computing infrastructure. This approach will allow new stakeholders to dynamically enter the value chain by acting as ‘host-neutral’ providers in high-traffic areas where densification of multiple networks is not practical. The optimal management of a CESC deployment is a key challenge of SESAME, for which new orchestration, NFV management, virtualisation of management views per tenant, self-x features and radio access management techniques will be developed.

After designing, specifying and developing the architecture and all the CESC modules involved, SESAME will culminate in a fully functional prototype that proves the concept in relevant use cases. In addition, CESC will be formulated consistently and synergistically with other 5G-PPP components through coordination with the corresponding projects.

Active ICCLab Research Initiatives

Given the topics that will be developed during the project execution, the following research initiatives from ICCLab will contribute to SESAME.

Project Facts

Horizon 2020 – Call: H2020-ICT-2014-2

Topic: ICT-14-2014

Type of action: RIA

Duration: 30 Months

Start date: 1/7/2015

Project Title: SESAME: Small cEllS coordinAtion for Multi-tenancy and Edge services

Use pacemaker and corosync on Illumos (OmniOS) to run a HA active/passive cluster

In the Linux world, a popular approach to build highly available clusters is with a set of software tools that include pacemaker (as resource manager) and corosync (as the group communication system), plus other libraries on which they depend and some configuration utilities.

On Illumos (and in our particular case, OmniOS), the ihac project is abandoned and I couldn’t find any other platform-specific open source and mature framework for clustering. Porting pacemaker to OmniOS is an option and this post is about our experience with this task.

The objective of this post is to describe how to get an active/passive pacemaker cluster running on OmniOS and to test it with a Dummy resource agent. The specific use case (or test case) is not relevant; what matters is that, in a correctly configured cluster, if the node running the Dummy resource (the active node) fails, the resource fails over and is started on the other node (high availability).
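The fail-over behaviour described above can be illustrated with a small toy model. This is only a sketch of the expected outcome and not how pacemaker or corosync decide resource placement; the class names, node names and the "first online node wins" rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    online: bool = True


@dataclass
class ActivePassiveCluster:
    """Toy model of an active/passive cluster (hypothetical, not pacemaker's logic)."""
    nodes: List[Node]
    resource: str = "Dummy"
    active: Optional[str] = None  # name of the node currently running the resource

    def place_resource(self) -> Optional[str]:
        # Keep the resource where it is if that node is still online.
        current = next((n for n in self.nodes if n.name == self.active), None)
        if current is not None and current.online:
            return self.active
        # Otherwise start it on the first online node, or nowhere if all nodes failed.
        survivor = next((n for n in self.nodes if n.online), None)
        self.active = survivor.name if survivor else None
        return self.active

    def fail(self, name: str) -> Optional[str]:
        # Mark a node as failed, then re-evaluate where the resource should run.
        for n in self.nodes:
            if n.name == name:
                n.online = False
        return self.place_resource()


cluster = ActivePassiveCluster(nodes=[Node("node1"), Node("node2")])
print(cluster.place_resource())  # node1 becomes the active node
print(cluster.fail("node1"))     # the Dummy resource fails over to node2
```

In the real cluster, this decision is made by pacemaker based on membership information from corosync; the model only captures the outcome that the test with the Dummy resource agent is meant to verify.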

I will assume that we start from a fresh installation of OmniOS 151012 with a working network configuration (and ssh, for your comfort!). Check the general administration guide, if needed.

This is what we will cover:

  • Configuring the machines
  • Patching and compiling the tools
  • Running pacemaker and corosync from SMF
  • Running an active/passive cluster with two nodes to manage the Dummy resource

ICCLab summer retreat in the Black Forest

From August 20th to August 22nd, ICCLab went on a summer retreat in Grafenhausen, in the Black Forest, Germany.

The objective of the retreat was to review the current status of our research Themes and Initiatives and to present the work planned for the year to come. With so many new team members having joined us in recent months and the impressive growth rate of our lab, it was necessary to define ways to keep our work organized and efficient while respecting the principles of our approach to research. Besides the hard work… well… certainly when twenty young people spend three days and *two nights* together, a great deal of fun is also to be expected!

Publications

Open Access Preprints

  1. Philipp Leitner, Erik Wittern, Josef Spillner, Waldemar Hummer, A mixed-method empirical study of Function-as-a-Service software development in industrial practice, preprint on PeerJ, June 2018.
  2. Andy Edmonds, Chris Woods, Ana Juan Ferrer, Juan Francisco Ribera, Thomas Michael Bohnert, “Blip: JIT and Footloose On The Edge“, May 2018.
  3. Josef Spillner, Transformation of Python Applications into Function-as-a-Service Deployments, May 2017.
  4. Josef Spillner, Snafu: Function-as-a-Service (FaaS) Runtime Design and Implementation, March 2017.
  5. Josef Spillner and Serhii Dorodko, Java Code Analysis and Transformation into AWS Lambda Functions, February 2017.
  6. Josef Spillner, Exploiting the Cloud Control Plane for Fun and Profit, January 2017.

This list may be out of date. Please refer to the arXiv search for all results first-authored by SPLab staff: SPLab preprints.

In addition to these preprints, please enjoy our ACM publications through the Author-Izer service.

Towards Quantifiable Boundaries for Elastic Horizontal Scaling of Microservices

Manuel Ramírez López, Josef Spillner
UCC ’17 Companion: Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Practical Tooling for Serverless Computing

Josef Spillner
UCC ’17: Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Cloud Robotics: SLAM and Autonomous Exploration on PaaS

Giovanni Toffetti, Tobias Lötscher, Saken Kenzhegulov, Josef Spillner, Thomas Michael Bohnert
UCC ’17 Companion: Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

 

Selected Talks and Public Appearances

  1. Josef Spillner: Distributed Service Prototyping with Cloud Functions (slides) (transcript). Tutorial @ ICDCS 2018, Vienna, Austria, July 2018.
  2. Josef Spillner: Helm Charts Quality Analysis (slides), Future Cloud Applications #5, Zurich, June 2018.
  3. Josef Spillner: Cloudware and Beyond: Engineering Methods and Tools (slides), 7th Open Cloud Day, Winterthur, May 2018.
  4. Josef Spillner: Keynote: Serverless Cyber-Physical Applications (slides), Science Meets Industry, Dresden, Germany, March 2018.
  5. Josef Spillner: Serverless Computing: FaaSter, Better, Cheaper and More Pythonic (slides), 3rd Swiss Python Summit, Rapperswil, Switzerland, February 2018.
  6. Josef Spillner: Serverless Delivery Hero — DevOps-style Tracing, Profiling and Autotuning of Cloud Functions (slides), Vienna Software Seminar, Vienna, Austria, December 2017.
  7. Josef Spillner: Practical Tooling for Serverless Computing (slides) (transcript). Tutorial @ UCC 2017, Austin, Texas, USA, December 2017.
  8. Josef Spillner: Technologies and Mindsets: Trends in Cloud-Native Applications (slides), National University of Asunción, Paraguay, August 2017.
  9. Josef Spillner: Cloud & Cyber-Physical Applications (Machines, IoT, Robots) (slides), Itaipu Technology Park, Paraguay, August 2017.
  10. Piyush Harsh: Cyclops 3.0 – Hierarchical billing made simple for future cloud applications, Future Cloud Applications #3, Zurich, July 2017.
  11. Josef Spillner: Serverless Applications: Tools, Languages, Providers and (Research) Challenges (slides), Serverless Zürich, June 2017.
  12. Seán Murphy: The edge is nigh, 6th Open Cloud Day, Bern, June 2017.
  13. Josef Spillner: Rapid prototyping of cloud applications with open source tools (a research perspective) (slides), 6th Open Cloud Day, Bern, June 2017.
  14. Josef Spillner: Function-as-a-Service: A Pythonic perspective on Serverless Computing (slides) (transcript), PyParis, June 2017.
  15. Bruno Grazioli: Deploying applications to Heterogeneous Hardware using Rancher and Docker (slides), 14th Docker Switzerland User Group Meetup, May 2017.
  16. Manuel Ramírez López: Predictable elasticity of Docker applications (slides), 14th Docker Switzerland User Group Meetup, May 2017.
  17. Josef Spillner: More on FaaS: The Swiss Army Knife of Serverless Computing (slides), Future Cloud Applications #2, April 2017.
  18. Josef Spillner: Containerising Functions using Docker and OpenShift (slides), Microservices Zürich, April 2017.
  19. Josef Spillner, Cloud Applications: Less Guessing, more Planning and Knowing (slides), University of Coimbra, May 2016.
  20. Thomas M. Bohnert, Mobile Cloud Networking: Hurtle, Cyclops, Gatekeeper (slides), 6th FOKUS FUSECO Forum, Berlin, November 5th-6th 2015.
  21. Josef Spillner, The Next Service Wave: Prototyping Cloud-Native and Stealthy Applications (slides), IBM Research Zurich, September 9th 2015.
  22. Michael Erne, Developing Heat Plugins (slides), 11th Swiss OpenStack User Group Meetup, Zurich, September 8th 2015.
  23. Andy Edmonds, Thomas Michael Bohnert & Giovanni Toffetti, “MCN: Beyond NFV”, 7th Cloud Control Workshop, Nässlingen, June 10th 2015.
  24. P Harsh & S Patanjali, CYCLOPS : Rating, Charging & Billing framework (slides), OpenStack Central & Eastern Europe Day 2015, Budapest, Hungary, June 08th 2015
  25. Vincenzo Pii, Building a cloud storage appliance on ZFS (slides), 10th OpenStack Swiss User Group Meetup, Bern, Switzerland, June 4th 2015
  26. G. Toffetti, MITOSIS: distributed autonomic management of service compositions, (slides) ACROSS COST meeting, Apr. 2015
  27. G. Toffetti, Mobile Cloud Networking (MCN): Motivation, Vision, and Challenges (slides), 1st ACROSS Open Workshop on Autonomous Control for the Internet of Services, Apr. 2015
  28. M. Bloechlinger, Migrating an Application into the Cloud with Docker and CoreOS (slides), 3rd Docker Swiss User Group Meetup, Zurich, Switzerland, March 24th 2015
  29. S. Brunner, Cloud-Native Application Design (slides), 10th KuVS Expert Talks NGSDP, Fraunhofer FOKUS, Berlin, Mar. 16 2015
  30. S. Murphy, Making OpenStack more Energy Efficient (slides), 9th OpenStack Swiss User Group Meetup, Zurich, Switzerland, March 5th 2015
  31. K. Benz, Monitoring Openstack – The Relationship Between Nagios and Ceilometer, Nagios World Conference, St. Paul MN USA, Oct 13-16 2014
  32. S.Patanjali, Update on CYCLOPS – Dynamic Rating, Charging & billing, OpenStack User Group (slides), October 2014, Winterthur, Switzerland
  33. A. Edmonds, “Cloud Standards & Mobile Cloud Networking”. 2nd MCN Developer Meeting, Torino, Italy. October 1st 2014.
  34. F. Dudouet, A case for CDN-as-a-Service in the Cloud – A Mobile Cloud Networking Argument (slides), ICACCI-2014, Delhi, India, Sep 2014
  35. V. Pii, Ceph vs Swift: Performance Evaluation on a Small Cluster (slides), GÉANT eduPERT Monthly Call, 24 Jul 2014
  36. P. Harsh, A Highly Available Generic Billing Architecture for Heterogeneous Mobile Cloud Services (slides), 2014 World Congress in Computer Science, July 2014, Las Vegas, USA
  37. F. Dudouet, Docker Containers Orchestration: Fig and OpenStack (slides), Docker Swiss User Group , July 2014, Bern, Switzerland
  38. Ph. Aeschlimann, Thomas Michael Bohnert, How to program the SDN – SDK4SDN in the T-NOVA project, Special Session at EUCNC, June 2014
  39. A. Edmonds, T.M. Bohnert, “End-to-End Cloudification of Mobile Telecoms“, 4th Workshop on Mobile Cloud Networking 2014, Lisbon, Portugal.
  40. P. Harsh, “Cyclops – A charging platform for OpenStack Clouds“, Swiss Open Cloud Day, June 2014, Bern, Switzerland.
  41. Ph. Aeschlimann, S. Brunner, KIARA Transport Stack Functionality (slides), ZHAW InIT Meeting, May 2014
  42. B. Grazioli, OpenStack an Overview (slides), ZHAW InIT, May 2014
  43. S. Brunner, KIARA InfiniBand Demo (slides), ICCLab Colloquium, April 2014
  44. Ph. Aeschlimann, T.M. Bohnert, The role of SDN and NFV in Mobile Cloud Networking, First Software-Defined Networking (SDN) Concertation Workshop, January 2014, Brussels, Belgium
  45. Ph. Aeschlimann, T.M. Bohnert, QoS in OpenStack with SDN, first meeting of the SDN Group Switzerland, October 2013, Zürich, Switzerland
  46. T.M. Bohnert, Ph. Aeschlimann, Software Defined Networking in the Cloud (slides), Distributed Management Task Force, SVM2013, October 2013, Zürich, Switzerland
  47. Ph.Aeschlimann, “Introduction to Webprogramming“, course Matura2Engineer at ZHAW, October 2013.
  48. P. Harsh, J. Kennedy, A. Edmonds, T. Metsch, “Interoperability and APIs in OpenStack” (slides), EGI Technical Forum and Cloud Interoperability Plugfest, Sep 2013, Madrid, Spain
  49. T. M. Bohnert, A. Edmonds, C. Marti, “Notes on the Future Internet” (slides), 4th European Future Internet Summit, Jun 2013, Aveiro, Portugal
  50. T. M. Bohnert, A. Edmonds, P. Aeschlimann, C. Marti, T. Zehnder, L. Graf, “Cloud Computing and the Future Internet” (slides), IEEE VTC Spring, May 2013, Dresden
  51. K. Benz, T. M. Bohnert,”OpenStack HA technologies: a framework to test HA architectures” (slides), The Conference on Future Internet Communications, May 15-16 2013, Coimbra, Portugal
  52. A. Edmonds, “OCCI & Interoperability” (slides), Future Internet Assembly, Dublin, Ireland, May 2013.
  53. A. Edmonds, T. M. Bohnert, C. Marti, “Cloud Experiences: Past, Present and Future of the ICCLab“, Academic Compute Cloud Experience Workshop, Zurich, April 2013.
  54. T. M. Bohnert, A. Edmonds, C. Marti, T. Zehnder, L. Graf, “OpenStack Technology and Ecosystem” (slides), DatacenterDynamics Converged, Apr 2013, Zurich.
  55. T. M. Bohnert, “tbd”, Second National Conference on Cloud Computing and Commerce, Apr 2013, Dublin.
  56. A. Edmonds. T.M. Bohnert, C. Marti, “Open Standards in the Cloud” (slides), Second National Conference on Cloud Computing and Commerce, Apr 2013, Dublin, Ireland.
  57. T. M. Bohnert and C. Marti, “Platform as a Service: The Future of Software Development” (slides), Innovative Software Networking Conference 2013, Feb 2013, Winterthur.
  58. L. Graf, T. Zehnder, “OpenStack Ceilometer”, Presentation at 2nd Swiss OpenStack User Group Meeting February 2013, Zurich.
  59. T. M. Bohnert, T. Taleb, “Towards Mobile Cloud Networking” (slides), Keynote at ONIT Workshop co-located with IEEE GLOBECOM 2012, Anaheim, USA, December 2012.
  60. T. Taleb, T. M. Bohnert, “Cross Roads: Cloud Computing and Mobile Networking” 1st Int workshop on Management and Security technologies for Cloud Computing 2012 (ManSec-CC 2012), IEEE Globecom 2012, Anaheim, USA, 7 Dec. 2012.
  61. T. M. Bohnert, “How to run a large-scale collaborative research project” (slides), MobileCloud Networking Project Kick-off Meeting 2012.
  62. T. Metsch, A. Edmonds, “OCCI & CAMP”, Presentation to OASIS CAMP January 2013.
  63. Ph. Aeschlimann, “SDN – OpenFlow”, Guest lecture at ZHAW for the IT-MAS students December 2012, Zurich.
  64. F. Manhart, “ICCLab”, Presentation at 1st Swiss OpenStack User Group Meeting November 2012, Zurich.
  65. Ph. Aeschlimann, “OpenStack – Quantum – Floodlight”, Presentation at 1st Swiss OpenStack User Group Meeting 2012, Zurich.
  66. A. Edmonds, F. Manhart, Thomas Michael Bohnert, Christof Marti, “From Bare Metal to Cloud”, Presentation at SwiNG SDCD 2012, Bern.
  67. Thomas M. Bohnert, “The OpenStack Cloud Computing OSS Framework” (slides), Puzzle ITC TechTalk 2012, Bern, October 2012.
  68. A. Edmonds, “Open Cloud Standards” (slides), Intel European Research and Innovation Conference, Barcelona, October 2012.
  69. T. M. Bohnert, “Dependability in the World of Clouds” (slides), Intel European Research and Innovation Conference, Barcelona, October 2012.
  70. A. Edmonds, “OCCI Honesty” Presentation, Open World Forum 2012 & Cloudcamp Paris, October 2012.
  71. T. M. Bohnert, “Software-Defined Networking in Cloud Computing Data Centers”, ITU Telecom World 2012, Dubai, October 2012.
  72. T. M. Bohnert, “The ICCLab and FI-PPP Opportunities in Phase Two and FI-WARE” (slides), ICT Proposers’ Day, Warsaw, September 2012.
  73. A. Edmonds, P. Kasprzak , “From Bare Metal to Cloud” Presentation (video) to EGI Technical Forum 2012, Prague.
  74. T. M. Bohnert, “The FI-PPP after One Year: Lessons Learned, Challenges and Opportunities Ahead” (slides), 3rd European Future Internet Summit, Helsinki, June 2012.
  75. T. M. Bohnert, A. Edmonds, C. Marti, F. Manhart, “The OpenStack Cloud Computing Framework and Eco-System” (slides), CH-Open “Open Cloud and Public Administration”, June 2011.
  76. T. M. Bohnert, “Vision and Status towards the Future Internet Technology Foundation” (slides), Net!Works General Assembly 2011, Munich, Germany, Nov 2011.
  77. A. Edmonds, “Open Cloud Computing Interface” Presentation (audio) to ISO SC38 at DMTF APTS, Boulder Colorado, 2011.

Books and Book Chapters

  1. A. Luntovskyy and J. Spillner, “Architectural Transformations in Network Services and Distributed Systems“, Springer Vieweg, 2017. ISBN 978-3-658-14840-9.
  2. “Cloud Standards”, A. Edmonds, et. al., San Murugesan (Editor), Irena Bojanova (Editor), “Encyclopedia on Cloud Computing”, Wiley, May 2015

Journals and Magazines

  1. G. Toffetti, S. Brunner, M. Blöchlinger, J. Spillner, T. M. Bohnert: Self-managing cloud-native applications: design, implementation and experience. FGCS special issue on Cloud Incident Management, volume 72, July 2017, pages 165-179, online September 2016.
  2. B. Sousa, L. Cordeiro, P. Simoes, A. Edmonds, et. al., “Towards a Fully Cloudified Mobile Network Infrastructure,” IEEE Transactions on Network and Service Management Sept. 2016.
  3. K. Benz, T. M. Bohnert, “Elastic Scaling of Cloud Application Performance Based on Western Electric Rules by Injection of Aspect-oriented Code”. In: Procedia Computer Science, Vol. 61, p.198-205, Elsevier, Nov 2015.
  4. T.M.Bohnert, Ph. Aeschlimann, “Software-defined Networking das verzögerte Paradigma”, Netzwoche Nr. 7922, October 2013
  5. A Jamakovic, T. M. Bohnert, G. Karagiannis, “Mobile Cloud Networking: Mobile Network, Compute, and Storage as One Service On-Demand”, Future Internet Assembly 2013: 356-358, May 2013
  6. Edmonds, A., Metsch, T., Papaspyrou, A., and Richardson, A., “Toward an Open Cloud Standard.” IEEE Internet Computing 16, 4 (July 2012), 15–25.
  7. A. Cimmino, P. Harsh, T. Pecorella, R. Fantacci, F. Granelli, Talha Faizur Rahman, C. Sacchi, C. Carlini – Transactions on emerging telecommunications technologies – Special Issue Article – The role of Small Cell Technology in Future Smart City
  8. A. Edmonds, T. Metsch, and A. Papaspyrou, “Open Cloud Computing Interface in Data Management-related Setups,” Springer Grid and Cloud Database Management, pp. 1–27, Jul. 2011.
  9. A. Edmonds, T. Metsch, E. Luster, “An Open, Interoperable Cloud“, infoq.com, 2011
  10. M. Nolan, J. Kennedy, A. Edmonds, J. Butler, J. McCarthy, M. Stopar, P. Hadalin, Damjan Murn, “SLA-enabled Enterprise IT”, vol. 6994, no. 34. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 319–320.
  11. J. Kennedy, A. Edmonds, V. Bayon, P. Cheevers, K. Lu, M. Stopar, D. Murn, ”SLA-Enabled Infrastructure Management”, Service Level Agreements for Cloud Computing, no. 16. New York, NY: Springer New York, 2011, pp. 271–287.
  12. J. Happe, W. Theilmann, A. Edmonds, K. Kearney, “A Reference Architecture for Multi-level SLA Management”, Service Level Agreements for Cloud Computing, no. 2. New York, NY: Springer New York, 2011, pp. 13–26.
  13. A. Edmonds, T. Metsch, A. Papaspyrou, A. Richardson, “Open Cloud Computing Interface: Open Community Leading Cloud Standards”, ERCIM News No. 83, Special Theme: Cloud Computing, 2010.
  14. W. Theilmann, J. Happe, C. Kotsokalis, and A. Edmonds, “A Reference Architecture for Multi-Level SLA Management,” Journal of Internet Engineering, 2010.
  15. Y. L. Sun, R. Perrott, T. J. Harmer, C. Cunningham, P. Wright, J. Kennedy, A. Edmonds, V. Bayon, J. Maza, G. Berginc, and P. Hadalin, “Grids and Service-Oriented Architectures for Service Level Agreements,” no. 4, P. Wieder, R. Yahyapour, and W. Ziegler, Eds. Boston, MA: Springer US, 2010, pp. 35–44.
  16. T. M. Bohnert, P. Robinson, M. Devetsikiotis, B. Callaway, G. Michailidis, D. Trossen, Special Issue in Journal of Internet Engineering, “Service-Oriented Infrastructure”, Spring 2010

Conference Publications

  1. To appear: J. Spillner, Y. Bogado, W. Benítez, F. López Pires, “Co-Transformation to Cloud-Native Applications — Development Experiences and Experimental Evaluation”, 8th International Conference on Cloud Computing and Services Science (CLOSER), Funchal, Madeira – Portugal, May 2018. (author version; digitalcollection; to appear in SCITEPRESS)
  2. Giovanni Toffetti, Tobias Lötscher, Saken Kenzhegulov, Josef Spillner, and Thomas Michael Bohnert. 2017. Cloud Robotics: SLAM and Autonomous Exploration on PaaS. In Companion Proceedings of the 10th International Conference on Utility and Cloud Computing (UCC ’17 Companion). ACM, New York, NY, USA, 65-70. DOI: https://doi.org/10.1145/3147234.3148100
  3. M. Ramírez López, J. Spillner, “Towards Quantifiable Boundaries for Elastic Horizontal Scaling of Microservices”, 6th International Workshop on Clouds and (eScience) Applications Management (CloudAM) / 10th International Conference on Utility and Cloud Computing Companion (UCC), Austin, Texas, USA, December 2017. (author version; slides; digitalcollection; ACM DL)
  4. J. Spillner, C. Mateos, D. A. Monge, “FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC”, 4th Latin American Conference on High Performance Computing (CARLA), Buenos Aires, Argentina, September 2017. (author version; slides; Springer CCIS)
  5. J. Spillner, G. Toffetti, M. Ramírez López, “Cloud-Native Databases: An Application Perspective”, 3rd International Workshop on Cloud Adoption and Migration (CloudWays) @ ESOCC, Oslo, Norway, September 2017. (author version; slides; to appear in Springer CCIS)
  6. A. P. Vumo, J. Spillner, S. Köpsell, “Analysis of Mozambican Websites: How do they protect their users?”, 16th International Information Security South Africa Conference (ISSA), Johannesburg, South Africa, August 2017.
  7. M. Skoviera, P. Harsh, O. Serhiienko, M. Perez Belmonte, T. M. Bohnert, “Monetization of Infrastructures and Services”, European Conference on Networks and Communications (EuCNC), Oulu, Finland, June 2017.
  8. A. Edmonds, G. Carella, F. Z. Yousaf, C. Gonçalves, T. M. Bohnert, T. Metsch, P. Bellavista, L. Foschini, “An OCCI-compliant Framework for Fine-grained Resource-aware Management in Mobile Cloud Networking,” 20th IEEE Symposium on Computers and Communication (ISCC) 2016.
  9. E. Cau, M. Corici, P. Bellavista, L. Foschini, G. Carella, A. Edmonds, T. M. Bohnert, “Efficient Exploitation of Mobile Edge Computing for Virtualized 5G in EPC Architectures,” 4th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud) 2016
  10. G. Carella, A. Edmonds, F. Dudouet, M. Corici, B. Sousa, Z. Yousaf, “Mobile cloud networking: From cloud, through NFV and beyond,” 2015 IEEE Conference on Network Function Virtualization and Software-Defined Networks (NFV-SDN).
  11. J. Spillner, M. Beck, A. Schill, T. M. Bohnert, “Stealth Databases: Ensuring User-Controlled Queries in Untrusted Cloud Environments”, 8th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), Limassol, Cyprus, December 2015. (slides; author version; IEEExplore/ACM DL)
  12. S. Brunner, M. Blöchlinger, G. Toffetti, J. Spillner, T. M. Bohnert, “Experimental Evaluation of the Cloud-Native Application Design”, 4th International Workshop on Clouds and (eScience) Application Management (CloudAM), Limassol, Cyprus, December 2015. (slides; author version; IEEExplore/ACM DL)
  13. K. Benz, T. M. Bohnert, “Elastic Scaling of Cloud Application Performance Based on Western Electric Rules by Injection of Aspect-oriented Code”, 5th Conference on Complex Adaptive Systems, San Jose CA, USA, November 2015.
  14. S. Patanjali, B. Truninger, P. Harsh, T. Bohnert, “Cyclops: Rating, Charging & Billing framework for cloud”, The 13th international conference on Telecommunications, Graz, Austria, 2015
  15. P. Harsh, and T. Bohnert, “DISCO: Unified Provisioning of Distributed Computing Platforms in the Cloud”, in proceedings of 21st International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, July 2015
  16. B. Meszaros, P. Harsh, and T. Bohnert, “Lightning Sparks all around: A comprehensive analysis of popular distributed computing frameworks”, International Conference on Advances in Big Data Analytics (ABDA’15), Las Vegas, USA, July 2015
  17. Florian Dudouet, Andy Edmonds and Michael Erne, “Reliable Cloud-Applications: an Implementation through Service Orchestration”, International Workshop on Automated Incident Management in Cloud (AIMC’15), Bordeaux, France, April 2015
  18. Giovanni Toffetti Carughi, Sandro Brunner, Martin Blöchlinger, Florian Dudouet and Andrew Edmonds, “An architecture for self-managing microservices”, International Workshop on Automated Incident Management in Cloud (AIMC’15), Bordeaux, France, April 2015
  19. S. Murphy, V. Cima, T.M. Bohnert, B. Grazioli, “Adding Energy Efficiency to OpenStack”, The 4th IFIP Conference on Sustainable Internet and ICT for Sustainability, Madrid, Spain, April 2015
  20. G. Landi, P.M. Neves,  A. Edmonds,  T. Metsch,  J. Mueller,  P. S. Crosta, “SLA management and service composition of virtualized applications in mobile networking environments.”, IEEE Network Operations and Management Symposium (NOMS), 2014
  21. V.I. Munteanu, A. Edmonds, T.M. Bohnert, T-F. Fortis, “Cloud Incident Management, Challenges, Research Direction and Architectural Approach”, 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, London, UK, Dec 2014
  22. F. Dudouet, P. Harsh, S. Ruiz, A. Gomes, T.M. Bohnert, “A case for CDN-as-a-Service in the Cloud – A Mobile Cloud Networking Argument”, ICACCI-2014, Delhi, India, Sep 2014
  23. P. Harsh, K. Benz, I. Trajkovska, A. Edmonds, P. Comi, T. Bohnert, “A highly available generic billing architecture for heterogenous mobile cloud services”, The 2014 World Congress in Computer Science, Computer Engineering, and Applied Computing, Las Vegas, USA.
  24. K. Benz, T. M. Bohnert, “Impact of Pacemaker failover configuration on mean time to recovery for small cloud clusters”, 2014 7th IEEE International Conference on Cloud Computing, Alaska, USA.
  25. S. Ibrahim, D. Moise, H. Chihoub, A. Carpen, G. Antoniu, L. Bouge, “Towards Efficient Power Management in MapReduce: Investigation of CPU-Frequencies Scaling on Power Efficiency in Hadoop”, The ACM Symposium on Principles of Distributed Computing (PODC’14), Workshop on Adaptive Resource Management and Scheduling for Cloud Computing, Paris, France.
  26. I. Trajkovska, Ph. Aeschlimann, C. Marti, T.M. Bohnert, J. Salvachua “SDN enabled QoS Provision for Online Streaming Services in Residential ISP Networks“, 2014 IEEE International Conference on Consumer Electronics – Taiwan
  27. I. Anghel, M. Bertoncini, T. Cioara, M. Cupelli, V. Georgiadou, P. Jahangiri, A. Monti, S. Murphy, A. Schoofs, and T. Velivassaki. “GEYSER: Enabling Green Data Centres in Smart Cities”. To appear in proceedings of 3rd International Workshop on Energy-Efficient Data Centres, June 2014
  28. P. Kunszt, S. Maffioletti, D. Flanders, M. Eurich, T.M. Bohnert, A. Edmonds, H. Stockinger, S. Haug, A. Jamakovic-Kapic, P. Flury, S. Leinen and E. Schiller, “Towards a Swiss National Research Infrastructure”, FedICI 2013, Aachen, Germany.
  29. A. Edmonds, T. Metsch, D. Petcu, E. Elmroth, J. Marshall, P. Ganchosov, “FluidCloud: An Open Framework for Relocation of Cloud Services“, USENIX HotCloud ’13, San Jose, CA, US.
  30. K. Benz, T. M. Bohnert, “Dependability Modeling Framework: A test procedure for High Availability in Cloud Operating Systems”, 2013 IEEE 78th Vehicular Technology Conference (VTC Fall), Las Vegas, USA.
  31. D. Stroppa, A. Edmonds, T. M. Bohnert, “Reliability and Performance for OpenStack through SmartOS”, 8th Kommunikation und Verteilte Systeme Fachgespräch (Communication and Distributed Systems Workshop), Königswinter, April 2013 (slides)
  32. E. Escalona, S. Peng, R. Nejabati, D. Simeonidou, J. A. Garcia-Espin, J. Ferrer, S. Figuerola, G. Landi, N. Ciulli, J. Jimenez, B. Belter, Y. Demchenko, C. Laat, X. Chen, A. Yukan, S. Soudan, P. Vicat-Blanc, J. Buysse, M. Leenheer, C. Develder, A. Tzanakaki, P. Robinson, M. Brogle, T. M. Bohnert, “GEYSERS: A Novel Architecture for Virtualization and Co-Provisioning of Dynamic Optical Networks and IT Services”, Future Networks and Mobile Summit 2011, Warsaw, Poland, Jun 2011
  33. T. M. Bohnert, N. Ciulli, S. Figuerola, P. Vicat-Blanc Primet, D. Simeonidou, “Optical Networking for Cloud Computing”, IEEE OFC 2011, Los Angeles, USA, Mar 2011

Standard Specifications

  1. T. Metsch, A. Edmonds, R. Nyrén, and A. Papaspyrou, “Open Cloud Computing Interface – Core,” ogf.org, 2011.
  2. A. Edmonds and T. Metsch, “Open Cloud Computing Interface – Infrastructure,” ogf.org, 2011.
  3. T. Metsch and A. Edmonds, “Open Cloud Computing Interface – RESTful HTTP Rendering,” ogf.org, 2011.

Theses, Student Papers

  1. S. Patanjali, “Cyclops: Dynamic Rating, Charging & Billing for Cloud”, MSc, ICCLab, Dec 2015
  2. K. Benz, “VM Reliability Tester”, MSc, ICCLab, Jun 2015
  3. K. Benz, “Nagios OpenStack Data Collector & Integration Tool”, MSc, ICCLab, Mar 2015
  4. B. Grazioli, “Data Analysis of Energy Consumption in an Experimental OpenStack System”, BSc, ICCLab, Feb 2015
  5. K. Benz, “An ITIL-compliant Reliability Architecture and Tool Set for OpenStack Cloud Deployments”, Nov 2014
  6. K. Benz, “Evaluation of Maturity, Risks and Opportunities in the Implementation of a High Availability Architecture for an OpenStack Cloud Federation Node”, Sep 2014
  7. S. Brunner, “Konzept zur Migration von Applikationen in die Cloud”, BSc, ICCLab, Jun 2014
  8. J. Borutta, U. Wermelskirchen, “Einsatz von kommerziellen Mobilgeräten und Wireless-Kommunikation im militärischen Umfeld”, July 2014
  9. R. Bäriswyl, “Distributed Crawling”, BSc, Nov 2013
  10. P. Aeschlimann, “OpenFlow for KIARA (FI-WARE), the future Middleware”, BSc, ICCLab, Aug 2013
  11. L. Graf, T. Zehnder, „Monitoring für Cloud Computing Infrastruktur“, BSc, ICCLab, Jun 2013
  12. L. Graf, T. Zehnder, „Monitoring für Cloud Computing Infrastruktur”, PA, ICCLab, Jan 2013
  13. F. Manhart, “Cloud Computing”, MSE Tech Scouting, 2012.

Lab Reports

  1. Service Prototyping Lab Report – 2016 (Y1). August 31, 2016. (pdf)
  2. Service Prototyping Lab Report – 2017 (Y2). September 7, 2017. (pdf)

Short video introduction to COSBench

COSBench is a tool developed by Intel for benchmarking cloud object storage services.

Here’s a brief video showing some functions of the web interface.

For more details, please refer to the COSBench user guide.

COSBench GitHub page

Evaluating the performance of Ceph and Swift for object storage on small clusters

NOTE TO THE READER – 16 Jun 2014: after this post was published, some very insightful comments prompted me to add this note to clarify what has been compared in this study. Ceph and Swift have been deployed and compared as object storage systems and not necessarily as ReSTful services (typically accessed over HTTP). Consequently, we have used librados to access Ceph and the ReST HTTP APIs to access Swift. This difference in access method distinguishes the two systems once they have been deployed on a private storage cluster. However, accessing the two systems with these two different interfaces entails a bigger overhead (and thus lower performance) for Swift than for Ceph. For this study, focused on small clusters, this is part of the differences between the two services under test. On production deployments, however, it may make more sense to use the well-established Swift HTTP APIs for Ceph as well (available with the additional radosgw component), and the results shown here should not (read: must not) be applied to that case.

 

Introduction

Swift and Ceph are two popular cloud storage systems that provide object-based access to data (Ceph also supports file and block storage).

The Swift project ships an open source codebase entirely written in Python and its development is currently led by SwiftStack. Feature-wise, Swift is an object storage service that supports ReSTful HTTP APIs, data replication across different logical zones, arbitrarily large objects, scaling-out without downtime, etc.

Ceph is mostly written in C++, but it also includes some parts in Perl, C and Python. The company leading its development (InkTank) was acquired by RedHat in April 2014. Ceph presents a richer set of features, most notably by supporting block and file storage besides object storage. The supported file systems for object storage devices (OSDs) are xfs, btrfs and ext4, with xfs being recommended for production deployments. Data placement in Ceph is determined through the CRUSH algorithm, which computes storage locations for target data and enables clients to communicate directly with OSDs. Ceph allows for the fine tuning of many configuration parameters, including the definition of a CRUSH map (a logical description of the physical infrastructure of the data center), the specification of replication requirements on a per-pool level (pools are containers of objects), the definition of rulesets for data placement (e.g., to avoid placement of replicas within the same host), etc.

Objectives of this study

The objective of this experiment is to compare two different storage systems for the cloud with an object-based interface (both Swift and Ceph can be used with OpenStack), with the intention of evaluating the performance of Ceph with respect to Swift, a system that is considered to be very mature and already counts many production deployments. Important institutions and companies use Swift for their storage or as a basis on which their storage services are built (Wikimedia, Disney, Spilgames, HP, …).

The storage cluster used for the comparison is a small one that could be seen in private deployments for limited storage needs or on experimental environments. This study aims at evaluating the differences that may arise when using the two services on such a scaled down cluster and which limiting factors should be taken into account when using distributed storage on small deployments.

Hardware configuration

Network infrastructure

A storage cluster has been configured using three servers of the ICCLab data center and its networking infrastructure.

All the nodes have the same hardware configuration and are connected over a dedicated storage VLAN through Gigabit Ethernet links (1000Base-T cabling).

   Node1 (.2)             Node2 (.3)              Node3 (.4)
     |                       |                       |
     |                       |                       |
     |                       |                       |
<=====================10.0.5.x/24=========================>

Servers

  • Lynx CALLEO Application Server 1240
  • 2x Intel® Xeon® E5620 (4 core)
  • 8x 8 GB DDR3 SDRAM, 1333 MHz, registered, ECC
  • 4x 1 TB Enterprise SATA-3 Hard Disk, 7200 RPM, 6 Gb/s (Seagate® ST1000NM0011)
  • 2x Gigabit Ethernet network interfaces

Performance of local disks

The performance of the local disks has been measured with hdparm (read) and dd (write):

Read: ca. 140 MB/s

$ sudo hdparm -t --direct /dev/sdb1

/dev/sdb1:
 Timing O_DIRECT disk reads: 430 MB in  3.00 seconds = 143.17 MB/sec

Write: ca. 120 MB/s (with a single 1 GiB block)

$ dd if=/dev/zero of=anof bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 8.75321 s, 123 MB/s

All the other devices in the cluster have produced similar results (all the hard drives are the same model).

Network performance

To measure the bandwidth between a Ceph storage node and the node running the benchmarks we have used iperf:

$ iperf -c ceph-osd0
------------------------------------------------------------
Client connecting to ceph-osd0, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.5.2 port 41012 connected with 10.0.5.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.10 GBytes   942 Mbits/sec

For quick reference: 942 Mbits/sec = 117.75 MB/s
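The conversion above is just a division by 8 (bits per byte). A minimal sketch, where the function name is only illustrative:

```python
def mbits_to_mbytes(mbits_per_sec: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbits_per_sec / 8


# Bandwidth reported by iperf in the run above.
iperf_mbits = 942
print(f"{iperf_mbits} Mbit/s = {mbits_to_mbytes(iperf_mbits):.2f} MB/s")
```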

Software configuration

  • Operating system: Ubuntu 14.04 Server Edition with Kernel 3.13.0-24-generic
  • ceph version 0.79 (4c2d73a5095f527c3a2168deb5fa54b3c8991a6e)
  • swift 1.13.1

To avoid biases in the measurement of the performance of the two systems, Ceph was completely shut down when Swift was running and vice versa.

Cloud storage configuration

For the production of meaningful statistics, equivalent configurations (where applicable) have been used for both Ceph and Swift.

In both benchmarks, Node1 has been used as a “control” node (monitor node in Ceph and Proxy node in Swift) and it has not been used for data storage purposes.

Node2 and Node3, each one containing 4 Hard Drives, have been used as the storage devices for objects and their replicas. For consistency of performance measurements, both systems have been configured with a replication level of 3 (three replicas for each object).

Ceph

With respect to the reference physical infrastructure, a Ceph cluster has been deployed using the following logical mappings:

  • Node1 ==> Ceph MON0 (monitor node)

  • Node2 ==> Ceph OSD0 (storage node 0)

  • Node3 ==> Ceph OSD1 (storage node 1)

No Metadata Server nodes (MDS) have been used as we have configured Ceph for object-storage only (MDS is required for the Ceph Filesystem).

For each of the two storage nodes, two 1TB HDDs have been used as OSDs and a dedicated 1TB HDD has been used for journaling (the remaining HDD was reserved for the OS and software installation). All OSDs have been formatted with an xfs filesystem and gpt label.

The following diagram depicts the storage configuration used for Ceph, with a total of one monitor node and four OSDs:

[Node1 - MON]         [Node2 - OSD]           [Node3 - OSD]
 =========================================================
[HDD1: OS]            [HDD1: OS]              [HDD1: OS]
[HDD2: not used]      [HDD2: osd.0 - xfs]     [HDD2: osd.2 - xfs]
[HDD3: not used]      [HDD3: osd.1 - xfs]     [HDD3: osd.3 - xfs]
[HDD4: not used]      [HDD4: journal]         [HDD4: journal]

Each data pool was created with a replica size of 3 and min_size of 2 (each object is replicated three times, but I/O operations on a certain object are allowed after only two replicas are in place), using custom rulesets that allow the replication of objects within the same host (the default would be to replicate objects on separate hosts).

The number of placement groups (PGs) for each pool has been determined with the approximate heuristic formula reported in the official documentation:

    (OSDs * 100)
x = ------------;    PGs = x rounded up to the nearest power of two
      Replicas

In this case, x ≈ 133. Instead of rounding up to 256, the result has been approximated to the nearest power of two, giving 128 PGs for this deployment.
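As a rough illustration of the heuristic, the following sketch computes x for this cluster (4 OSDs, 3 replicas) and shows both the "rounded up" power of two and the nearest one. The function names are hypothetical and the value of 100 PGs per OSD is simply the constant that appears in the formula above.

```python
import math


def pg_estimate(num_osds: int, replicas: int, pgs_per_osd: int = 100) -> float:
    """x = (OSDs * 100) / Replicas, as in the heuristic formula."""
    return num_osds * pgs_per_osd / replicas


def next_power_of_two(x: float) -> int:
    """Smallest power of two that is greater than or equal to x."""
    return 1 << math.ceil(math.log2(x))


def nearest_power_of_two(x: float) -> int:
    """Power of two closest to x (ties go to the lower one)."""
    lower = 1 << math.floor(math.log2(x))
    upper = lower * 2
    return lower if (x - lower) <= (upper - x) else upper


x = pg_estimate(num_osds=4, replicas=3)
print(f"x = {x:.1f}")
print("rounded up:", next_power_of_two(x))
print("nearest:   ", nearest_power_of_two(x))
```

With the values of this deployment, rounding up yields 256, while the nearest power of two is the 128 that has actually been used.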

Swift

The Swift cluster has been configured with one proxy node and two storage nodes.

To maintain consistency with the Ceph configuration, only two disks of each storage node have been used for data and formatted with an xfs filesystem. Similarly, to apply the same replication policy as in Ceph, each disk has been assigned to a different zone and the replicas parameter has been set to 3.

The following diagram depicts the storage configuration used for Swift, with a total of one proxy node and four storage devices:

[Node1 - Proxy]       [Node2 - Storage]       [Node3 - Storage]
 =============================================================
[HDD1: OS]            [HDD1: OS]              [HDD1: OS]
[HDD2: not used]      [HDD2: dev1 - xfs]      [HDD2: dev3 - xfs]
[HDD3: not used]      [HDD3: dev2 - xfs]      [HDD3: dev4 - xfs]
[HDD4: not used]      [HDD4: not used]        [HDD4: not used]

All devices have the same weight and belong to the same region.

Given the amount of available disks, and a roughly estimated number of 100 partitions per disk, the ring partition power has been set to 9, accounting for a total of 512 partitions.
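The choice of the partition power can be reproduced with a short calculation: take the target number of partitions (disks times partitions per disk) and pick the smallest power of two that covers it. This is only a sketch of the reasoning described above; the function name is an assumption and not part of any Swift tool.

```python
import math


def partition_power(num_disks: int, partitions_per_disk: int = 100) -> int:
    """Smallest exponent p such that 2**p >= num_disks * partitions_per_disk."""
    target = num_disks * partitions_per_disk
    return math.ceil(math.log2(target))


# Four data disks in this cluster (two per storage node).
power = partition_power(num_disks=4)
print(f"partition power = {power}, partitions = {2 ** power}")
```

For 4 disks and roughly 100 partitions per disk the target is 400 partitions, which leads to a partition power of 9 (512 partitions).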

The number of workers for the different Swift server components has been configured as follows (configuration guidelines):

  • Proxy server: 8
  • Account server: 4
  • Container server: 4
  • Object server: 4 (with 4 threads per disk)

The auditor, replicator, reaper and updater settings have been left to their default values.

Benchmark tool: COSBench

COSBench is a benchmarking tool for cloud object storage developed by Intel and released to the public on GitHub. Among the other supported object storage systems, COSBench supports both Swift and Ceph, the latter using librados for Java.

COSBench allows the definition of workloads with a convenient XML syntax, ships a web interface to submit new workloads and to monitor the progress of ongoing ones, and automatically stores workload statistics as CSV files.

The COSBench version used to produce these results was 0.4.0.b2.

Workloads

One set of workloads has been defined and used for both Ceph and Swift. Some adaptations were required for the different authentication mechanisms and for letting COSBench use pre-provisioned pools (the equivalent of Swift containers) for Ceph. This change was necessary to prevent COSBench from creating new pools, which would have used undesired default settings (replication levels and CRUSH rulesets not matching the needed ones and an inadequate number of placement groups).

For these benchmarks, 12 distinct workloads have been used. Each workload has been populated with 18 workstages (excluding the special stages to prepare and clean up each workload).

Each workload has been defined to use a fixed number of containers (pools in Ceph) and a fixed object size. For example, in workload 1, every workstage was running with 1 container and objects of 4 kB.

Different workstages applied different distributions of read/write/delete operations and a different number of workers for each read/write/delete phase.

Each workstage had a ramp-up time of 2 minutes and a running time of 10 minutes (for a total of 12 minutes per stage). The read/write/delete operations for each stage were uniformly distributed across the available containers for that workload, each holding a fixed number of 1000 objects.

Different values have been used for each parameter, as shown in the following table:

Containers   Objects size   R/W/D Distribution (%)   Workers
----------   ------------   ----------------------   -------
1            4 kB           80/15/5                  1
20           128 kB         100/0/0                  16
             512 kB         0/100/0                  64
             1024 kB                                 128
             5 MB                                    256
             10 MB                                   512

The total number of executed workstages is given by all the possible combinations of the values of each column with all the others, resulting in 216 stages, for a total running time of 43.2 hours for each set of workloads for each storage system.

In particular, the number of containers and the objects size were defined on a per-workload basis, while workstages within the same workload used different read/write/delete distributions and different number of parallel workers.
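The counts quoted above can be checked by enumerating the parameter grid. The sketch below uses the values from the table and the 12 minutes per stage mentioned earlier; the variable names are only illustrative.

```python
from itertools import product

# Parameter values taken from the table above.
containers = [1, 20]
object_sizes = ["4 kB", "128 kB", "512 kB", "1024 kB", "5 MB", "10 MB"]
distributions = ["80/15/5", "100/0/0", "0/100/0"]
workers = [1, 16, 64, 128, 256, 512]

# A workload fixes containers and object size; its workstages vary the rest.
workloads = list(product(containers, object_sizes))
workstages_per_workload = list(product(distributions, workers))
total_stages = len(workloads) * len(workstages_per_workload)

minutes_per_stage = 2 + 10  # ramp-up time plus running time
total_hours = total_stages * minutes_per_stage / 60

print(len(workloads), "workloads")
print(len(workstages_per_workload), "workstages per workload")
print(total_stages, "stages,", total_hours, "hours")
```

This gives 12 workloads of 18 workstages each, that is 216 stages and 43.2 hours per storage system.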

Naming conventions

Workload names are of the form <x>cont_<y><s>b where

  • <x> is the number of containers

  • <y> is the size of the objects, expressed either in kB or MB

  • <s> is the unit for the object size. For this session, <s> can either be ‘k’ (for kilo) or ‘m’ (for mega)

Sizes are always expressed in bytes, even when a lowercase ‘b’ is used such as in ‘kb’ or ‘mb’ (kb = 1000 bytes, mb = 1000 * 1000 bytes).
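To make the convention concrete, here is a small sketch that parses a workload name into the number of containers and the object size in bytes. The helper is hypothetical (it is not part of COSBench or cosbench-plot) and only follows the naming rules listed above.

```python
import re
from typing import Tuple

# 'k' and 'm' are decimal multipliers, as stated above.
_UNIT_BYTES = {"k": 1000, "m": 1000 * 1000}
_NAME_RE = re.compile(r"^(\d+)cont_(\d+)([km])b$")


def parse_workload_name(name: str) -> Tuple[int, int]:
    """Return (number of containers, object size in bytes) for a workload name."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a workload name: {name!r}")
    containers, size, unit = match.groups()
    return int(containers), int(size) * _UNIT_BYTES[unit]


print(parse_workload_name("1cont_128kb"))  # (1, 128000)
print(parse_workload_name("20cont_5mb"))   # (20, 5000000)
```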

Metrics

COSBench collects statistics about the system under test with a configurable period that for these benchmarks was set to the default value of 5 seconds.

Statistics are then written to CSV files and are also viewable on the web interface (for running workloads, statistics are also displayed in real time).

COSBench produces the following statistics for each workload:

  • One file for each workstage with the sampled values for each metric, for each sampling period
  • One file with the average values for each metric for each workstage (1 value per metric per workstage)
  • One file with the response-time analysis for each workstage (number of samples within a certain response-time window for each workstage)

The metrics for which COSBench collects statistics are: Op-Count (number of operations), Byte-Count (number of bytes), Response-Time (average response time for each successful request), Processing-Time (average processing time for each successful request), Throughput (operations per second), Bandwidth (bytes per second), Success-Ratio (ratio of successful operations).

Graphs generation

Graphs have been generated from the CSV data output by COSBench using a dedicated tool that has been developed for this purpose: cosbench-plot.

The project is available on GitHub: https://github.com/icclab/cosbench-plot

Performance results

Read operations

The following graph shows the read throughput for 100% read workstages for every workload for both Ceph and Swift. The values displayed in the chart are the maximums of the averages across the considered workstages for any particular workload. The text annotation next to each value denotes the workstage in which the average for that metric had the highest value. In this case, the annotation represents the number of threads for that workstage. For example, we can observe that for the “1cont_128kb” workload (1 container, 128 kB objects), the Ceph workstage with 100% read operations that scored the highest average Throughput (across the workload), was the one with 512 threads. Swift achieved its best result in the same scenario with the workstage with 64 threads.

001 - read-tpt-workloads

From this graph we can observe that Ceph manages a higher number of read operations than Swift when the object size is small, with an even more prominent difference when the available containers are 20 instead of 1 (20000 and 1000 available objects in total, respectively). When the object size grows to 1024 kB and above, the number of read operations that the two systems can perform is approximately the same, but each system reaches its highest performance with a different number of threads. The general trend is that Ceph reaches its best performance with more parallel workers than Swift, indicating that it handles an increasing number of parallel requests better.

The following graph provides an indication of the difference between the throughput performance of both storage systems on the 100% read workstages of the “1cont_4kb” workload:

 002 - read-tpt-1cont-4kb

On Ceph, the workstages with 64 and 16 workers both score well, while 512 workers obtain a slightly degraded performance. This is because the threads are accessing (with a random uniform distribution) a single pool of 1000 objects of very small size: every read operation requires a very short processing time, compared to which the overhead for the synchronization of the threads is not negligible. An interesting statistic comes from the workstage with a single worker, in which Ceph performs noticeably better than Swift. This is probably a consequence of the Ceph CRUSH algorithm for object lookup with librados access being more efficient (in this case) than the HTTP-based access to the ring mappings in Swift. Again, this difference can be appreciated because the processing time for reading small 4 kB objects is very short, emphasizing the impact of the lookup operation on the overall performance.

In case of slightly bigger objects (128 kB), we can observe that the throughput performance of Ceph is almost equivalent for any number of workers greater than 1, while Swift has a performance increase when going from 16 to 64 threads, but experiences a degradation when the number of threads is bigger than 128.

003 - read-tpt-1cont-128kb

The following response time diagrams provide a clearer picture of the saturation level of the system for the “1cont_128kb” workload. With Ceph (first diagram), only a small set of samples experiences a delay of about 2 seconds and almost all samples are below this threshold. With Swift (second diagram), the response time is very low in all cases until about the 95th percentile. After this point of the Cumulative Distribution Function (CDF), some samples on the 128, 256 and 512 threads workstages experience long latency, with values that exceed 10 seconds.

004 - read-ceph-rt-1cont-128kb 005 - read-swift-rt-1cont-128kb
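For readers who want to reproduce this kind of reading from the response-time analysis, the sketch below builds a CDF from a histogram of sample counts per response-time window and looks up the window in which a given percentile falls. The bucket boundaries and counts are made-up illustrative data, not results of this study, and the function names are hypothetical.

```python
from itertools import accumulate


def cdf_from_histogram(counts):
    """Cumulative fraction of samples at the end of each response-time window."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def percentile_bucket(upper_bounds_ms, counts, percentile):
    """Upper bound (ms) of the first window whose cumulative share reaches the percentile."""
    for bound, share in zip(upper_bounds_ms, cdf_from_histogram(counts)):
        if share >= percentile / 100:
            return bound
    return upper_bounds_ms[-1]


# Hypothetical histogram: number of samples with response time <= each bound.
upper_bounds_ms = [10, 100, 1000, 10000]
counts = [700, 250, 40, 10]

print(cdf_from_histogram(counts))
print(percentile_bucket(upper_bounds_ms, counts, 90))  # 100
```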

Starting from an object size of 512 kB, the throughput performance of both systems decreases to reach a rate of ca. 23 operations per second with objects of 5 MB (Ceph) and ca. 11 operations per second with objects of 10 MB (Ceph). This is an expected behavior that is due to two reasons:

  1. An increasing object size necessarily entails a decreasing number of operations per second, given a fixed amount of time and the same overall available bandwidth
  2. Limiting physical factors (bottlenecks) come into play and prevent the throughput from increasing once the system has reached its saturation point
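The first point can be illustrated with a back-of-the-envelope bound: if the available bandwidth is fixed, the number of operations per second cannot exceed the bandwidth divided by the object size. The sketch below assumes the 117.75 MB/s measured with iperf as the available bandwidth and treats MB as a decimal unit; the function name is only illustrative.

```python
def max_ops_per_second(bandwidth_mb_s: float, object_size_mb: float) -> float:
    """Upper bound on operations per second for a given bandwidth and object size."""
    return bandwidth_mb_s / object_size_mb


# Network bandwidth measured with iperf (942 Mbit/s).
link_mb_s = 117.75

for size_mb in (5, 10):
    bound = max_ops_per_second(link_mb_s, size_mb)
    print(f"{size_mb} MB objects: at most {bound:.1f} ops/s")
```

Under these assumptions, the bounds are close to the ca. 23 and ca. 11 operations per second observed for Ceph with 5 MB and 10 MB objects.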

As far as these benchmarks are concerned, the bandwidth graph (below) shows that Ceph already reaches a plateau in the read bandwidth when the object size is 128 kB for both 1 and 20 containers. Swift shows a similar behavior with 512 kB objects and 1024 kB objects for 1 and 20 containers respectively, and is more efficient than Ceph when the size of the objects is 10 MB (Swift bandwidth is slightly higher in these cases).

006 -read-bdw-workloads

The upper bound for the bandwidth is 120 MB/s (1.2e08 B/s in the graph). This limit is below the read bandwidth of a single disk in the system (ca. 140 MB/s), but matches the available network bandwidth of 1 Gbit/s (125 MB/s), showing that the performance bottleneck in this small test cluster is the networking infrastructure.

When considering how the two systems scale as the number of concurrent requests increases, we observe that Ceph performance doesn’t degrade as significantly as Swift’s when more parallel workers access the storage service.

As an example of this behavior (in condition of saturation), we can consider the read throughput for 1024 kB objects over 20 containers (20000 available objects on which to uniformly distribute the workload), shown in the graph below:

007 -read-tpt-20cont-1024kb

Swift obtains the best performance with 64 threads and its throughput decreases by ca. 20% when the threads are 512, denoting an already over-saturated system that cannot accommodate more requests. On the other hand, the performance of Ceph is equivalent regardless of the number of threads and, even if the system is already saturated with 256 threads, when 512 workers become active, Ceph still manages to avoid starvation, as we can see in the response time diagram below.

008 -read-ceph-swift-rt-20cont-1024kb

With 512 threads, Ceph has a delay of ca. 7-8 seconds at the 90th percentile, while the behavior of Swift seems more greedy, with fewer samples that experience a very small delay (ca. 75th percentile) and a significant portion of the remaining ones that can exceed a delay of 15 seconds.

Write operations

For each workload, the following diagram shows the 100% write workstage that scored the highest average throughput for each of the two storage systems (note that this diagram doesn’t take into account how Ceph and Swift manage the creation of object replicas and data consistency, a factor which will be considered later).

009 -write-tpt-workloads

Similarly to what we have seen for the 100% read workstages, Ceph performs better when the size of the objects is small (4 kB), with a noticeable difference in the “1cont_4kb” workload where Ceph almost doubles the performance of Swift (with 512 and 256 threads respectively). Both for 1 and 20 containers, Swift has a better performance with object sizes of 128 kB, 512 kB and 1024 kB, with a decreasing gap that tends to zero when the size of the objects is 5 MB and 10 MB (Ceph has a slightly better throughput for these last two cases).

An interesting statistic for 100% write workstages is when Ceph and Swift are both accessed with a single thread as, in contrast with the read case, Swift performs better than Ceph. An example of this behavior can be observed in the graph below for one container and an object size of 128 kB.

010 -write-tpt-1cont-128kb

For a complete analysis of the write performance, it is also interesting to consider what is the impact of writing replicas of objects.

With the configuration used for the data pools of this experiment, Ceph creates three replicas of each object (so each object is copied two times) and allows I/O operations after at least two replicas have been written. Ceph is a strongly consistent system, so it will also wait until the minimum number of replicas has been written to the disks before returning a success code to the client. In Ceph, writing the transactions log to the Journal also requires some extra overhead (unless copy-on-write is used, which is not the case for xfs filesystems). Swift uses a similar approach as it streams all the 3 copies of an object to the disks at the same time and returns success when at least two of them have been correctly written.

As an example to demonstrate the difference in performance caused by writing the replicas (in Ceph), we have set up a dedicated Ceph benchmark which is identical to the “1cont_128kb” 100% write workload with one single thread, except for the fact that write operations are performed on a special pool with a replication size of 1 (so no copies of the object other than the original instance are made). We can see from the results below that the theoretical difference that we would have expected (a 2x factor – when replicas are written to disks in parallel, starting from the 2nd copy) is precisely what we get.

011 -write-tpt-1cont-128kb-replicas 012 -ceph-replicas-rt-1cont-128kb

In the case of 100% Write operations, the bandwidth chart is reported below.

013 -write-bdw-workloads

The performance of Swift reaches a maximum value of almost 40 MB/s when the objects size is bigger than 512 kB and 1024 kB respectively for 1 and 20 containers. On the contrary, Ceph performance continues to increase when the size of the object grows. Interestingly, for the same object size, Swift achieves better performance with a single container rather than with 20.

The cluster network doesn’t seem to represent a bottleneck in case of write operations.

The saturation of the system can be evaluated with the analysis of the response times. In the graph below, we can observe that for the “20cont_512kb” workload, when the number of concurrent threads is 512, the delay is ca. 10 seconds at the 90th percentile for both Ceph and Swift.

014 -write-ceph-swift-rt-20cont-512kb

The value of delay increases to above 50 seconds for the workload with 10 MB sized objects, 20 containers and 512 concurrent workers (graph is omitted on purpose), resulting in a completely saturated system for both storage systems (with Ceph still behaving more responsively than Swift).

Mixed operations

For mixed read, write and delete traffic characterizations, every workstage has been repeated with a distribution of 80% read, 15% write and 5% delete operations.

Each type of operation was applied to the whole set of available objects and pools/containers (so every object was available at any given time to be read, written or deleted).

The graphs for the read, write and delete throughput are reported below:

015 -read-mixed-tpt-workloads 016 -write-mixed-tpt-workloads

017 -delete-mixed-tpt-workloads

The graphs all share a similar “form” as the ratio between read/write/delete has been imposed with the workstages configuration (or better, with the configuration of works inside the workstages).

With respect to the pure write or read cases, there is an expected performance drop due to the division of the workload between different operations in the same timeslot. The proportions are not maintained with respect to the 100% cases, meaning that the result obtained by, e.g., the read operations in this mixed configuration is not 80% of the read throughput of the pure read case. The reason for this is to be found in the interference that the other two works cause to the execution of a single work. For reading objects, the fact that the system is also busy serving write and delete requests causes a higher usage of resources than if only other read requests were served. Consequently, the read performance drops below 80% of the 100% read case. For analogous reasons, write operations can be completed more easily in a system where only 20% of the operations involve disk writes (write works and whatever delete works may need).

As observed in the previous cases, Ceph has a noticeably higher throughput than Swift when the size of the objects is small, that is, when the number of operations is higher for both systems. The gap between Ceph and Swift decreases until it is almost nonexistent when the objects have a size of 5 MB or 10 MB.

To analyze the saturation of the system, we can refer to the response time diagrams for read operations in the “20cont_5mb” workload:

018 -ceph-rt-20cont-5mb 019 -swift-rt-20cont-5mb

Consistently with the pure read/write cases, the performance of Ceph is more conservative than the one of Swift. With the latter, workers reach a point of starvation starting with a load of 256 threads at ca. the 75th percentile, while with Ceph, even if the system is completely saturated with 512 workers (with a stable response time of 20 seconds), all requests are served with fairness and reasonable delay up to 64 concurrent workers.

If we consider the over-saturated case of 512 threads, we can argue that Swift’s behavior is actually better than Ceph’s. With Swift, client requests up to the 60th percentile are served in a short time and the rest are certainly left to time out. With Ceph, on the contrary, all requests would be served with a delay of ca. 20 seconds, probably causing a timeout for all of them.

As also observed previously, it can be noted that Swift has a more greedy behavior than Ceph, showing a very good response time performance up to a certain percentile, but with an unfair treatment of a relevant portion of the remaining samples.

Conclusions

The results for this study have been obtained with a small cluster consisting of only 3 servers, used in a configuration of 1 monitor/proxy and 2 storage nodes. Each storage server has been configured with two different and dedicated HDDs for data.

Consequently, the results here presented, even if very relevant to this context and consistent for the experiment and use case that we wanted to analyze, should not be extended to larger storage deployments that can include hundreds to thousands of OSDs.

The results here analyzed demonstrate an overall better performance of Ceph over Swift, when both systems are used for object storage (with librados for Ceph and the HTTP-based ReSTful interface for Swift).

For read operations, the most remarkable difference in throughput (number of operations per second) has been observed when accessing 20 different pools (or containers), each one containing 1000 objects of small size (4 kB). In this case, Ceph has reached a maximum average bandwidth of ca. 7.5 MB/s, while Swift stopped at ca. 3 MB/s. This result, given the short time required to perform a single read operation (due to the size of the objects), would suggest a faster lookup procedure for Ceph than for Swift, due to a smaller overhead when performing each operation.

The difference in the read throughput decreases when the size of the objects increases, becoming basically zero from a certain object size onwards. This pattern has also recurred for write and mixed operations.

For the collection of read statistics, we have to take into account that the networking infrastructure with 1 Gbps links has represented a bottleneck that has limited the read performance of Ceph, starting with an object size of 128 kB and of Swift, starting with an object size of 512 kB (when only one container was used) and 1024 kB (when 20 containers were used).

Concerning write operations, Ceph has shown a better performance than Swift with objects of 4 kB and has produced less throughput when the size of the objects was between 128 kB and 1024 kB. For this configuration, the management of object replicas was similar for both Ceph and Swift and the difference of using a replication factor of 1 instead of 3 has been shown for Ceph and has produced expected results.

In the case of mixed read, write and delete operations, with the proportion of 80%, 15% and 5% respectively, we have observed higher throughput values with Ceph and, as in the previous cases, the difference was more pronounced when operating with smaller objects and almost zero with objects of 5 MB and 10 MB.

The response time and saturation analysis of the systems has shown a more greedy behavior for Swift and a stronger tendency to fairness for Ceph. While up to a certain percentile Swift has shown faster responses to client requests than Ceph, a possibly large number of samples were left to starvation, with unacceptably long response times that would have certainly led to timeouts. Ceph, on the other hand, has manifested higher response times at low percentiles, but a much steeper CDF curve, implying that the majority of samples could receive a response in approximately the same time, which makes for a more predictable behavior.

Related posts

Ceph: OSD “down” and “out” of the cluster – An obvious case

When setting up a cluster with ceph-deploy, just after the ceph-deploy osd activate phase and the distribution of keys, the OSDs should be both “up” and “in” the cluster.

One thing that is not mentioned in the quick-install documentation with ceph-deploy or the OSDs monitoring or troubleshooting page (or at least I didn’t find it), is that, upon (re-)boot, mounting the storage volumes to the mount points that ceph-deploy prepares is up to the administrator (check this discussion on the Ceph mailing list).

So, after a reboot of my storage nodes, the Ceph cluster couldn’t reach a healthy state showing the following OSD tree:

$ ceph osd tree
# id weight type name up/down reweight
-1 3.64 root default
    -2 1.82 host ceph-osd0
        0 0.91 osd.0 down 0
        1 0.91 osd.1 down 0
    -3 1.82 host ceph-osd1
        2 0.91 osd.2 down 0
        3 0.91 osd.3 up 1

I wasn’t thinking about mounting the drives, as this process was hidden from me during the initial installation, but a simple mount command would have immediately unveiled the mystery :D.

So, the simple solution was to mount the devices:

sudo mount /dev/sd<XY> /var/lib/ceph/osd/ceph-<K>/

and then to start the OSD daemons:

sudo start ceph-osd id=<K>

For some other troubleshooting hints for Ceph, you may look at this page.