The cloud-native applications initiative started in 2014 with the aim of providing guidelines, recommendations, tools and best practices to companies and practitioners facing the daunting task of building an application optimized to run in the cloud, whether by developing it from scratch or by migrating legacy code.
To gather practical experience, we kicked off the CNA seed project roughly one year ago, focusing explicitly on the migration of a traditional (LAMP stack) legacy application to a cloud-native architecture. In this post we report on the results achieved so far.
You can follow the progress of the CNA seed project by going through the list of related blog posts collected on the initiative page (look for the “Articles” section).
One year has gone by and a small team of great people has worked together to scout, test, design, code, deploy, record, and yell at a little code base that we’re happy to share as open source on our GitHub (https://github.com/icclab/cna-seed-project/).
Lots of work has been put in by Sandro, Martin, Özgür, and Giovanni in different phases and with different aims. Christof, Josef, and Thomas were also involved in all high-level planning and design decisions.
In 2015, we finished the implementation of the CoreOS+Fleet solution. This was our initial choice of technologies for container management that would enable us to achieve resilience and scalability.
The video below is a short presentation of what was achieved:
- After careful evaluation, we chose to migrate the Zurmo CRM implementation to a cloud-native architecture;
- Without delving into the implementation details, we chose to achieve reliability and scalability by using replication at component level;
- We used Docker containers to execute each application component separately (e.g., web, MySQL Galera, memcached servers);
- The implementation is based on CoreOS, with Fleet as the container management solution;
- We use a generic containerized ELK stack to monitor the application metrics and trigger auto-scaling through our own engine, Dynamite.
The video shows two simple scenarios:
- Scaling up in the face of increased web traffic (application elasticity/scalability)
- Recovery and availability in case of induced failures (application resilience)
In the first scenario we use pods running on our own Kubernetes cluster to generate simulated load (100 users, ~10 requests per second) with the Tsung tool. Dynamite is configured to create new Apache web server containers if the 95th percentile of the response time is above 1 second over a 1-second period. The reaction to the load increase is immediate, and the average response time is kept under control at all times.
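To make the scaling rule above more concrete, here is a minimal Python sketch of the check: take the response times observed in one window, compute their 95th percentile, and ask for another web container when it exceeds the threshold. This is not Dynamite's actual code; the function names, the nearest-rank percentile method, and the sample values are our own assumptions for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile (integer pct, 1..100) of a non-empty list."""
    if not samples:
        raise ValueError("no samples in window")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based nearest rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def should_scale_out(response_times_s, threshold_s=1.0, pct=95):
    """True when the window's percentile is strictly above the threshold.

    response_times_s: response times (seconds) seen in one window,
    e.g. the last second of traffic. An empty window never scales out.
    """
    if not response_times_s:
        return False
    return percentile(response_times_s, pct) > threshold_s


if __name__ == "__main__":
    # Hypothetical one-second window at ~10 requests per second.
    window = [0.2, 0.3, 0.25, 0.4, 0.3, 0.2, 0.35, 0.3, 0.25, 1.5]
    if should_scale_out(window):
        print("scale out: start one more web server container")
```

Note that with only about ten samples per window, the nearest-rank 95th percentile is simply the slowest request, so a single slow response is enough to trigger scaling. That is consistent with the very quick reaction described above, but a longer window would make the rule less sensitive to outliers.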
In the second scenario we show that any virtual machine (or container) might fail and the application will recover without impact on response time. The video shows how we manually terminate a virtual machine during the simulated load and how the affected containers are immediately restarted elsewhere while the response time stays under control.
We spent the last weeks of 2015 collecting data for a journal publication that we submitted to the Elsevier journal “Future Generation Computer Systems”, in which we give more details about our experience with this solution, its limits and pitfalls. Keep your fingers crossed for us and an eye on our publication page!
We now have some reservations about our choice of container-management solution, and that technology space is moving extremely fast. So although we’ve successfully demonstrated that Fleet can be used to control response times under varying load patterns and reacts well in the case of component failures, we’re eager to try something new.
Having finally wrapped up the Fleet implementation, our next steps will consist of completing the porting of the application to Kubernetes first, and maybe Ubernetes after that. Stay tuned!
Özgür and Gio