The widespread adoption and the development of cloud platforms have increased confidence in migrating key business applications to the cloud. New approaches to distributed computing and data analysis have also emerged in conjunction with the growth of cloud computing. Among them, MapReduce and its implementations are probably the most popular and commonly used for data processing on clouds.
Efficient support for distributed computing on cloud platforms means guaranteeing high speed and ultra-low latency to enable massive amounts of uninterrupted data ingestion and real-time analysis, as well as cost-efficiency-at-scale.
Currently, there are limited offerings of on demand distributed computing tools. The main challenge that applies not only to cloud environments, is to build such a framework that handles both big data and fast data. This means that the framework must be able to provide for both batch and stream processing, while allowing clients to transparently define their computations and query the results in real time. Provisioning such a framework on cloud platforms requires delivering rapid provisioning and maximal performance. Challenges also come from one of cloud’s most appealing features: elasticity and auto-scaling. Distributed computing frameworks can greatly benefit from auto-scaling, but current solutions do not support it yet.
Articles and Info
- Blog post: Real-time data processing with Storm