What is active storage about?
In most distributed storage systems, data nodes are decoupled from compute nodes. Disaggregating storage from compute servers is motivated by more efficient storage utilization and by the ability to scale computation and storage independently of each other.
While these motivations are valid, there are several situations in which moving computation close to the data brings important benefits. In particular, when stored data is to be processed for analytics, all of it must first be transferred from the storage cluster to the compute cluster, consuming network bandwidth, and in most cases the results of the analysis must then be written back to storage. In addition, storage infrastructures usually have large amounts of CPU and memory that remain underutilized. Active storage is a research area that studies the effects of moving computation close to the data and identifies the application domains where data locality actually pays off. In short, active storage runs computation tasks where the data resides, leveraging the underutilized resources of storage nodes and reducing data movement between storage and compute clusters.
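To make the data-movement argument concrete, here is a toy model in Python. All class and function names (`StorageNode`, `count_matches`, `pull_to_compute`, `push_to_storage`) are hypothetical and not part of any active storage framework. The model counts how many bytes cross the network when an analytics task (counting records that contain a pattern) runs on the compute side versus on the storage nodes. It assumes the task's result is much smaller than its input, which is the case where active storage pays off.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A "task" turns a node's records into a compact, serialized result.
Task = Callable[[List[bytes]], bytes]


@dataclass
class StorageNode:
    """Hypothetical storage node holding a list of stored records."""
    records: List[bytes]

    def read_all(self) -> List[bytes]:
        # Conventional path: ship every stored record over the network.
        return list(self.records)

    def run_locally(self, task: Task) -> bytes:
        # Active-storage path: run the task on the node's own CPU and
        # memory, and ship back only the result.
        return task(self.records)


def count_matches(pattern: bytes) -> Task:
    """Analytics task: count records containing `pattern` (8-byte result)."""
    def task(records: List[bytes]) -> bytes:
        n = sum(1 for r in records if pattern in r)
        return n.to_bytes(8, "big")
    return task


def pull_to_compute(nodes: List[StorageNode], task: Task) -> Tuple[int, int]:
    """Move all data to the compute side, then run the task there."""
    total, moved = 0, 0
    for node in nodes:
        data = node.read_all()
        moved += sum(len(r) for r in data)  # every stored byte crosses the network
        total += int.from_bytes(task(data), "big")
    return total, moved


def push_to_storage(nodes: List[StorageNode], task: Task) -> Tuple[int, int]:
    """Run the task where the data is and collect only the results."""
    total, moved = 0, 0
    for node in nodes:
        result = node.run_locally(task)
        moved += len(result)  # only the 8-byte count crosses the network
        total += int.from_bytes(result, "big")
    return total, moved


# Four nodes, each storing 1000 records of 100 bytes; every tenth is an error.
nodes = [
    StorageNode([(b"ERROR" if i % 10 == 0 else b"INFO ") + b"x" * 95
                 for i in range(1000)])
    for _ in range(4)
]
task = count_matches(b"ERROR")
print(pull_to_compute(nodes, task))  # (400, 400000)
print(push_to_storage(nodes, task))  # (400, 32)
```

Both strategies produce the same answer; only the network traffic differs. The model deliberately ignores the CPU cost on the storage nodes and the cost of deploying the task itself, both of which a real deployment would need to weigh.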
Many active storage frameworks have been proposed in the research community. One example is the OpenStack Storlets framework, developed by IBM and integrated into OpenStack Swift deployments. IOStack, a European-funded project, builds on this concept for object storage. Another example is ZeroVM, which allows developers to push their applications to their data instead of pulling their data to their applications.