New Release of DISCO – easier than ever, more powerful than before

Almost one year ago, the first version of DISCO was publicly released. Since then, a major refactoring of DISCO has taken place and we are proud to announce the fresh version with even better usability and a user-friendly dashboard. But first of all, how can DISCO help you? And what is new after the refactoring? We would like to present you the ways how DISCO can make your life as a Big Data analyst much easier. A short wrap-up is presented before the new features are explained more closely.

How can DISCO help me?

DISCO is a framework for the automatic deployment of distributed computing clusters. But not just that, DISCO even provisions the distributed computing software. You can lean back and have the tedious task done by DISCO so that you can focus entirely on the Big Data analysis part.

The new DISCO framework – even more versatile

What is new in the new DISCO edition? To say it shortly: almost everything! Here is a list containing the major new features:

  • Dashboard to hide the command line
  • easy setup for front-end and backend
  • many more Distributed Computing Frameworks
  • hassle-free extensibility with new components
  • automatic dependency handling for components
  • more intuitive commands over CRUD interface (though still no update functionality)

The Dashboard – a face for DISCO

A new dashboard hides the entire background complexity from the end user. Now, everything from planning over deployment to deletion can be done over an intuitive web interface. The dashboard will also provide you with real-time information about the status of the installed frameworks on your computing cluster.

Easy setup

Installing DISCO has never been as easy as it is now! The backend only needs 3 settings to be entered, two of which are not even external settings. And the dashboard? The dashboard comes even with its own installation script – so the most difficult part is cloning the github repository.

New Distributed Computing frameworks

The first version of DISCO could only provision Hadoop. The new release has more, most importantly another major Distributed Computing framework. Here is a list of all supported frameworks right now:

Extensibility made easy

Is there a framework that you would like to provision, but which is not implemented in DISCO yet? This is not a problem anymore! The new system is very easy to extend with new components. You can just write the new component (for instance by copying and modifying an existing component) and drop its directory structure to the other components! There is no installation needed; you can have the new component deployed immediately. DISCO has a built-in functionality which will greatly enhance your provisioning experience – everything is done in parallel on the entire cluster! Just take a look at the Wiki for further reference.

Dependency handling automated

When it comes to dependencies among the frameworks, things can get complicated easily. Unless you are using DISCO. DISCO automatically installs each required component for a smooth provisioning process. You don’t have to bother yourself with questions about which additional components to install. You just select the ones you need access to and DISCO will take care of the rest.

Future work

DISCO did a huge leap forward over the last year. Still, there are some visions what can be done to improve or extend it even beyond its current state. In the future, DISCO will not only provision distributed computing clusters but it will find out on its own what the end user needs for his current task. There will be a recommendation engine, which will propose the best fitting distributed computing frameworks upon a completed questionnaire. Of course, as the world of distributed computing frameworks is always evolving, more components are going to be included on the go. Still, this doesn’t mean that DISCO will get more complicated – on contrary: the Dashboard will make the choice of frameworks and settings easier than ever. We already have many ideas how to provide  an even more fulfilled user experience. Just wait and see the new additions! Don’t forget to check back regularly or to sign up for our mailing list for news! And if there is something that we have missed (or something that you specially like), please contact us – we will happily help you!

DISCO 2.0 release can be downloaded from our git repo here: and extensive documentation is available under the github wiki at, we wish you happy testing!

First public release of DISCO – a new distributed computing orchestration framework

After several months of development, last week was finally the first beta release of the distributed computing orchestration framework DISCO.

What is DISCO anyway?

Have you ever needed a computing cluster for Big Data to be ready in a matter of seconds, with a huge amount of computers at its disposal? If so, then DISCO is for you! DISCO (for DIStributed COmputing) is an abstraction layer for OpenStack‘s orchestration part, Heat (or any other framework which can deploy a Heat orchestration template). Based on the orchestration framework Hurtle developed at our lab, it supervises the whole lifecycle of a distributed computing cluster, from designing to disposal.

How does DISCO work?

As already mentioned, DISCO is a middleman between OpenStack and the end user. It not only takes the troublesome work of designing a whole (virtual) computing cluster but it also deploys a distributed computing architecture of choice onto that cluster – automatically. Continue reading

Lightning Sparks all around: A comprehensive analysis of popular distributed computing frameworks (ABDA’15)

Distributed Computing Frameworks

Big Data processing has been a very current topic for the last ten or so years. In order to process Big Data, special software frameworks have been developed. Nowadays, these frameworks are usually based on distributed computing because horizontal scaling is cheaper than vertical scaling. But horizontal scaling imposes a new set of problems when it comes to programming. A traditional programmer feels safer in a well-known environment that pretends to be a single computer instead of a whole cluster of computers. In order to deal with this problem, several programming and architectural patterns have been developed, most importantly MapReduce and the use of distributed file systems. There are several OpenSource frameworks that implement these patterns. Continue reading