Tag: big data

Call for Contributions: IEEE/ACM UCC and BDCAT 2018, Zurich, Switzerland

by Josef Spillner

Block the dates in your calendar: December 17 to 21 is high cloud time in Switzerland!

Two computer science research laboratories at Zurich University of Applied Sciences, the Service Prototyping Lab and the ICCLab, will jointly host the 11th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2018) and the 5th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2018), along with a number of satellite and co-located events, from December 17 to 21 in Zurich, Switzerland. This pre-Christmas week of prestigious conferences is a unique opportunity to bring together international researchers and practitioners in central Europe. Please consider supporting the event with corporate donations, tutorials, cloud challenge entries and other contributions. This is your chance to demonstrate convincing cloud technology to the world! Contact the conference organisers for any details.

Technical paper submissions are furthermore open to a number of co-located workshops. Among them we would like to point out the 1st Workshop on Quality Assurance in the Context of Cloud Computing (QA3C 2018) and the 1st Workshop on Cloud-Native Applications Design and Experience (CNAX 2018), for which our research staff proudly serve as co-chairs. In total, 9 workshops are accepting papers now, a doctoral forum accepts research proposals, and a cloud challenge supports practical (demo-able) contributions with an emphasis on reproducible, impactful results.

Finally, we would like to mention specifically the subsequent European Symposium on Serverless Computing and Applications (ESSCA 2018) on December 21, which, as a mixed industry-academic-community event, acknowledges that FaaS-based applications have become mainstream but that challenges remain. Got a talk on that topic? Just propose it informally to enrich the technical meeting with different perspectives. Alongside ESSCA, the 4th edition of the International Workshop on Serverless Computing will take place on December 20 as part of UCC.

ICCLab Research Group Activity

Big data is a general term for datasets that may be petabytes (10^15 bytes), exabytes (10^18 bytes) or zettabytes (10^21 bytes) in size and that consist of billions to trillions or even quadrillions of records.

Big data can be described as

  • The large volume of data a specific company produces,
  • Data that requires too much time and cost to analyze,
  • Data that takes too much time to load into a relational database,
  • Data that is beyond the processing capacity of a specific database system, and so on.

Due to the rapid growth of data volumes, dealing with big data can make it difficult to store, create, manipulate and manage your data. Big data is generally a problem in business analytics because of the storage volume, processing time and cost involved.

Goal of the ICCLab research group

Big data is most often associated with cloud computing because of the storage, management and analysis it requires. A big dataset requires a framework like MapReduce to distribute the work among different computers.
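To illustrate how a MapReduce-style framework splits work, here is a minimal single-machine sketch of the three stages (map, shuffle, reduce) using a word count. It is not Hadoop code: the function names and the sample input are assumptions made purely for illustration, and in a real cluster the map and reduce calls would run in parallel on different computers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across the network between map and reduce.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce sequentially (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands for one input record.
    sample = ["big data needs big clusters", "data moves to clusters"]
    print(word_count(sample))
```

Because each map call only sees one record and each reduce call only sees one key, both steps can be spread over many machines without the steps needing to know about each other.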

Our aim is to solve the challenges of storing, accessing and analyzing big data using cloud computing infrastructure with Hadoop and analytics packages such as SAS and R. Our infrastructure is not only for storage; big data analytics is also a challenge that needs attention!

The benefit of having big data: even though big data is difficult to work with, it helps to extract more information, which can support further research. Having big data allows research groups to cover a variety of research areas and enables them to analyze the data along different aspects or dimensions. Furthermore, big data can provide more detailed results for better decision making.

Why Hadoop

Nowadays people have started to put their data into Hadoop because

  • It is open source storage,
  • It is inexpensive, and
  • It helps to store more data than before.

Hadoop supports around 4000 nodes with 4 TB of hard disk capacity per node, which is a large amount of storage, and it is easy to add servers to and remove servers from a Hadoop cluster. Beyond that, it can be used without proprietary licensing fees.
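As a rough back-of-the-envelope check of those figures, the sketch below multiplies the node count by the per-node disk size. The replication factor of 3 is an assumption that is not stated in the text above (it is the usual HDFS default), and the helper names are hypothetical.

```python
TB = 10**12  # bytes in a terabyte (decimal units, as used in this post)
PB = 10**15  # bytes in a petabyte


def raw_capacity_bytes(nodes, disk_per_node_bytes):
    """Total disk space across the cluster before any replication."""
    return nodes * disk_per_node_bytes


def usable_capacity_bytes(raw_bytes, replication_factor):
    """Space left for distinct data when every block is stored several times."""
    return raw_bytes / replication_factor


# Figures from the text: about 4000 nodes with 4 TB each.
raw = raw_capacity_bytes(4000, 4 * TB)

# Assumption: each block is replicated 3 times (the common HDFS default).
usable = usable_capacity_bytes(raw, 3)

print(f"raw capacity:    {raw / PB:.1f} PB")
print(f"usable capacity: {usable / PB:.2f} PB (assuming 3x replication)")
```

Under these assumptions the cluster offers 16 PB of raw disk space, which puts it in the petabyte range mentioned at the start of this post.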

It is also possible to integrate high-performance parallel data processing using MapReduce.

Analytics Lab

Since we are a research group, we don't want to just have big data stored in an organized way; we also need to analyze the data.

Three important reasons why we chose SAS:

  • Using the new version of SAS DI Studio, it is easy to access files stored in Hadoop without too many extra steps; we can use the infile statement of the SAS language to read and write files to and from Hadoop.
  • It is possible to work with Hadoop Hive tables as if they were SAS datasets, so we can use Hive tables in any job in SAS DI Studio.
  • Using Base SAS, it is also possible to use Hadoop functionality such as MapReduce programming, HDFS command execution and Pig.

Moreover, there is an upcoming plan: instead of pulling the data out of Hadoop for processing in SAS, it is possible to take advantage of the cluster by sending the work down to the Hadoop cluster to be processed there, since the data is already in the cluster. Interesting!