Testing Alluxio for Memory Speed Computation on Ceph Objects

In a previous blog post, we showed how "bringing the code to the data" can greatly improve computation performance through the active storage (also known as computational storage) concept. In our journey of investigating how to best make computation and storage ecosystems interact, in this blog post we analyze the somewhat opposite approach of "bringing the data close to the code". What the two approaches have in common is that both exploit data locality, moving away from the complete disaggregation of computation and storage.

The approach in focus for this blog post is at the basis of the Alluxio project, which in short is a memory-speed distributed storage system. Alluxio enables data analytics workloads to access various storage systems and accelerates data-intensive applications. It manages data in memory and optionally on secondary storage tiers, such as cheaper SSDs and HDDs, for additional capacity. It achieves high read and write throughput by unifying data access to multiple underlying storage systems, reducing data duplication among computation workloads. Alluxio lies between computation frameworks or jobs, such as Apache Spark, Apache MapReduce, or Apache Flink, and various kinds of storage systems, such as Amazon S3, OpenStack Swift, GlusterFS, HDFS or Ceph. Data is available locally for repeated accesses to all users of the compute cluster, regardless of the compute engine used. This avoids redundant copies of data in memory, driving down capacity requirements and thereby costs.

For more details on the components, the architecture and other features, please visit the Alluxio homepage. In the rest of the blog post we present our experience in integrating Alluxio with our Ceph cluster and use a Spark application to demonstrate the obtained performance improvement (the reference analysis and testing we aimed to reproduce can be found here).

The framework used for testing

Fig. 1: Alluxio testing set-up.

Our framework set-up used for testing is drafted in Fig. 1. The underlying storage we used is Ceph (version Mimic) and its object storage interface. The Ceph cluster used for testing purposes was deployed over 6 OpenStack VMs, hosting respectively one Ceph monitor, three storage nodes running Object Storage Daemons (OSDs), one Ceph RADOS Gateway (RGW) node and one administration node. The total storage size of 420GiB was spread over 7 OSD volumes attached to the three OSD nodes. More details about the Ceph cluster deployment can be found in our previous blog post.

On the computation side, Alluxio (v2.3) was installed on a separate VM where the Spark (v3.0.0) services are also running. Two different settings were tested. In the first set-up we have the simple case of a single VM (16 vCPUs) dedicated to Alluxio and the Spark master and worker nodes (with 40GB of memory for the worker node). In the second set-up, a cluster mode was tested where two additional Spark and Alluxio worker nodes are configured (each with 16 vCPUs and 40GB of memory).

An application was written in Scala to be run on Spark. The simple code accesses a text file and counts the lines in the file. The focus was a comparison in terms of execution time between accessing the file directly through the Ceph RGW on the underlying storage system and accessing it through Alluxio. In the latter case, the first time the file is accessed, Alluxio loads it from the underlying Ceph storage into memory. Subsequent file accesses then hit the in-memory copy stored in Alluxio. As we will see, this introduces important performance improvements.

Some preliminary results

We considered three different file sizes, i.e., 1GB, 5GB and 10GB. In each test we recorded the time to access the file the first time and on subsequent accesses (averaged over 10 runs). The same application is then launched a second time to measure the same file access times again. The aim is to show that the in-memory Alluxio caching also benefits further applications accessing the same data.

As can be seen from the results plotted in Fig. 2 and Fig. 3, apart from the first access, when Alluxio reads the file and stores it in memory before performing the line count, very large performance improvements are obtained in comparison to direct Ceph access in all cases. The best improvements are observed from the second access onwards. For the single-node analysis presented in Fig. 2, on the second access the measured time for direct Ceph access is 75 times higher for the 1GB file, and 111 and 107 times higher for the 5GB and 10GB files respectively, compared to access through Alluxio. For the cluster-mode setup, the overall execution time is much lower in all cases, and on the second access the measured time for direct Ceph access is 35 times higher for the 1GB file, and 57 and 65 times higher for the 5GB and 10GB files respectively, compared to access through Alluxio.

Figure 2: Execution time for counting lines in a text file with a single VM running colocated master and worker nodes.
Figure 3: Execution time for counting lines in a text file with a cluster setup.

The obtained results confirm our expectation that disaggregation of computation and storage is probably not always the best approach to follow. In our vision, a smart combination of the bring-data-to-the-code and bring-code-to-the-data approaches is the way to go for optimizing performance. Clearly, this depends on the scenario and the use case, with several policies that could be put in place to best tune the behaviour and performance.

The Scala code snippets

The Scala code to be run in Spark is reported below in case the same tests should be reproduced. The specific case shown here is configured to access a 10GB file on Ceph directly through the RGW.

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import java.io.File
import java.io.PrintWriter

object test extends App
{
  val testcase = "10gb-file"
  val writer = new PrintWriter(new File("Log_" + testcase + "_Ceph_2nd"))
  val conf = new SparkConf().setAppName("test").setMaster("spark://10.20.0.120:7077")
  val sc = new SparkContext(conf)
  var totaltime: Long = 0
  var firsttime: Long = 0
  var totalsecondtimes: Long = 0
  var avgsecondtimes: Long = 0
  // Iteration 0 measures the first access; iterations 1 to 10 measure repeated accesses.
  for (i <- 0 to 10)
  {
    val file = sc.textFile("s3a://my-new-bucket/" + testcase)
    if (i == 0)
    {
      val start = System.nanoTime()
      file.count()
      val end = System.nanoTime()
      firsttime = (end - start) / 1000000
      totaltime = totaltime + firsttime
      writer.write("Current  " + firsttime.toString() + "ms Avg_2nd 0ms" + " Total " + totaltime.toString() + "ms\n")
    }
    else
    {
      val start = System.nanoTime()
      file.count()
      val end = System.nanoTime()
      totalsecondtimes = totalsecondtimes + (end - start) / 1000000
      // Running average over the repeated accesses measured so far.
      avgsecondtimes = totalsecondtimes / i
      totaltime = totaltime + (end - start) / 1000000
      writer.write("Current  " + ((end - start) / 1000000).toString() + "ms Avg_2nd " + avgsecondtimes.toString() + "ms Total " + totaltime.toString() + "ms\n")
    }
  }
  writer.close()
}

When accessing the file through Alluxio, the code differs only slightly in configuration. Moreover, since the file is explicitly stored in Alluxio memory the first time it is accessed, some additional code lines are needed.

...
// Iteration 0 reads the original file from the under-storage and explicitly saves
// a copy into Alluxio memory; all further iterations read the in-memory copy.
for (i <- 0 to 10)
{
  val file2 = sc.textFile("alluxio://alluxio:19998/" + testcase + "_cached")
  if (i == 0)
  {
    val file = sc.textFile("alluxio://alluxio:19998/" + testcase)
    val start = System.nanoTime()
    file.saveAsTextFile("alluxio://alluxio:19998/" + testcase + "_cached")
    file2.count()
    val end = System.nanoTime()
    firsttime = (end - start) / 1000000
    totaltime = totaltime + firsttime
    writer.write("Current  " + firsttime.toString() + "ms Avg_2nd 0ms" + " Total " + totaltime.toString() + "ms\n")
  }
...

The Scala project is compiled using the sbt tool and the resulting jar is then provided to Spark using spark-submit.

sbt assembly
spark-submit ./target/scala-2.11/test-assembly-1.0.jar
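
For reference, a minimal build.sbt consistent with the jar path above could look as follows. The project name and version are inferred from the jar file name (test-assembly-1.0.jar); the Scala and Spark versions are assumptions and must match your installation (note that Spark 3.x builds use Scala 2.12, while the jar path above suggests a 2.11 build), and the sbt-assembly plugin additionally needs to be enabled in project/plugins.sbt.

// Minimal build.sbt sketch; names inferred from test-assembly-1.0.jar above.
name := "test"
version := "1.0"
scalaVersion := "2.11.12"  // must match the Scala version of your Spark build

// Spark classes are provided by the cluster at runtime, so they are not bundled.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5" % "provided"

// project/plugins.sbt:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")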

Some Spark configuration information

To interface Spark with Ceph the spark-defaults.conf file has to be configured as follows.

# Both jars go on one classpath entry: a duplicated key in spark-defaults.conf
# would be overridden rather than merged.
spark.driver.extraClassPath   /opt/spark/jars/hadoop-aws-3.2.0.jar:/opt/spark/jars/aws-java-sdk-bundle-1.11.375.jar
spark.executor.extraClassPath /opt/spark/jars/hadoop-aws-3.2.0.jar:/opt/spark/jars/aws-java-sdk-bundle-1.11.375.jar
spark.hadoop.fs.s3a.impl  org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key  ########
spark.hadoop.fs.s3a.secret.key  ########
spark.hadoop.fs.s3a.connection.ssl.enabled  false
spark.hadoop.fs.s3a.signing-algorithm S3SignerType
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.endpoint    http://10.20.0.156:7480
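
Before launching the full application, the S3A connectivity can be sanity-checked interactively in spark-shell; a quick test (bucket and object names taken from the Scala snippet above) might look like this:

// In /opt/spark/bin/spark-shell: read the test file through the Ceph RGW
// and count its lines to verify that the s3a:// configuration works.
val file = sc.textFile("s3a://my-new-bucket/10gb-file")
file.count()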

Note that, to interface with Ceph, the needed jars had to be downloaded first. For our specific Spark installation with Hadoop 3.2.0 we needed the following jars:

ubuntu@spark:/opt/spark/jars$ sudo wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
ubuntu@spark:/opt/spark/jars$ sudo wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar

To interface Spark with Alluxio the spark-defaults.conf file had to include the following lines.

spark.driver.extraClassPath /home/ubuntu/alluxio-2.3.0/client/alluxio-2.3.0-client.jar
spark.executor.extraClassPath /home/ubuntu/alluxio-2.3.0/client/alluxio-2.3.0-client.jar
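
Alluxio itself also needs to be pointed at the Ceph object store as its under-storage. We do not reproduce our full configuration here, but a minimal conf/alluxio-site.properties sketch mounting an S3-compatible under-storage at the Alluxio root (property names as in the Alluxio 2.x documentation, endpoint and credentials as in the Spark configuration above) could look like this:

alluxio.master.hostname=alluxio
# Mount the Ceph bucket (via the S3-compatible RGW) as the root under-storage.
alluxio.master.mount.table.root.ufs=s3://my-new-bucket/
alluxio.underfs.s3.endpoint=http://10.20.0.156:7480
# The RGW is addressed with path-style requests rather than DNS-style buckets.
alluxio.underfs.s3.disable.dns.buckets=true
s3a.accessKeyId=########
s3a.secretKey=########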

Open issues in testing

Java 8 prerequisite

The current stable version of Alluxio (v2.3) has Java 8 as a prerequisite. This means that if the current default Java version, i.e., Java 11, is installed instead, Alluxio will not work. This aspect should be carefully considered when performing the tests presented in this blog. In particular, direct Ceph file access with Spark performs much better under Java 11 than under Java 8. Therefore, for a fair comparison of these results, Java 8 should always be used, also when Alluxio is not involved! Work is currently ongoing for Alluxio to support Java 11. In particular, we were able to test v2.4, which is not yet released, and as expected the total execution time is largely reduced in all cases, as we report in Fig. 4. Although the benefits of using Alluxio are somewhat downscaled by the generally reduced execution times, a 6 times lower execution time is still obtained for a 10GB file at the second access compared to direct Ceph access.

Figure 4: Execution time for counting lines in a text file with a cluster setup when using Java 11.
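
If several Java versions coexist on the machines, one way to pin both Alluxio and Spark to the same JVM is to export JAVA_HOME in their respective environment files; a sketch for Ubuntu follows (the OpenJDK installation path is an assumption and may differ on your system):

# List the available JDKs and note the path of the Java 8 one.
update-java-alternatives --list

# Point Alluxio at Java 8 (conf/alluxio-env.sh is sourced by the Alluxio scripts).
echo 'JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> /home/ubuntu/alluxio-2.3.0/conf/alluxio-env.sh

# Do the same for Spark, so that both measurement series run on the same JVM.
echo 'JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> /opt/spark/conf/spark-env.sh
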
Default in-memory storage in Alluxio

The first time a file is accessed in Alluxio, it is automatically stored in memory after being loaded from the under-storage. However, we observed that when accessing this (in-memory) file a second time, only a limited performance enhancement is actually obtained! We then found out, as reported in the Scala snippet, that when we explicitly use the saveAsTextFile command on the first access and then read the newly stored in-memory file, the expected performance improvement is observed. The drawback of this approach is that some additional time is spent saving the file in memory and a redundant copy of the file is stored there. We are currently performing further testing to find out whether some parameter or configuration tuning is needed to take advantage of the default in-memory caching directly. Indeed, according to the results we used as a reference for this analysis, Alluxio should already provide a performance increase at the first file access. As we observe from Fig. 2, this is not the case in our analysis, due to the additional time needed to execute the saveAsTextFile command.
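
A useful check while investigating this behaviour is the Alluxio command line client, whose listing reports the percentage of each file currently held in Alluxio storage (paths as in our Scala snippet):

# The percentage column shows how much of each file is resident in Alluxio storage;
# in our tests the explicitly saved *_cached copy reported 100%.
./bin/alluxio fs ls /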

Experimenting on Ceph Object Classes for Active Storage

What is active storage about?

In most distributed storage systems, the data nodes are decoupled from the compute nodes. Disaggregation of storage from the compute servers is motivated by improved efficiency of storage utilization and better, mutually independent scalability of computation and storage.

While the above consideration is indisputable, several situations exist where moving computation close to the data brings important benefits. In particular, whenever the stored data is to be processed for analytics purposes, all the data needs to be moved from the storage to the compute cluster (consuming network bandwidth). After some analytics on the data, in most cases the results need to go back to the storage. Another important observation is that large amounts of resources (CPU and memory) are available in the storage infrastructure and usually remain underutilized. Active storage is a research area that studies the effects of moving computation close to data and analyzes the fields of application where data locality actually introduces benefits. In short, active storage allows computation tasks to run where the data is, leveraging the storage nodes' underutilized resources and reducing data movement between storage and compute clusters.

There are many active storage frameworks in the research community. One example of active storage is the OpenStack Storlets framework, developed by IBM and integrated within OpenStack Swift deployments. IOStack is a European funded project that builds around this concept for object storage. Another example is ZeroVM, which allows developers to push their application to their data instead of having to pull their data to their application.

So, what about Ceph?

Ceph is a widespread unified, distributed storage system that offers high performance, reliability, and scalability, and it plays a very important role in the open-source storage world. Nonetheless, we had to dig a bit deeper to find out that Ceph also has a feature to actually implement active storage. Object classes are the technology that supports this, although they are probably not a widely known and adopted Ceph feature. Specifically, object classes allow Ceph to be extended by loading custom code directly into the OSDs, where it can then be executed by a librados application. The created object classes can define methods that are able to call the native methods in the Ceph object store, or other class methods incorporated via libraries (or created yourself). As a further effect, this allows the distributed scale of Ceph to be exploited for computational tasks, since parallel computing over the OSDs can be achieved. The resulting available compute power is much higher than what a single client could provide!

Although the official Ceph documentation is not really exhaustive in describing the use of Ceph object classes, a very useful set of examples can be found in the book Mastering Ceph by Nick Fisk. Chapters 5 and 6 are a very good guideline for understanding the basics of building applications that directly interact with a Ceph cluster through librados and of building your own object classes. Based on the examples presented in the book, in the remainder of this post we report our experiments with Ceph object classes. Multiple non-obvious configuration steps have to be taken to finally deploy object classes in a Ceph cluster.

Deploying an Object Class on a Ceph cluster

At this stage we assume our testing Ceph cluster is up and running with one monitor and three OSDs. For more details on the Ceph cluster deployment steps, please refer to the specific sections at the bottom of this blog post.

The example we report on here models the case where we want to calculate an MD5 hash of every object in a RADOS pool and store the resulting hash as an attribute of the specific object. A first solution would be for the client to request the object, perform the computation remotely and then push the attribute back into the storage cluster. The second option instead is to create an object class that reads an object from a pool, calculates the MD5 hash and stores it as an attribute of the object. In this second option, the client just has to send a command to the OSD to execute the object class.

We will show the implementation of both options described above. The goal is not only to show that the final result is the same; we will also report on the performance comparison in terms of the time needed to reach the solution. For a better comparison, we implement code that repeats the MD5 hash calculation 1000 times. From the final comparison of the two solutions, we will notice the benefits obtained by using object classes to exploit data locality for computation. Actually, we can't wait to reveal the result! Using the Ceph object class on the OSDs, the computation takes only 0.126s when adopting the active storage concept, instead of 7.735s when the computation is performed remotely on the client. This is a 98.4% time saving, which is a very important result! In the next sections we report on the steps performed to obtain these results on our own test Ceph cluster.

Cloning the Ceph git repository

To be able to create object classes for a running Ceph cluster, we first need to clone the Ceph git repository. Note that this should be done on one monitor node in the Ceph cluster. It is very important to make sure to clone the git branch corresponding to the Ceph version deployed in the cluster! In our Ceph cluster, the monitor node is mon1, and the installed Ceph version is Mimic:

ceph-admin@mon1:~$ git clone --branch mimic https://github.com/ceph/ceph.git

The next step towards being able to build Ceph object classes is to install some required additional packages. To do this, the install-deps.sh script in the Ceph source tree should be run and the build-essential package installed:

ceph-admin@mon1:~/ceph$ ./install-deps.sh 
ceph-admin@mon1:~/ceph$ sudo apt-get install build-essential

Writing an object class

We can now write the object class that reads an object, calculates the MD5 hash and writes it as an attribute of the object itself. Note that this class performs these operations without any client involvement and iterates 1000 times (for a better performance comparison). To this aim we create a C++ source file called cls_md5.cc in a new md5 directory under the ceph/src/cls folder. The source code is reported below:

#include <openssl/md5.h>
#include "objclass/objclass.h"

CLS_VER(1,0)
CLS_NAME(md5)

cls_handle_t h_class;
cls_method_handle_t h_calc_md5;

static int calc_md5(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
 {
   char md5string[33];
   for(int i = 0; i < 1000; ++i)
   {
     size_t size;
     int ret = cls_cxx_stat(hctx, &size, NULL);
     if (ret < 0)
       return ret;

     bufferlist data;
     ret = cls_cxx_read(hctx, 0, size, &data);
     if (ret < 0)
       return ret;
     unsigned char md5out[16];
     MD5((unsigned char*)data.c_str(), data.length(), md5out);
     for(int i = 0; i < 16; ++i)
       sprintf(&md5string[i*2], "%02x", (unsigned int)md5out[i]);
     CLS_LOG(0,"Loop:%d - %s",i,md5string);
     bufferlist attrbl;
     attrbl.append(md5string);
     ret = cls_cxx_setxattr(hctx, "MD5", &attrbl);
     if (ret < 0)
     {
        CLS_LOG(0, "Error setting attribute");
        return ret;
     }
   }
   out->append((const char*)md5string, sizeof(md5string));
   return 0;
 }
void __cls_init()
 {
   CLS_LOG(0, "loading cls_md5");
   cls_register("md5", &h_class);
   cls_register_cxx_method(h_class, "calc_md5", CLS_METHOD_RD | CLS_METHOD_WR, calc_md5, &h_calc_md5);
 }

We can now proceed with building the new object class. It is not necessary to build the whole Ceph git repository; we can limit ourselves to building the cls_md5 class. Before doing so, we need to add a section for our new class to the CMakeLists.txt file (under the ceph/src/cls folder). For the cls_md5 class, the section to add is the following:

# cls_md5
set(cls_md5_srcs md5/cls_md5.cc)
add_library(cls_md5 SHARED ${cls_md5_srcs})
set_target_properties(cls_md5 PROPERTIES
    VERSION "1.0.0"
    SOVERSION "1"
    INSTALL_RPATH "")
install(TARGETS cls_md5 DESTINATION ${cls_dir})
target_link_libraries(cls_md5 crypto)
list(APPEND cls_embedded_srcs ${cls_md5_srcs})

Once the file is updated, we can use cmake to create the build environment. In our experiment, cmake was not installed by default, so we had to install it first. Running the do_cmake.sh script creates a build directory in the source tree. Inside this directory we can use make to build our new object class cls_md5:

ceph-admin@mon1:~/ceph$ sudo apt install cmake
ceph-admin@mon1:~/ceph$ ./do_cmake.sh
ceph-admin@mon1:~/ceph/build$ make cls_md5

Once the class is compiled correctly, we have to copy it to each of the OSDs in the Ceph cluster, under the /usr/lib/rados-classes directory. After this, we need to restart the OSDs in order to load the new class. By default, the OSDs will not load any new class, so we first need to whitelist the new object classes on the OSDs. To do this, the ceph.conf configuration file (under the /etc/ceph directory) on each OSD should be updated to include the following lines:

[osd]
osd class load list = *
osd class default list = *

Now we are ready to copy the compiled classes onto the OSD nodes (in our case osd1, osd2 and osd3) and restart the daemons:

ceph-admin@osd1:/usr/lib/rados-classes$ sudo scp ceph-admin@mon1:/home/ceph-admin/ceph/build/lib/libcls_md5.so* .
ceph-admin@osd1:/usr/lib/rados-classes$ sudo systemctl stop ceph-osd.target
ceph-admin@osd1:/usr/lib/rados-classes$ sudo systemctl start ceph-osd.target

To make sure the new class is loaded correctly, we can have a look at the log file to see whether an error occurred. For instance, on our osd3 node we see:

ceph-admin@osd3:~$ sudo cat /var/log/ceph/ceph-osd.3.log | grep cls
2019-12-17 14:16:39.394 7fc6752b5c00  0  /home/ceph-admin/ceph/src/cls/md5/cls_md5.cc:43: loading cls_md5

Writing the librados client applications

We are now ready to write the two librados client applications that either calculate the MD5 hash remotely on the client or call the newly created object class. As expected, the result of the two solutions will be the same, but the computation time differs. Note that both librados applications are run on the monitor node of the Ceph cluster.

Specifically, both client applications act on a pool called rbd and on an object called LowerObject, to which an attribute called MD5 will be added to contain the MD5 hash.
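
Both the pool and the test object need to exist before either application is run; a minimal sketch with the rados CLI follows (the local source file that provides the object's content is arbitrary):

ceph-admin@mon1:~$ sudo ceph osd pool create rbd 128
ceph-admin@mon1:~$ sudo rados -p rbd put LowerObject ./some_local_file.txt
ceph-admin@mon1:~$ sudo rados -p rbd ls
LowerObject

The client application that computes the MD5 hash remotely on the client is saved in a file called rados_md5.cc: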

#include <cctype>
#include <rados/librados.hpp>
#include <iostream>
#include <string>
#include <openssl/md5.h>

void exit_func(int ret);

librados::Rados rados;

int main(int argc, const char **argv)
{
  int ret = 0;

  // Define variables
  const char *pool_name = "rbd";
  std::string object_name("LowerObject");
  librados::IoCtx io_ctx;
  
  // Create the Rados object and initialize it 
  {
     ret = rados.init("admin"); // Use the default client.admin keyring
     if (ret < 0) {
        std::cerr << "Failed to initialize rados! error " << ret << std::endl;
        ret = EXIT_FAILURE;
    }
  }

  // Read the ceph config file in its default location
  ret = rados.conf_read_file("/etc/ceph/ceph.conf");
  if (ret < 0) {
    std::cerr << "Failed to parse config file " << "! Error" << ret << std::endl;
    ret = EXIT_FAILURE;
  }

  // Connect to the Ceph cluster
  ret = rados.connect();
  if (ret < 0) {
    std::cerr << "Failed to connect to cluster! Error " << ret << std::endl;
    ret = EXIT_FAILURE;
  } else {
    std::cout << "Connected to the Ceph cluster" << std::endl;
  }

// Create connection to the Rados pool
  ret = rados.ioctx_create(pool_name, io_ctx);
  if (ret < 0) {
    std::cerr << "Failed to connect to pool! Error: " << ret << std::endl;
    ret = EXIT_FAILURE;
  } else {
    std::cout << "Connected to pool: " << pool_name << std::endl; 
  }
  for(int i = 0; i < 1000; ++i)
  {
    size_t size;
    int ret = io_ctx.stat(object_name, &size, NULL);
    if (ret < 0)
      return ret;

    librados::bufferlist data;
    ret = io_ctx.read(object_name, data, size, 0);
    if (ret < 0)
      return ret;
    unsigned char md5out[16];
    MD5((unsigned char*)data.c_str(), data.length(), md5out);
    char md5string[33];
    for(int i = 0; i < 16; ++i)
      sprintf(&md5string[i*2], "%02x", (unsigned int)md5out[i]);
    librados::bufferlist attrbl;
    attrbl.append(md5string);
    ret = io_ctx.setxattr(object_name, "MD5", attrbl);
    if (ret < 0)
    {
       exit_func(1);
    }
  }
exit_func(0);
}

void exit_func(int ret)
{
   // Clean up and exit
   rados.shutdown();
   exit(ret);
}

What this application does is: create the Rados object, read the configuration file for the Ceph cluster, connect to the Ceph cluster, connect to the rbd RADOS pool, read the LowerObject object from the OSD, calculate the MD5 hash of the object on the client, and write it back as an attribute called MD5 of the object.

The second client application, which instead computes the MD5 hash on the OSDs using the created object class, is saved in a file called rados_class_md5.cc:

#include <cctype>
#include <rados/librados.hpp>
#include <iostream>
#include <string>

void exit_func(int ret);

librados::Rados rados;

int main(int argc, const char **argv)
{
  int ret = 0;

  // Define variables
  const char *pool_name = "rbd";
  std::string object_name("LowerObject");
  librados::IoCtx io_ctx;

  // Create the Rados object and initialize it
  {
    ret = rados.init("admin"); // Use the default client.admin keyring
    if (ret < 0) {
      std::cerr << "Failed to initialize rados! error " << ret << std::endl;
      ret = EXIT_FAILURE;
    }
  }

  // Read the ceph config file in its default location
  ret = rados.conf_read_file("/etc/ceph/ceph.conf");
  if (ret < 0) {
    std::cerr << "Failed to parse config file " << "! Error" << ret << std::endl;
    ret = EXIT_FAILURE;
  }

  // Connect to the Ceph cluster
  ret = rados.connect();
  if (ret < 0) {
    std::cerr << "Failed to connect to cluster! Error " << ret << std::endl;
    ret = EXIT_FAILURE;
  } else {
    std::cout << "Connected to the Ceph cluster" << std::endl;
  }

  // Create connection to the Rados pool
  ret = rados.ioctx_create(pool_name, io_ctx);
  if (ret < 0) {
    std::cerr << "Failed to connect to pool! Error: " << ret << std::endl;
    ret = EXIT_FAILURE;
  } else {
    std::cout << "Connected to pool: " << pool_name << std::endl;
  }

  // Execute the calc_md5 method of the md5 object class directly on the OSD
  librados::bufferlist in, out;
  io_ctx.exec(object_name, "md5", "calc_md5", in, out);

  exit_func(0);
}

void exit_func(int ret)
{
  // Clean up and exit
  rados.shutdown();
  exit(ret);
}

This application also creates the Rados object, initializes it by reading the configuration file for the Ceph cluster, connects to the Ceph cluster, creates a connection to the rbd RADOS pool, and then calls the exec function, which triggers the calc_md5 method of the md5 class, passing the name of the object LowerObject and two buffers for input and output to the class. It is the task of the called object class to calculate the MD5 hash and write it to the MD5 attribute of the object (repeating this 1000 times).

The two client applications can be compiled using the g++ compiler:

ceph-admin@mon1:~/test_app$ g++ rados_md5.cc -o rados_md5 -lrados -std=c++11
ceph-admin@mon1:~/test_app$ g++ rados_class_md5.cc -o rados_class_md5 -lrados -std=c++11

If the applications compile successfully (i.e., no output is given), we are ready to test them and compare their performance.

Comparing the client applications performance

To compare the performance of the two client applications for the MD5 hash computation, we can use the Linux time utility.

The first application we test is the one performing the computation remotely on the client, namely the rados_md5 application. Besides checking whether the MD5 hash has been computed and inserted as an attribute of the given object, we are interested in the computation time:

ceph-admin@mon1:~/test_app$ time sudo ./rados_md5 

Connected to the Ceph cluster
Connected to pool: rbd
real    0m7.735s
user    0m0.274s
sys     0m0.211s

ceph-admin@mon1:~/test_app$ sudo rados -p rbd getxattr LowerObject MD5 
9d40bae4ff2032c9eff59806298a95bd

The second application we test is the one performing the computation directly on the OSDs using the object class we loaded on the Ceph nodes, namely the rados_class_md5 application. Note that we first need to delete the attribute from the object to make sure it is now computed by the object class (both applications act on the same pool, object and attribute).

ceph-admin@mon1:~/test_app$ sudo rados -p rbd rmxattr LowerObject MD5
ceph-admin@mon1:~/test_app$ sudo rados -p rbd getxattr LowerObject MD5 
error getting xattr rbd/LowerObject/MD5: (61) No data available

Also here, besides checking whether the MD5 hash has been computed and inserted as an attribute of the given object, we are interested in the computation time:

ceph-admin@mon1:~/test_app$ time sudo ./rados_class_md5 
 
Connected to the Ceph cluster
Connected to pool: rbd
real    0m0.126s
user    0m0.042s
sys     0m0.009s

ceph-admin@mon1:~/test_app$ sudo rados -p rbd getxattr LowerObject MD5 
9d40bae4ff2032c9eff59806298a95bd

As we compare the outputs, we notice that the MD5 hash is the same in the two cases. The most interesting result, however, is that using the object class on the OSDs the computation takes only 0.126s when adopting the active storage concept, instead of 7.735s when the computation is performed remotely on the client. This is a 98.4% time saving, which is a very important result.

VM preparation for our test Ceph cluster deployment

Although writing a tutorial on installing a Ceph cluster is not the main scope of this blog post, to give a complete overview of our study we report here a summary of the steps taken and the characteristics of the machines used. Other tutorials with more details on the steps to take are available on the Internet.

For the scope of our tests, we deployed a Ceph cluster on our OpenStack framework. We created six VM instances of flavor m1.large (1 vCPU, 2GB of RAM, 20GB disk) so that our Ceph cluster has one monitor, one ceph-admin node, one rgw node and three OSD nodes (osd1, osd2 and osd3). The OSD nodes have additional volumes attached (a 10GiB disk on each of osd1, osd2 and osd3, plus a 15GiB disk on osd3).

For the purpose of building a Ceph cluster, it is important to define security groups with rules that open the ports needed by the Ceph nodes. In particular, the ceph-admin node requires ports 22, 80, 2003 and 4505-4506 to be open, the monitor node requires 22 and 6789, whereas the OSD nodes require port 22 and the port range 6800-7300. Note that additional ports might need to be opened depending on the specific configuration of the Ceph cluster.
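
For illustration, such rules can be added with the OpenStack CLI; a sketch for the monitor and OSD nodes follows (the security group names are hypothetical):

# Open SSH and the Ceph monitor port in the monitor node's security group.
openstack security group rule create --protocol tcp --dst-port 22 ceph-mon-sg
openstack security group rule create --protocol tcp --dst-port 6789 ceph-mon-sg

# Open SSH and the OSD port range in the OSD nodes' security group.
openstack security group rule create --protocol tcp --dst-port 22 ceph-osd-sg
openstack security group rule create --protocol tcp --dst-port 6800:7300 ceph-osd-sg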

All our VMs have Ubuntu Bionic (18.04.3 LTS) installed and have a floating IP associated. The resulting association of hostname and IP address for our study case is as follows:

hostname          IP address
ceph-admin        10.20.3.13
mon1              10.20.1.144
osd1              10.20.3.216
osd2              10.20.3.138
osd3              10.20.1.21 
rgw               10.20.3.95

On each node we create a user named 'ceph-admin' and configure it for passwordless sudo privileges. Further, on all machines we installed the python and python-pip packages and updated the hosts file with the list of hostnames and the corresponding IP addresses.
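
A sketch of this per-node preparation follows; the commands are standard Ubuntu administration and not specific to our setup:

# Create the ceph-admin user and grant it passwordless sudo.
sudo useradd -m -s /bin/bash ceph-admin
sudo passwd ceph-admin
echo "ceph-admin ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph-admin
sudo chmod 0440 /etc/sudoers.d/ceph-admin

# Install the Python packages required by ceph-deploy.
sudo apt-get install -y python python-pip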

We used the ceph-admin node for configuring the Ceph cluster. To this aim, the node needs passwordless SSH access as user 'ceph-admin' to all nodes. We therefore generated the ssh keys for the 'ceph-admin' user on the ceph-admin node by launching the ssh-keygen command (leaving the passphrase blank/empty). We then edited the ssh configuration file (~/.ssh/config) as follows:

Host osd1
   Hostname osd1
   User ceph-admin
Host osd2
   Hostname osd2
   User ceph-admin
Host osd3
   Hostname osd3
   User ceph-admin
Host mon1
   Hostname mon1
   User ceph-admin
Host mon2
   Hostname mon2
   User ceph-admin
Host rgw
   Hostname rgw
   User ceph-admin

Further steps to finalize the configuration are:

  • run chmod 644 ~/.ssh/config on the ceph-admin node
  • run ssh-keyscan osd1 osd2 osd3 mon1 rgw >> ~/.ssh/known_hosts
  • access each VM as root over ssh and edit the ssh daemon configuration file: sudo nano /etc/ssh/sshd_config
  • change PasswordAuthentication to yes and restart the daemon: sudo systemctl restart sshd
  • on the ceph-admin node, run ssh-copy-id osd1, ssh-copy-id osd2, ssh-copy-id osd3, ssh-copy-id mon1 and ssh-copy-id rgw (typing the ceph-admin password when requested)

The next step is to configure a firewall on the Ubuntu servers to protect the system, leaving only the specific ports open: 80, 2003 and 4505-4506 on the ceph-admin node; 22, 80 and 6789 on the mon1 node; and 22 and 6800-7300 on the osd1, osd2 and osd3 nodes. For instance, for the osd1 node the commands to launch are:

ceph-admin@ceph-admin:~$ ssh osd1
ceph-admin@osd1:~$ sudo apt-get install -y ufw
ceph-admin@osd1:~$ sudo ufw allow 22/tcp
ceph-admin@osd1:~$ sudo ufw allow 6800:7300/tcp
ceph-admin@osd1:~$ sudo ufw enable

To configure the additional volumes available on the OSD nodes, we log in to each OSD node and format the partition with an XFS filesystem:

ceph-admin@osd3:~$ sudo parted -s /dev/vdb mklabel gpt mkpart primary xfs 0% 100%
ceph-admin@osd3:~$ sudo mkfs.xfs -f /dev/vdb

Deploying the Ceph cluster

Once all the machines are configured, we are ready to deploy our Ceph cluster using ceph-deploy. On the ceph-admin node we install ceph-deploy with the following commands:

ceph-admin@ceph-admin:~$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
ceph-admin@ceph-admin:~$ echo deb https://download.ceph.com/debian-mimic/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list 
ceph-admin@ceph-admin:~$ sudo apt update 
ceph-admin@ceph-admin:~$ sudo apt install ceph-deploy 

In a given directory on the ceph-admin node (e.g., we created a directory called ceph-deploy), we run the command to define the cluster nodes:

ceph-admin@ceph-admin:~$ mkdir ceph-deploy
ceph-admin@ceph-admin:~$ cd ceph-deploy/
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy new mon1

This command generates the Ceph cluster configuration file ‘ceph.conf‘ in the current directory. The ceph.conf file can be edited to add the public network details under the [global] block. The resulting ceph.conf file looks like this:

[global]
fsid = 44d61b90-a1de-459f-97c6-6d9642eb5e0f
mon_initial_members = mon1
mon_host = 10.20.1.144
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.20.0.0/16

The next steps to take now are: i) installing Ceph on all nodes from the ceph-admin node, ii) deploying the monitor node on node mon1, iii) deploying the management key to all associated nodes, and iv) deploying a manager daemon on the monitor node:

ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy install ceph-admin mon1 osd1 osd2 osd3 rgw
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy mon create-initial 
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy admin ceph-admin mon1 osd1 osd2 osd3 rgw
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy mgr create mon1

To be able to use the ceph CLI on all nodes without needing to specify the monitor address and the admin key, we should also change the permissions of the key file on all nodes:

sudo chmod 644 /etc/ceph/ceph.client.admin.keyring

Finally, we can add the OSD daemons on the nodes:

ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy osd create --data /dev/vdb osd1 
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy osd create --data /dev/vdb osd2 
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy osd create --data /dev/vdb osd3
ceph-admin@ceph-admin:~/ceph-deploy$ ceph-deploy osd create --data /dev/vdc osd3

These steps gave us a Ceph cluster we could use for testing the object classes for active storage. Further, more detailed descriptions of adding an rgw to the cluster, enabling the Ceph dashboard, storing/retrieving object data, resetting the Ceph cluster, adding/removing nodes, monitoring the cluster status and so on are beyond the scope of this blog post. The interested reader can refer to the official Ceph documentation or to good reference tutorials: i) Deploy Ceph and start using it: end to end tutorial – Installation, ii) How To Install Ceph Storage Cluster on Ubuntu 18.04 LTS, iii) How to install a Ceph Storage Cluster on Ubuntu 16.04. Note that some information in the links above might be outdated!

Fourth Robotics and ROS in Zurich Meetup

On Wednesday evening, the 16th of October, we attended the 4th Robotics and ROS in Zurich meetup. It was our first meetup after the summer break. This time it was co-organized by ICCLab and ANYbotics, who provided their office space for the meeting and demo. We managed to gather around 50 interested attendees.

Everything started with a short introduction talk and agenda presentation by Giovanni Toffetti. After that, the ANYbotics team gave a 30-minute presentation about their flagship robot, the ANYmal C.

ANYmal C legged robot presentation

Péter Fankhauser, the CBDO of ANYbotics, started by explaining the current state of the art of mobile robots and presenting possible paths along which the robotics industry is going to expand in the future. He showed that by analyzing the history of the mobile robotics industry we can spot a transition from robots performing repetitive tasks (automation), starting with the third industrial revolution in the late 60s, to autonomous robots used to operate in large warehouses (logistics). The autonomy of logistics robots is still limited to controlled environments, so by drawing a trend line we can spot the next breakthrough waiting to be reached by mobile robots: the ability to operate autonomously in "robot-unfriendly" environments. The use case presented by the ANYbotics team is routine inspections in places like offshore power plants or oil platforms. Later on, two engineers from ANYbotics took over the presentation and dived more deeply into the technical details of how they managed to adapt their robot to harsh environments and what kind of challenges they faced.

The next part of the meetup was the lightning talks. Four presenters gave 3-minute presentations about the projects they are working on. The first presentation was by Jasmine Kent, the CTO of Dufour Aerospace. Her company's mission is to create an electrically powered, vertical take-off and landing (VTOL) autonomous aircraft.

Jasmine Kent presenting Dufour Aerospace concept UAV

Dufour Aerospace is using ROS to control the first 50kg prototype of the aircraft. The company's long-term goal is to create an aircraft which could be used as a daily travel medium and which is also capable of performing rescue missions.

The second short presentation was from Mateusz Sadowski, a robotics consultant from Geneva who is also the creator of the weeklyrobotics.com website, which aggregates interesting news from the robotics world.

Mateusz during his presentation

In his presentation he focused on promoting his website; as an example of an interesting project described in Weekly Robotics, Mat presented NASA's Astrobee robot.

After that, Behrooz Tahanzadeh took the stage with his presentation about the Bengesht plugin, which acts as a bridge between the visual programming language Grasshopper and ROS. Recently Behrooz also wrote a blog post about our meeting; check it out!
The last lightning talk was from Giovanni.

Lightning talk from Giovanni

He focused on one of the projects from ICCLab: Cloud Robotics. He presented the list of challenges cloud systems need to face while providing resources to mobile robots, and how he and his team managed to overcome these problems in their Cloud Robotics PaaS (Platform-as-a-Service). The platform takes care of robot configuration and orchestration and provides computational resources to the robots. This allows robotics application developers to focus on providing actual application features for the robots, without being distracted by managing the underlying infrastructure and robot resources. The usage of cloud tools and development processes gives high modularity and portability, and gives self-healing capabilities to the software. Currently ICCLab is looking for opportunities to collaborate in research projects by providing its experience in ROS, robotics and cloud computing.

The ANYmal C robot demo was the last part of the meetup, during which the ANYbotics team showed the latest iteration of the robot. The robot was walking, crawling and even dancing to music. The engineers presented their open-source products; one very prominent application is the elevation_mapping package, which helps the robot plan its movements while climbing stairs. After the interactive part of the demo, the ANYbotics team took a long time to answer the myriad of questions from the meetup attendees regarding the technical details of the ANYmal robot.

ANYmal C prepared for the demo

We would like to thank ANYbotics for hosting and sponsoring this meetup and all of the attendees for coming. The next meetup is yet to be planned. Please follow our meetup group to stay up to date with the next event announcements.

Our recent paper on Cloud Native Storage presented at EuCNC 2019

In June we participated in the 28th edition of EuCNC, an international conference sponsored by the IEEE Communications Society and the European Association for Signal Processing, and supported by the European Commission. EuCNC is one of the most prominent communications and networking conferences in Europe, efficiently bringing together cutting-edge research and world-renowned industries and businesses.

Valencia Congress Center

This year the conference was held in beautiful Valencia, Spain. The focus of the event was on various aspects of 5G communications systems and networks, including cloud and virtualisation solutions (of particular interest for us), management technologies, and vertical application areas. The goal of the conference is to bring together researchers from all over the world to present the latest research results, especially from successive European R&D programmes co-financed by the European Commission. The event spanned 4 days with several workshops, exhibitions and demos, technical sessions and discussion panels, as well as social events for networking and peering with colleagues from companies and research institutes.

Opening Keynotes

At EuCNC we presented our recent work on cloud native storage with a paper entitled "Monitoring Resilience in a Rook-managed Containerized Cloud Storage System". In this paper we focused on containerized cloud storage, proposing a resilience monitoring solution for the recently developed Rook storage operator. While Rook brings storage systems into cloud native container platforms, in this paper we presented a software tool that is able to test and monitor a Ceph cluster deployed using Rook on Kubernetes. Our proposed module is validated in a production environment, with software components generating a constant load and a controlled removal of system elements to evaluate the self-healing capability of the storage system. To effectively support resilience in the storage cluster, we propose careful monitoring of system metrics and the forecasting of failure threats that trigger alert messages. As failures correspond to revenue losses for cloud storage providers, we believe that the proposed solution can be of high interest for the rapidly growing cloud native storage world. The complete paper can be found at this link.

Gala dinner at the City of Arts and Sciences (Hemisfèric)

Third Robotics and ROS in Zürich Meetup

The third Robotics and ROS in Zürich meetup was organized and hosted by ICCLab on June 19th 2019. There was a good turnout from representatives of both academia and industry, totaling about 25 people in attendance. For our third meetup we had two presentations: "Perception and action planning in complex environments with ROS" by Rastislav Marko and Martin Möller from F&P Personal Robotics, and "Self-calibrating camera position and grasping with Niryo arm" by Dimitrios Dimopoulos from ICCLab, ZHAW.

Summary of presentation #1: Perception and action planning in complex environments with ROS by Rastislav Marko and Martin Möller from F&P Personal Robotics

Martin Möller opened the first talk with a brief introduction of the company and one of its past collaborations with ZHAW. Next, an overview of the hardware components of the P-Rob 2R was given. This robot lies at the heart of the company's solutions, including Lio, a mobile service robot, which was the focus of the talk. Following that, myP, the core robot library with its accompanying graphical control interface, was presented in action, and a quick look at its architecture and configuration was showcased. Finally, Martin mentioned the key sensor components used by the mobile service robot.

Martin Möller presenting some of the company’s robotic solutions

Next, Rastislav Marko gave a rundown of the ROS components utilized for navigation and high-level perception (face/speech/object recognition, body posture estimation, etc.). Furthermore, some ROS monitoring tools were presented along with the overall system architecture. Lastly, a case study of Lio in a real hospital environment was presented.

Rastislav Marko presenting the overall system architecture

Summary of presentation #2: Self-calibrating camera position and grasping with Niryo arm by Dimitrios Dimopoulos from ICCLab, ZHAW

For the second part of the meetup, Dimitrios Dimopoulos gave a brief presentation on an ongoing grasping project at ICCLab. The lab's two grasping robots were showcased along with a brief overview of the simulation environments used for evaluation. Next, the grasping chain was presented, including the MoveIt library for motion planning and collision detection, and the Grasp Pose Detection library used for grasp generation. A more detailed description of the steps necessary for grasping followed, including camera calibration using ArUco markers, point cloud filtering for object extraction, object mesh creation, grasp generation and motion planning. Finally, a short demonstration of grasping using the Niryo arm was given.

Dimitrios Dimopoulos on the grasping chain architecture

Closing Remarks

The event was closed with a networking session over drinks and finger food. Many thanks to F&P Robotics for sponsoring this part of the meetup.

All future events and updates can be followed by becoming a member of the Robotics and ROS in Zürich group. This was the last meetup before the summer break; we'll be back with more ROS goodies from (late) September. We look forward to continuing to meet ROS users and roboticists in the area.

Second Robotics and ROS in Zürich Meetup

The second Robotics and ROS meetup in Zürich was organized by ICCLab and hosted by Dr. Romana Rust and Gonzalo Casas from Gramazio Kohler Research, ETH Zürich, on May 14th 2019. There was a good turnout from representatives of both academia and industry, totaling about 45 people in attendance. For this second meetup we had three presentations: "ROS for Digital Fabrication in Architecture", "ROS Integration into Magic Leap" and "Next Generation Security" from Wecorp.

Summary of presentation #1: ROS for Digital Fabrication in Architecture by Dr. Romana Rust and Gonzalo Casas from ETH Gramazio Kohler Research group

Dr. Romana Rust opened the first talk by showcasing ongoing and past projects of the Gramazio Kohler Research group. Specifically, she presented the usage of industrial-grade robots and ROS in additive digital fabrication and the ways they allow for a novel approach to building non-standardized architectural components.

Dr. Romana Rust on digital fabrication

Gonzalo Casas took over next to give us some insight into COMPAS. The COMPAS library allows researchers to focus on the computational aspects of a fabrication project regardless of the CAD tools used (COMPAS supports the most widely used ones). As explained in the talk, this facilitates collaboration between different groups which might use different CAD tools. Furthermore, an overview of the system architecture was given, followed by a brief demo of the library in action. The slides can be found here.

Gonzalo Casas presenting the COMPAS framework

Summary of demo: ROS integration into Magic Leap by Daniela Mitterberger from ETH Gramazio Kohler Research group

Next, Daniela Mitterberger gave the audience a look into the usage of augmented reality (Magic Leap) in digital fabrication and architecture. In the following slides, the integration of ROS with Magic Leap and the overall communication architecture were presented, followed by a video showcasing the application of the system on an actual project. The system aims to assist workers in the construction of complex components by providing augmented-reality guidance through their headset without impeding their mobility.

Daniela Mitterberger showcasing an augmented bricklaying project

Summary of talk #2: Next Generation Security by Dr Max Werner, Samuel Garcin, Fernando Acero from WECORP

Dr. Max Werner, Samuel Garcin and Fernando Acero from Wecorp came all the way from London to give the final talk of the evening. A brief look into the drone market landscape was given, followed by the company's smart alarm solution using drones. This solution allows emergency services to respond faster to incidents and leads to lower false alarm rates. Two of Wecorp's control engineers took over next to present a case study of a "Follow Me" functionality, where a person commands a drone to follow them. The ROS communication architecture was presented, accompanied by mentions of the high-level algorithmic design and the sensors used on the drone. The presentation's slides can be found here.

Dr. Max Werner, Fernando Acero and Samuel Garcin (from right to left) answering questions after their talk.

Closing Remarks

The event was closed with a networking session over drinks and finger food. Many thanks to Wecorp for sponsoring this part of the meetup.

All future events and updates can be followed by becoming a member of the Robotics and ROS in Zürich group. We look forward to continue meeting ROS-users/roboticists within the area.

Niryo Arm Motor Troubleshooting https://blog.zhaw.ch/icclab/niryo-arm-motor-troubleshooting/ https://blog.zhaw.ch/icclab/niryo-arm-motor-troubleshooting/#respond Wed, 24 Apr 2019 11:29:58 +0000 https://blog.zhaw.ch/icclab/?p=12411 In most development processes hiccups are unavoidable. Our grasping application using the Niryo One arm was no exception. During testing, we had two of our arms break down and with this post, we would like to share our experiences with debugging and resolving these issues. As far as we can understand, the axis 6 motor […]

In most development processes, hiccups are unavoidable. Our grasping application using the Niryo One arm was no exception. During testing, two of our arms broke down, and with this post we would like to share our experience debugging and resolving these issues.

As far as we can tell, the axis 6 motor (a Dynamixel XL-320), which is responsible for rotating the gripper, was damaged in the first arm when the gripper hit the table. Since the gripper has no force-feedback shutdown procedure, the motor probably broke down from overloading. Note that no gripper URDF model is provided, and octomap integration into the project was not yet complete at the time, so the kinematics planner was not aware of the table's existence. As for our second arm, the culprit was the power adapter. The Dynamixel XL-430 motors are rated for 11.1 Volts, but the supplied adapter is a 12 V one, which can cause permanent damage through overheating if the arm operates for prolonged periods of time. This design oversight was amended in Niryo One models shipped after November 2018, but in any case you should check the rating of the power adapter provided and request a replacement if needed.

First things first, we need to figure out which motors are damaged. To do so, we make sure a connection is established between our local machine and the Niryo arm; now is also a good time to pull the latest niryo_one_ros repository. After that, we launch Niryo One Studio, a user-friendly interface for controlling and debugging the arm. Clicking on the "Debug" tab on the left brings up the "Motor debug options" panel. We uncheck all but one motor at a time and click on "Change config and reboot", which reboots the arm with only the checked motor enabled. If no error message comes up, that motor is fine and we can move down the list. In our case, the axis 6 motor was identified as faulty. Finally, re-enable all the motors in the checklist and reboot one last time.

The next step is to replace the motor. Niryo provides assembly videos, and by following those we can disassemble the joint of interest. Since we didn't have a spare Dynamixel XL-320 at our disposal, we removed one from the Gripper 2 provided with the robot. Every motor comes preconfigured with its own unique ID that signifies which part of the robot it actuates. These IDs are specified in the niryo_one_bringup/config/v2/niryo_one_motors.yaml file. As we can see there, the arm expects ID = 6, but our replacement servo has ID = 12.

To alter the ID of the replacement servo we'll use the dxl_debug_tools program. First, we need to ssh into the robot and make sure the package has been compiled by running:

cd ~/catkin_ws
catkin_make -j2
cd ~/catkin_ws/devel/lib/niryo_one_debug
./dxl_debug_tools --help

Next, we have to bring down the Niryo ROS stack by executing in the terminal:

sudo systemctl stop niryo_one_ros.service

Now we can run the program to set the servo's EEPROM values as necessary.

To change the ID from 12 to 6, we write the new value into register 3, the one-byte ID register of the XL-320's control table:

./dxl_debug_tools --id 12 --set-register 3 6 1

We can now bring the arm up again by running roslaunch niryo_one_bringup rpi_setup.launch, or by simply switching the robot off and on again. With that we are finally back in business and ready to run more tests! For reference, a Python alternative to the dxl_debug_tools write is sketched below.
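
In case dxl_debug_tools is unavailable, the same one-byte register write can be done with the Dynamixel SDK Python bindings. The sketch below is our own illustration, not part of the Niryo tooling: the serial device path and bus baud rate are assumptions and must be checked against your own setup before running anything.

# Hypothetical alternative using the Dynamixel SDK (pip install dynamixel-sdk).
# NOT part of the official Niryo tooling; port and baud rate are assumptions.
from dynamixel_sdk import PortHandler, PacketHandler

PROTOCOL_VERSION = 2.0   # the XL-320 speaks Dynamixel protocol 2.0
ADDR_ID = 3              # EEPROM address of the one-byte ID register
OLD_ID, NEW_ID = 12, 6

port = PortHandler('/dev/serial0')   # assumed serial device on the Niryo's Pi
port.openPort()
port.setBaudRate(1000000)            # assumed bus baud rate

packet = PacketHandler(PROTOCOL_VERSION)
comm_result, dxl_error = packet.write1ByteTxRx(port, OLD_ID, ADDR_ID, NEW_ID)
if comm_result == 0 and dxl_error == 0:
    print('ID changed to %d; power-cycle the bus before use' % NEW_ID)
port.closePort()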

First Robotics and ROS in Zürich Meetup https://blog.zhaw.ch/icclab/first-robotics-and-ros-in-zurich-meetup/ https://blog.zhaw.ch/icclab/first-robotics-and-ros-in-zurich-meetup/#respond Fri, 12 Apr 2019 09:29:49 +0000 https://blog.zhaw.ch/icclab/?p=12382 ICCLab organized the first robotics and ROS meetup in Zürich on April 9th 2019. There was a good turnout from representatives in both academia and industry, totaling almost 60 people in attendance. This meetup is the first of hopefully many that we intend to organize, as part of our effort to build a local network […]

ICCLab organized the first robotics and ROS meetup in Zürich on April 9th, 2019. There was a good turnout from representatives of both academia and industry, totaling almost 60 people in attendance. This meetup is the first of hopefully many that we intend to organize as part of our effort to build a local network of ROS users across many robotic disciplines. Besides networking, our goal for these meetups is also to provide a platform for individuals to share and teach specific robotics/ROS knowledge. For this initial meetup we had two presentations: vision for navigation in autonomous robotics, and ROS applications at ICCLab.

Nearly 60 attendees were present at the first robotics and ROS in Zürich meetup.

Summary of presentation #1: vision for navigation in autonomous robotics by Julian Kent from Auterion

Auterion started their talk with an introduction to navigation algorithms. The example they provided was the A* path-finding algorithm, used for obstacle avoidance in many autonomous navigation systems (a minimal sketch of the algorithm is given below). In the following slides they showed the complex interplay between the non-ROS and ROS components of their drone communication architecture. The presentation ended with a list of advantages and challenges of using ROS in industry, and the future steps Auterion intends to take with their drone solution. The complete slideshow can be found here.
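
For readers unfamiliar with A*, here is a minimal, self-contained Python sketch of the algorithm on a 4-connected occupancy grid. This is our own illustration, not code from the talk:

import heapq

def a_star(grid, start, goal):
    # Minimal A* on a 4-connected grid: grid[r][c] == 1 marks an obstacle.
    def h(cell):
        # Manhattan distance: admissible on a 4-connected grid, so paths are optimal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start, None)]   # entries are (f = g + h, g, cell, parent)
    parent, best_g = {}, {start: 0}
    while open_set:
        _, g, cell, par = heapq.heappop(open_set)
        if cell in parent:                    # already expanded with a better cost
            continue
        parent[cell] = par
        if cell == goal:                      # walk parents back to recover the path
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float('inf'))):
                best_g[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, cell))
    return None                               # no path exists

print(a_star([[0, 0, 0],
              [1, 1, 0],
              [0, 0, 0]], (0, 0), (2, 0)))

Real planners such as those in move_base search over costmaps with weighted cells rather than a binary grid, but the search skeleton is the same.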

Julian Kent from Auterion answers audience questions at the end of his talk.

Summary of presentation #2: ROS applications at ICCLab by Giovanni Toffetti from ICCLab

The second talk at the meetup was presented by us and gave a rundown of our current and past ROS/robotics projects. The focus was on our past InnoSuisse project with Rapyuta Robotics, which led to the development of an Enterprise Cloud Robotics Platform (ECRP). A live demo using this platform was given at the end of the talk, showing dynamic orchestration of ROS packages deployed on a TurtleBot2 and in the cloud. A video of the sample application is available at this link. We also discussed our current work on autonomous navigation with a Summit-XL Steel, and on arm control and grasping with a UR5 arm. Slides are here.

Giovanni gives a high-level view of the ROS architecture of a cloud robotics application.

Final thoughts

The meetup concluded with networking over some drinks and finger food. Thanks again to Auterion for sponsoring this part of the event.

From both the verbal and written feedback we have received, we are confident in saying that this initial meetup was a success and warrants the continuation of these sorts of meetings. All future events and updates can be followed by becoming a member of the Robotics and ROS in Zürich group; details of the next meetup have already been released! We look forward to continuing to meet ROS users and roboticists in the area.

Running the ICCLab ROS Kinetic environment on your own laptop https://blog.zhaw.ch/icclab/running-the-icclab-ros-kinetic-environment-on-your-own-laptop/ https://blog.zhaw.ch/icclab/running-the-icclab-ros-kinetic-environment-on-your-own-laptop/#comments Mon, 08 Apr 2019 10:05:31 +0000 https://blog.zhaw.ch/icclab/?p=12344 As we are making progress on the development of robotic applications in our lab, we experience benefits from providing an easy-to-deploy common ROS Kinetic environment for our developers so that there is no initial setup time needed before starting working on the real code. At the same time, any interested users that would like to […]

As we make progress on the development of robotic applications in our lab, we benefit from providing an easy-to-deploy common ROS Kinetic environment for our developers, so that no initial setup time is needed before starting work on the real code. At the same time, anyone interested in testing and exploring our code can do so with a few commands. A single git clone command is now enough to download our up-to-date repository to your local computer and run our ROS Kinetic environment, including a workspace with the current ROS projects.

To reach this goal we created a container that includes the ROS Kinetic distribution and all dependencies and software packages needed for our projects. No additional installation or configuration steps are required before testing our applications. The reference git repository can be found at this link: https://github.com/icclab/rosdocked-irlab

After cloning the repository on your laptop, you can run the ROS Kinetic environment, including the workspace and projects, with these two simple commands:

cd workspace_included
./run-with-dev.sh

This will pull the robopaas/rosdocked-kinetic-workspace-included container to your laptop and start it with access to your X server.

The two projects you can test

Once inside the container, you will have everything needed to test and play around with the two projects we are currently working on: robot navigation and pick&place. Both projects are based on hardware we recently acquired: a SUMMIT-XL Steel from Robotnik, equipped with a Universal Robots UR5 arm and a Schunk Co-act EGP-C 40 gripper (see the picture below). In addition, we mounted an Intel RealSense D435 camera on the UR5 arm and two Scanse Sweep LIDARs on 3D-printed mounts. Please have a look at our previous blog post for more details about the robot setup and configuration.

Summit_xl and Intel Realsense camera
Summit_xl, with UR5 arm and Schunk gripper

Robot navigation project

You can test our robot navigation project by launching a single launch file from the icclab_summit_xl project in the container:

roslaunch icclab_summit_xl irlab_sim_summit_xls_amcl.launch

A Gazebo simulation environment will be started with an indoor simulated scenario where the Summit_xl robot can be moved around. Additionally, RViz will be launched to visualize the Gazebo data (see the picture below).

By selecting the 2D Nav Goal option in the RViz top bar, you can set a navigation goal on the map. The robot will start planning a path towards the goal, avoiding obstacles thanks to environment sensing based on the LIDAR scans. If a viable path is found, the robot will move accordingly. The same goal can also be sent programmatically, as sketched below.
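
For scripted tests, a goal can be sent through the standard move_base action interface. The snippet below is a minimal sketch assuming the default action name move_base and the map frame; on the Summit the action server may live under a namespace, so check rostopic list first. The goal coordinates are illustrative only.

import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node('nav_goal_demo')

# 'move_base' is the default action name; it may be namespaced on your robot
client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = 'map'
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 2.0    # illustrative goal pose
goal.target_pose.pose.position.y = 1.0
goal.target_pose.pose.orientation.w = 1.0

client.send_goal(goal)
client.wait_for_result()
print(client.get_state())   # 3 (SUCCEEDED) if the goal was reached

Going through the action interface, rather than publishing once to /move_base_simple/goal, gives you feedback and a result status, which is what makes scripted tests reliable.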

Pick&Place project

You can test our pick&place application by calling another launch file from the icclab_summit_xl project which is part of the workspace in the container:

roslaunch icclab_summit_xl irlab_sim_summit_xls_grasping.launch

In this case too, a Gazebo simulation environment will be started, with an empty-world scenario containing the Summit_xl robot and a sample object to be grasped placed in front of it (since the deployed gripper opens only 1.8 cm, the selected object is quite small). RViz will again be launched to visualize the Gazebo data (see the picture below), with MoveIt configured for the arm movement.

As visible in the RViz picture above, an octomap is configured for collision avoidance during arm movements. The octomap is built from the point cloud received from the arm-mounted camera. A first simple test to see the UR5 arm moving is to define a goal pose for the arm's end-effector and let MoveIt plan a possible path; if a plan is found, it can be executed to see the resulting arm movement. A rough Python sketch of this test is shown below.
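
The sketch uses moveit_commander; the planning group name 'arm' and the target pose are assumptions, so check the MoveIt configuration in icclab_summit_xl for the actual group name and pick a reachable pose:

import sys
import rospy
import moveit_commander
from geometry_msgs.msg import Pose

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node('ur5_pose_goal_demo')

group = moveit_commander.MoveGroupCommander('arm')  # assumed planning group name

target = Pose()
target.position.x = 0.4     # illustrative coordinates only
target.position.y = 0.0
target.position.z = 1.1
target.orientation.w = 1.0

group.set_pose_target(target)
plan = group.plan()                     # ask MoveIt for a collision-free path
if plan.joint_trajectory.points:        # non-empty trajectory means a plan was found
    group.execute(plan, wait=True)
group.clear_pose_targets()
moveit_commander.roscpp_shutdown()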

To test our own Python scripts for the pick&place application, you can run the following commands within the container:

cd catkin_ws/src/icclab_summit_xl/scripts
python pick_and_place_summit_simulation.py

The Python script moves the arm to an initial position so that the object to be grasped can be seen by both the front and the arm-mounted cameras. A combined point cloud is built from the point clouds of both cameras; based on it, the object to grasp is identified and a number of possible gripper poses for grasping it are computed. MoveIt then looks for a collision-free movement plan to grasp the object. If all of these steps are executed successfully, the object is grasped and a new movement plan is computed for placing it on top of the robot (note that this last step might take somewhat longer, as we add orientation constraints to the object placement). You can watch a video of the pick&place simulation below.

(YouTube video: pick&place simulation demo)

As stated earlier, our default simulation setup follows our acquired hardware and therefore uses a Schunk gripper. However, you can also simulate a Robotiq gripper for the given robot configuration by changing a parameter when launching the project and using a second Python script, as shown below:

roslaunch icclab_summit_xl irlab_sim_summit_xls_grasping.launch robotiq_gripper:=true
cd catkin_ws/src/icclab_summit_xl/scripts
python pick_and_place_summit_simulation_robotiq.py 


Configuring the ROS Navigation Stack on a new robot https://blog.zhaw.ch/icclab/configuring-the-ros-navigation-stack-on-a-new-robot/ https://blog.zhaw.ch/icclab/configuring-the-ros-navigation-stack-on-a-new-robot/#comments Tue, 12 Mar 2019 15:20:29 +0000 https://blog.zhaw.ch/icclab/?p=12279 Our lab has acquired a new robot as part of its ROS based robotic fleet. We opted with the SUMMIT-XL Steel from Robotnik; a behemoth compared to our much-loved TurtleBots. The first vital step for any mobile robot is to setup the ROS navigation stack: the piece of software that gives the robot the ability […]

Our lab has acquired a new robot as part of its ROS-based robotic fleet. We opted for the SUMMIT-XL Steel from Robotnik, a behemoth compared to our much-loved TurtleBots.

The Summit-XL Steel is advertised as a great platform for robotic applications that require transporting heavy loads (up to 250 kg), such as warehouse automation (retrieved from https://www.robotnik.eu/web/wp-content/uploads//2018/07/Robotnik_SUMMIT-XL-STEEL-01.jpg).

The first vital step for any mobile robot is to set up the ROS navigation stack: the piece of software that gives the robot the ability to autonomously navigate through an environment using data from different sensors.

A major component of the stack is the ROS node move_base, which provides the implementation of the costmaps and planners. A costmap is a grid map where each cell is assigned a specific value, or cost: higher costs indicate a smaller distance between the robot and an obstacle. Path-finding is done by a planner, which uses a series of algorithms to find the shortest path while avoiding obstacles. Driving in close proximity to obstacles is optimized by the local costmap and local planner, whereas the full path is optimized by the global costmap and global planner. Together these components find the best path to a given navigation goal in the real world. To give a feel for what those cell costs look like, the standard inflation-layer cost decay is sketched below.
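
The following minimal Python illustration reproduces the default costmap_2d inflation formula, in which cost falls off exponentially with the distance from the nearest obstacle. The inscribed radius and cost_scaling_factor values here are illustrative, not our robot's actual settings:

import math

LETHAL, INSCRIBED = 254, 253   # special costmap_2d cost values

def inflation_cost(distance, inscribed_radius=0.3, cost_scaling_factor=10.0):
    # Reproduces the standard costmap_2d inflation decay for intuition only.
    if distance == 0.0:
        return LETHAL          # the cell contains an obstacle
    if distance <= inscribed_radius:
        return INSCRIBED       # robot center here guarantees a collision
    return int((INSCRIBED - 1) *
               math.exp(-cost_scaling_factor * (distance - inscribed_radius)))

for d in (0.0, 0.2, 0.4, 0.6, 1.0):
    print('%.1f m -> cost %d' % (d, inflation_cost(d)))

A larger cost_scaling_factor makes the cost fall off faster, so planned paths hug obstacles more closely; a smaller one keeps paths further away.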

Most of the configuration process is spent tuning parameters in YAML files; however, this is time-consuming and can be frustrating if a structured approach is not taken and time is not spent reading up on the details of how the stack works. Many helpful tuning guides are already available: the Basic Navigation Tuning Guide and the ROS Navigation Tuning Guide to name a few (we encourage anyone new to the stack to read these thoroughly). Hence, this post aims to give solutions to some less-discussed problems.

Configuring a 2-LIDAR setup

To give the robot a full 360-degree view of its surroundings, we initially mounted two Scanse Sweep LIDARs on 3D-printed mounts. One of them recently broke and was replaced with an LDS-01 laser scanner from one of our TurtleBot3s. Each laser scanner provides 270 degrees of range data, as shown in the diagram below. Apart from the LIDARs, the robot came equipped with a front depth camera.

The front and back laser scanners are located at opposite edges of the robot and each provides a field of view of 270 degrees (retrieved from https://www.roscomponents.com/815-thickbox_default/summit-xl-steel.jpg).
Rear Scanse Sweep LIDAR on a 3D-printed mount (in blue).

The biggest challenge in setting up our own LIDARs was aligning all three range sensors: the front Orbbec Astra Pro 3D camera, the front LIDAR, and the rear LIDAR. The key here is to take precise coordinate measurements of each mounted laser scanner with respect to the origin of the robot, i.e. the base_link frame, and publish the corresponding transforms; a hedged sketch of publishing such a transform by hand is shown below.
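
Those measurements end up as a static transform between base_link and each scanner frame. In practice the cleaner route is to encode the offsets in the robot's URDF/xacro so that robot_state_publisher broadcasts them, but publishing one by hand with tf2 is equivalent and useful for quick experiments. The frame name and offsets below are illustrative values, not our actual measurements:

import math
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped
from tf.transformations import quaternion_from_euler

rospy.init_node('rear_lidar_static_tf')

t = TransformStamped()
t.header.stamp = rospy.Time.now()
t.header.frame_id = 'base_link'
t.child_frame_id = 'rear_laser_link'   # illustrative frame name

# Measured mounting offsets in meters (illustrative values);
# the rear LIDAR is rotated 180 degrees around z in this example.
t.transform.translation.x = -0.30
t.transform.translation.y = 0.20
t.transform.translation.z = 0.35
q = quaternion_from_euler(0.0, 0.0, math.pi)
t.transform.rotation.x, t.transform.rotation.y = q[0], q[1]
t.transform.rotation.z, t.transform.rotation.w = q[2], q[3]

broadcaster = tf2_ros.StaticTransformBroadcaster()
broadcaster.sendTransform(t)
rospy.spin()   # keep the latched transform available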

Screenshot of RViz showing alignment of all three range sensors on the robot: blue points = rear LIDAR, orange points = front LIDAR, and white points = front depth camera.

Merging laser scans

Both AMCL and GMapping require as input a single LaserScan-type message with a single frame, which is problematic with a 2-LIDAR setup such as ours.

To solve this issue we used the laserscan_multi_merger node from ira_laser_tools to merge the LaserScan topics into scan_combined, expressed in the base_link frame. The relay ROS node would be insufficient here, since it just creates a topic that alternately republishes messages from both incoming LaserScan topics.

Note there is a known bug in the laserscan_multi_merger node that sometimes prevents it from subscribing to the specified topics when it is brought up at the same time as the LIDARs (i.e. from the same launch file). A simple fix we found is the ROS package timed_roslaunch, which can delay the bring-up of the laserscan_multi_merger node by a configurable time interval.

Clearing 2D obstacle layer (ObstacleLayer) on costmap

“Ghost obstacles” (as the online community likes to call them) are points on the costmap that mark obstacles which no longer exist. We saw this issue with the Scanse Sweep LIDARs, and it is prevalent among cheaper laser scanners. There are many possible reasons why this occurs, but the most probable cause has to do with the costmap parameter raytrace_range (definition provided below) and the max_range of the LIDAR.

raytrace_range – The maximum range in meters at which to raytrace out obstacles from the map using sensor data.

source – http://wiki.ros.org/costmap_2d/hydro/obstacles

Here’s a simple example to clarify the issue at hand. An obstacle appears in the line of sight of the laser scanner and is marked on the costmap at a distance of 2 meters. The obstacle then disappears and the laser scanner returns a distance of 6 meters at the original radial position of the obstacle. Ray tracing, which is set to a max distance of 3 meters, is unable to clear these points, and thus the costmap now contains ghost obstacles. From this example it is clear that raytrace_range needs to be set slightly higher than the maximum valid measurement returned by the laser scanner.

If the issue persists, the following are a few other costmap parameters worth looking into:

  • inf_is_valid = true: set this for sensors that return inf for invalid measurements
  • Specify separate observation sources solely for clearing and solely for marking, e.g. clearing = true, marking = false
  • always_send_full_costmap = true

Clearing 3D obstacle layer (VoxelLayer) on costmap

In contrast to the obstacle layer discussed above, which does 2D obstacle tracking, the voxel layer is a separate plugin that tracks obstacles in 3D. It uses data from sensors that publish messages of type PointCloud; in our case, the front depth camera.

A similar costmap-clearing issue was observed with the voxel layer, specifically with obstacles hovering around the camera's blind spot. We implemented two solutions to mitigate it.

#1 Clearing costmap using recovery behavior:

Recovery behavior was set up to clear the obstacle_3d_layer whenever the planner fails to find a plan. Note that recovery behavior is only executed if a navigation goal has been sent, so clearing with this solution is only possible while a goal is being pursued. Below are the parameters required for the fix and screenshots of the solution working as intended in simulation.

global_costmap_params_map.yaml/local_costmap_params.yaml

plugins:
     - {name: obstacle_3d_layer, type: "costmap_2d::VoxelLayer"}
     - {name: obstacle_2d_layer, type: "costmap_2d::ObstacleLayer"}

costmap_common_params.yaml

obstacle_3d_layer:
<Hidden parameters>
obstacle_2d_layer:
<Hidden parameters>

move_base_params.yaml

recovery_behavior_enabled: true
recovery_behaviors:
  - name: 'aggressive_reset'
    type: 'clear_costmap_recovery/ClearCostmapRecovery'

aggressive_reset:
  reset_distance: 0.0
  layer_names: ["obstacle_3d_layer"]
Setup for demoing recovery behavior to clear 3D obstacle layer.
Obstacle on costmap persists even though virtual obstacle is no longer in front of depth camera.
Global Planner fails to find a plan so recovery behavior is initiated and deletes the 3D obstacle layer from costmaps.

#2 Replacing default Voxel Layer plugin with Spatio-Temporal Voxel Layer:

To further alleviate this issue, specifically when the planner does indeed find a valid plan, the Spatio-Temporal Voxel Layer was implemented to replace the default Voxel Layer costmap plugin. This improved voxel-grid package has a voxel_decay parameter that progressively clears 3D obstacles from the costmap over time, eliminating the issue completely if voxel_decay is set to 0 seconds (though that is not entirely favorable when there is no rear depth camera).

Tally robot from Simbe Robotics using the Spatio-Temporal Voxel Layer to mark and clear obstacles (retrieved from https://user-images.githubusercontent.com/14944147/37010885-b18fe1f8-20bb-11e8-8c28-5b31e65f2844.gif).

Closing Thoughts and Next Steps

The ROS Navigation Stack is simple to implement regardless of the robot platform and can be highly effective if dedicated time is spent tuning parameters. Issues with the stack will depend on the type of mobile platform and the quality and type of range sensors used. We hope this blog post has provided new insight into solving some of these issues.

In the future we aim to extend the SUMMIT's navigation capabilities with 3D LIDARs or an additional depth camera for a full 3D view of the environment, web access to navigation control, and real-time updating of a centralized map server.
