Tag: Programming

Twistbytes Approach to Hierarchical Classification Shared Task at GermEval 2019

by Fernando Benites (ZHAW and SpinningBytes)

cross-posted from GitHub

We explain here, step by step, how to reproduce the results of our approach, and we discuss parts of the paper. The approach was meant to be a strong baseline for the task, one that deep learning approaches should beat. We did not manage to beat it, so we submitted the baseline itself and placed second in the flat problem and first in the hierarchical task (subtask B). The baseline builds on strong placements in several earlier shared tasks, and although it is only a clever form of keyword spotting, it does a very good job. Code and data are available in the repository GermEval_2019.


Twist Bytes @Vardial 2018

by Fernando Benites (ZHAW and SpinningBytes)

cross-posted from the SpinningBytes blog

schwiiz ja*

This year, the SpinningBytes team participated in the VarDial competition, where we achieved second place in the German Dialect Identification shared task. The task’s goal was to identify which region the speaker of a given sentence is from, based on the dialect he or she speaks. Dialect identification is an important NLP task; for instance, it can be used for automatic processing in a speech-to-text context, where identifying the dialect makes it possible to load a specialized model. In this blog post, we give a step-by-step walkthrough of how to create the model in Python and compare it to previous years’ approaches.


R: Reduce() Part 2 – some pitfalls using Reduce

By Matthias Templ (ZHAW), Thoralf Mildenberger (ZHAW)

The functionality of Reduce() is shown by way of example in https://blog.zhaw.ch/datascience/r-reduce-applys-lesser-known-brother/ . It’s great to learn how to use this function on interesting problems. If you are ready (that is, if you have read the first blog post on Reduce), we want to push you further towards writing efficient code.

R: Reduce() – apply’s lesser known brother

By Thoralf Mildenberger (ZHAW)

Everybody who knows a bit about R knows that in general loops are said to be evil and should be avoided, both for efficiency reasons and code readability, although one could argue about both.

The usual advice is to use vector operations and apply() and its relatives. sapply(), vapply() and lapply() work by applying a function to each element of a vector or list and return a vector, matrix, array or list of the results. apply() applies a function along one of the dimensions of a matrix or array and returns a vector, matrix or array. These are very useful, but they only work if the function can be applied to each element independently of the others.
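As a small illustration of the apply() family described above, here is a minimal sketch. The toy inputs `x` and `m` are made up for this example and are not taken from the original post.

```r
# Toy data, chosen only for illustration
x <- c(1, 4, 9)

# sapply() applies sqrt to each element and simplifies the result to a vector
roots <- sapply(x, sqrt)

# lapply() does the same but always returns a list
roots_list <- lapply(x, sqrt)

# vapply() is like sapply(), but the type and length of each result is declared
roots_checked <- vapply(x, sqrt, numeric(1))

# apply() works along one dimension of a matrix:
# MARGIN = 1 means rows, MARGIN = 2 means columns
m <- matrix(1:6, nrow = 2)      # 2 x 3 matrix, filled column by column
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)
```

In each call the function sees one element (or one row or column) at a time, with no knowledge of the results for the others.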

There are cases, however, where we would still use a for loop because the result of applying our operation to an element of the list depends on the results for the previous elements. The R base package provides a function Reduce(), which can come in handy here. Of course it is inspired by functional programming, and actually does something similar to the Reduce step in MapReduce, although it is not intended for big data applications. Since it seems to be little known even to long-time R users, we will look at a few examples in this post.
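To make the idea concrete, here is a minimal sketch of a computation where each step depends on the previous result, written first as a for loop and then with Reduce(). The vector `x` is a made-up toy input, not an example from the full post.

```r
# Toy data, chosen only for illustration
x <- c(1, 2, 3, 4)

# for loop: each step uses the result of the previous step
total_loop <- 0
for (xi in x) {
  total_loop <- total_loop + xi
}

# Reduce() folds the vector from left to right with a binary function:
# ((1 + 2) + 3) + 4
total <- Reduce(`+`, x)

# accumulate = TRUE keeps every intermediate result (a running sum here)
running <- Reduce(`+`, x, accumulate = TRUE)

# Any function of two arguments works; the first argument is the result so far
largest_so_far <- Reduce(function(acc, xi) max(acc, xi), x, accumulate = TRUE)
```

For a plain sum one would of course call sum() or cumsum(); the point is only that Reduce() carries the intermediate result from one element to the next, which sapply() and its relatives cannot do.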