By Christoph Heitz (ZHAW)
Translated from the original German-language version published at Inside IT
Can a prisoner be released early, or released on
bail? A judge who decides this should also consider the risk of
recidivism of the person to be released. Wouldn’t it be an
advantage to be able to assess this risk objectively and reliably?
This was the idea behind the COMPAS system, developed in the US. The
system makes an individual prediction of the chance of recidivism for
imprisoned offenders, based on a wide range of personal data. The
result is a risk score between 1 and 10, where 10 corresponds to a
very high risk of recidivism. This system has been used for many
years in various U.S. states to support judges’ decision making;
more than one million prisoners have already been evaluated using
COMPAS. The advantages are obvious: the system produces an objective
risk prediction that has been developed and validated on the basis of
thousands of cases.
In May 2016, however, the journalists’ association ProPublica published
the results of research suggesting that this software systematically
discriminates against black people and overestimates their risk
(Angwin et al. 2016): 45 percent of black offenders who did not
reoffend after their release were identified as high-risk. In the
corresponding group of whites, however, only 23 percent were
attributed a high risk by the algorithm. This means that the
probability of being falsely assigned a high risk of recidivism is
roughly twice as high for a black person as for a white person.
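The comparison above is a comparison of false positive rates: among people who did not reoffend, the share who were nevertheless labelled high-risk. The following is a minimal sketch in R of how such a rate is computed. The function name `false_positive_rate` and the five-person toy data are made up for illustration and are not COMPAS or ProPublica data; only the two percentages at the end come from the text.

```r
# False positive rate: share of non-reoffenders who were labelled high risk.
false_positive_rate <- function(high_risk, reoffended) {
  sum(high_risk & !reoffended) / sum(!reoffended)
}

# Toy data (hypothetical, NOT real cases): five people, four of whom
# did not reoffend; one of those four was labelled high risk.
high_risk  <- c(TRUE, FALSE, FALSE, FALSE, TRUE)
reoffended <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
toy_rate <- false_positive_rate(high_risk, reoffended)

# Ratio of the two rates quoted in the text (45 percent vs. 23 percent).
rate_ratio <- 0.45 / 0.23
```

Here `toy_rate` is 0.25, and `rate_ratio` comes out at just under 2, which is where the "twice as high" statement comes from.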
By Fernando Benites (ZHAW and SpinningBytes)
Cross-posted from GitHub
Here we explain, step by step, how to reproduce the results of our approach, and we discuss parts of the paper. The approach was meant to be a strong baseline for the task, one that deep learning approaches should beat. We did not manage to beat it, so we submitted the baseline itself, and came second in the flat problem and first in the hierarchical task (subtask B). This baseline builds on strong placements in different shared tasks, and although it is only a clever form of keyword spotting, it does a very good job. Code and data can be accessed in the repository GermEval_2019.
By Fernando Benites (ZHAW and SpinningBytes)
Cross-posted from the SpinningBytes blog
This year, the SpinningBytes team participated in the VarDial competition, where we achieved second place in the German Dialect Identification shared task. The task’s goal was to identify which region the speaker of a given sentence is from, based on the dialect he or she speaks. Dialect identification is an important NLP task; for instance, in a speech-to-text context, identifying the dialect makes it possible to load a specialized model. In this blog post, we walk through, step by step, how to create the model in Python, and compare it to previous years’ approaches.
By Kurt Stockinger (ZHAW)
As part of “Zürich meets San Francisco – A Festival Of Two Cities”, the ZHAW Datalab co-organized the event Data Science and Beyond: Technical, Economic and Societal Challenges, which took place on the campus of San José State University (SJSU), in the heart of Silicon Valley. One interesting fact about SJSU: among all US universities, it has the highest number of graduates who get jobs at either Apple or Cisco.
Reviewed by Thoralf Mildenberger (ZHAW)
- Paul D. Ellis, The Essential Guide to Effect Sizes. Statistical Power, Meta-Analysis and the Interpretation of Research Results. Cambridge University Press, Cambridge 2010. Link to book on publisher’s website.
In the last few years, statistical hypothesis testing – with the p-value still being THE standard for reporting results in many fields of science – has increasingly been criticized. Many researchers have even called for abandoning the “NHST” (Null Hypothesis Significance Testing) approach altogether. I think this is going too far, as many problems are due to misapplication of the techniques and, perhaps even more importantly, misinterpretation of the results. There is also no consensus on how to replace hypothesis testing with a better methodology. Some of the more moderate critics suggest using confidence intervals, but while these are often more informative, they are essentially equivalent to hypothesis tests and share some of their problems. This makes it all the more important to highlight difficulties in the correct application and interpretation of statistical methodology.
By Kurt Stockinger (ZHAW)
The final results of an interdisciplinary study on “Quantified Self”, funded by “TA Swiss” and carried out with the participation of the Datalab, have been published. The study was performed by three ZHAW departments (School of Health Professions, School of Management and Law, School of Engineering) in cooperation with the Institute for Futures Studies and Technology Assessment, Berlin. The focus of the Datalab was on legal and Big Data aspects of quantified self.
The results are available in various forms:
- A book (for people who love reading)
- A 24-page summary in four languages (for people who don’t want to read some 250 pages)
- A podcast from SRF 1 (Echo der Zeit)
- An NZZ article
Enjoy reading, and maybe you will feel encouraged to “quantify yourself” a bit better 😉
By Dirk Wilhelm (ZHAW)
Reposted from https://blog.zhaw.ch/industrie4null/2018/01/10/phd-network-in-data-science/
Students can now pursue a doctorate in data science at ZHAW in cooperation with the University of Zurich or the University of Neuchâtel.
2nd European COST Conference on Mathematics for Industry in Switzerland
September 7, 2017
Zurich University of Applied Sciences,
Technikumstr. 71, 8400 Winterthur
By Jörg Osterrieder (ZHAW)
Below please find a short recap and an outlook for our next conference on September 6, 2018.
Aim of the conference
The aim of this conference was to bring together European academics, young researchers, students and industrial practitioners to discuss the application of Artificial Intelligence to various practical fields. In a broader context, we wanted to promote «Mathematics for Industry» in Switzerland, as part of the European COST (Cooperation in Science and Technology) Action “Mathematics for Industry”, where members of ZHAW are in the management committee for Switzerland.
By Matthias Templ (ZHAW), Thoralf Mildenberger (ZHAW)
By way of example, the functionality of Reduce() is shown in https://blog.zhaw.ch/datascience/r-reduce-applys-lesser-known-brother/ . It’s great to learn how to use this function on interesting problems. If you are ready (that is, if you have read the first blog post on Reduce()), we want to push you further towards writing efficient code.
By Thoralf Mildenberger (ZHAW)
Everybody who knows a bit about R knows that loops are generally said to be evil and should be avoided, both for efficiency and for code readability, although one could argue about both. The usual advice is to use vector operations and apply() and its relatives. sapply() and lapply() work by applying a function to each element of a vector or list and return a vector, matrix, array or list of the results. apply() applies a function along one of the dimensions of a matrix or array and returns a vector, matrix or array. These are very useful, but they only work if the function can be applied to each element independently of the others.
There are cases, however, where we would still use a for loop because the result of applying our operation to an element of the list depends on the results for the previous elements. The R base package provides a function, Reduce(), which can come in handy here. It is inspired by functional programming and does something similar to the Reduce step in MapReduce, although it is not intended for big data applications. Since it seems to be little known even to long-time R users, we will look at a few examples in this post.
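As a minimal sketch of the difference (the numbers and the helper name `step` are made up for illustration and are not taken from the post): sapply() handles each element on its own, whereas Reduce() carries a result from one element to the next, just as a for loop would.

```r
# Independent per-element work: sapply() is enough.
squares <- sapply(1:4, function(x) x^2)

# Dependent work: each balance builds on the previous one.
# 'deposits', 'rate' and 'step' are hypothetical, for illustration only.
deposits <- c(100, 50, 25)
rate <- 0.1
step <- function(balance, deposit) balance * (1 + rate) + deposit

# Classic for loop.
balance <- 0
for (d in deposits) {
  balance <- step(balance, d)
}

# Same computation with Reduce(); the third argument is the start value.
final <- Reduce(step, deposits, 0)

# accumulate = TRUE returns the start value and every intermediate result.
running <- Reduce(step, deposits, 0, accumulate = TRUE)
```

Here `final` matches the `balance` computed by the loop, and `running` holds the start value followed by the balance after each deposit.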