By Nico Ebert (ZHAW)
translated from the original German language version published at Inside IT
A common narrative in practice sounds something like this: “people claim data protection is important to them, but in reality they give away everything on the internet anyway”. Some scientific studies also seem to confirm this again and again: that we are generally careless with our own and other people’s personal data, and that we consider data protection important but neglect it in everyday life. For example, a “pizza experiment” with 3,000 students at a US university in 2017 concluded that a free pizza was enough of an incentive to reveal the email addresses of three fellow students (Athey et al. 2017).
By Bettina Mack (ZHAW)
ANNPR, the “International Workshop on Artificial Neural Networks in Pattern Recognition”, is a biennial academic conference where researchers come together to discuss the most recent advances in neural networks, deep learning and artificial intelligence as applied to pattern recognition. Pattern recognition is the field of computer science concerned with making sense of data such as images (“What do we see in the picture?”), audio data (for example, to recognize spoken words) or time-dependent inputs such as weather or stock-market data. This year’s edition was organized by Frank-Peter Schilling and Thilo Stadelmann from ZHAW’s Institute of Applied Informatics (InIT) and took place from 2 to 4 September.
By Nico Ebert (ZHAW)
cross-posted from WINsights blog
Each of us is confronted with countless privacy notices every day and agrees to the practices they describe. Most likely we do not even notice this, because the privacy information is hidden in long and cumbersome privacy policies. To give users more specific and more relevant privacy information, we first need to understand which information is relevant to them at all. Marketing traditionally asks users about their needs, so why not ask users about their needs for privacy information?
Researchers have recently suggested that the specific usage context should be considered to make privacy notices more relevant to users. We therefore asked users about their needs in very specific contexts. We conducted an exploratory online survey of privacy concerns and privacy information preferences with 642 participants in Switzerland for two contexts: loyalty cards (e.g. Cumulus, Supercard or Ikea) and fitness tracking (e.g. Fitbit, Garmin, Apple Health).
By Christoph Heitz (ZHAW)
translated from the original German language version published at Inside IT
Can a prisoner be released early, or released on bail? A judge who decides this should also consider the risk of recidivism of the person to be released. Wouldn’t it be an advantage to be able to assess this risk objectively and reliably? This was the idea behind the COMPAS system developed by the US
system makes an individual prediction of the chance of recidivism for imprisoned offenders, based on a wide range of personal data. The result is a risk score between 1 and 10, where 10 corresponds to a very high risk of recidivism. This system has been used for many years in various U.S. states to support decision making of judges – more than one million prisoners have already been evaluated using COMPAS. The advantages are obvious: the system produces an objective risk prediction that has been developed and validated on the basis of thousands of cases.
In May 2016, however, the investigative newsroom ProPublica published the results of research suggesting that this software systematically discriminates against black people and overestimates their risk (Angwin et al. 2016): 45 percent of black offenders who did not reoffend after their release were identified as high-risk. In the corresponding group of whites, however, only 23 percent were attributed a high risk by the algorithm. This means that the probability of being falsely assigned a high risk of recidivism is roughly twice as high for a black person as for a white person.
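The figures compared here are false positive rates: among people who did not reoffend, the share who were nevertheless labelled high-risk. The following minimal Python sketch reproduces that arithmetic. The group sizes of 1,000 are hypothetical and chosen only so that the rates match the 45 and 23 percent quoted above; the function and variable names are illustrative.

```python
def false_positive_rate(false_positives, true_negatives):
    """Share of non-reoffenders who were wrongly labelled high-risk."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical counts per 1,000 non-reoffenders in each group, picked
# only to match the percentages quoted in the text (45% and 23%).
fpr_black = false_positive_rate(false_positives=450, true_negatives=550)
fpr_white = false_positive_rate(false_positives=230, true_negatives=770)

# How much more likely a false high-risk label is in the first group.
ratio = fpr_black / fpr_white

print(f"FPR black: {fpr_black:.2f}, FPR white: {fpr_white:.2f}, ratio: {ratio:.2f}")
```

With these rates the ratio comes out at about 1.96, which is where "roughly twice as high" comes from.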
by Fernando Benites (ZHAW and SpinningBytes)
cross-posted from github
We explain here, step by step, how to reproduce the results of the approach and discuss parts of the paper. The approach was meant as a strong baseline for the task, one that deep learning approaches should beat. We did not manage to beat it, so we submitted the baseline itself and placed second in the flat problem and first in the hierarchical task (subtask B). The baseline builds on strong placements in different shared tasks, and although it is only a clever form of keyword spotting, it performs very well. Code and data can be accessed in the repository GermEval_2019
by Fernando Benites (ZHAW and SpinningBytes)
cross-posted from the SpinningBytes blog
This year, the SpinningBytes team participated in the VarDial competition, where we achieved second place in the German Dialect Identification shared task. The task’s goal was to identify which region the speaker of a given sentence is from, based on the dialect they speak. Dialect identification is an important NLP task; for instance, it can be used for automatic processing in a speech-to-text context, where identifying the dialect makes it possible to load a specialized model. In this blog post, we give a step-by-step walkthrough of how to create the model in Python, and compare it to previous years’ approaches.
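To give a rough idea of what sentence-level dialect identification can look like in Python, here is a minimal sketch of a character n-gram Naive Bayes classifier using only the standard library. It is not the SpinningBytes model described in the post; the class name, the choice of trigrams, and the artificial training strings and labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.gram_counts = {}        # label -> Counter of n-grams
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            grams = char_ngrams(sentence, self.n)
            self.gram_counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, sentence):
        grams = char_ngrams(sentence, self.n)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)

        def log_score(label):
            counts = self.gram_counts[label]
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)  # class prior
            for gram in grams:
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            return score

        return max(self.gram_counts, key=log_score)


# Artificial strings standing in for dialect sentences (not real data).
train_sentences = ["aaa aaa", "aab aaa", "zzz zzz", "zzy zzz"]
train_labels = ["region_A", "region_A", "region_B", "region_B"]

model = NgramNaiveBayes(n=3).fit(train_sentences, train_labels)
prediction = model.predict("aaa")
print(prediction)
```

Character n-grams are a common choice for this kind of task because spelling differences between dialects show up below the word level; a real system would be trained on labelled dialect sentences and evaluated on held-out data.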
By Kurt Stockinger (ZHAW)
As part of “Zürich meets San Francisco – A Festival Of Two Cities”, the ZHAW Datalab co-organized the event Data Science and Beyond: Technical, Economic and Societal Challenges, which took place on the campus of San José State University (SJSU), in the heart of Silicon Valley. One interesting fact about SJSU is that, among all US universities, it has the highest number of graduates who get jobs at Apple or Cisco.
Reviewed by Thoralf Mildenberger (ZHAW)
- Paul D. Ellis, The Essential Guide to Effect Sizes. Statistical Power, Meta-Analysis and the Interpretation of Research Results. Cambridge University Press, Cambridge 2010. Link to book on publisher’s website.
In the last few years, statistical hypothesis testing, with the p-value still being THE standard for reporting results in many fields of science, has increasingly been criticized. Many researchers have even called for abandoning the “NHST” (Null Hypothesis Significance Testing) approach altogether. I think this is going too far, as many problems are due to misapplication of the techniques and, perhaps even more importantly, misinterpretation of the results. There is also no consensus on how to replace hypothesis testing with a better methodology. Some of the more moderate critics suggest using confidence intervals, but while these are often more informative, they are essentially equivalent to hypothesis tests and share some of the problems. This makes it all the more important to highlight difficulties in the correct application and interpretation of statistical methodology.
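The equivalence between confidence intervals and hypothesis tests can be illustrated in the simplest case, a two-sided z-test for a mean with known standard deviation: the test rejects at level alpha exactly when the (1 − alpha) confidence interval excludes the hypothesized value. The sketch below is not taken from the book under review; the function name and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_and_ci(sample_mean, sigma, n, mu0, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, plus the matching confidence interval."""
    std_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)
    return p_value, ci


# Hypothetical inputs: two sample means tested against mu0 = 0.
outcomes = []
for sample_mean in (0.5, 0.3):
    p_value, (low, high) = z_test_and_ci(sample_mean, sigma=1.0, n=25, mu0=0.0)
    rejects = p_value < 0.05
    excludes_null = not (low <= 0.0 <= high)
    outcomes.append((rejects, excludes_null))
    print(f"mean={sample_mean}: p={p_value:.4f}, CI=({low:.3f}, {high:.3f}), "
          f"rejects={rejects}, CI excludes 0={excludes_null}")
```

In both cases the two columns agree: the interval carries the same accept/reject information as the test, and additionally shows the range of plausible values for the mean.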