By Christoph Heitz (ZHAW)
Translated from the original German-language version published at Inside IT
Can a prisoner be released early, or released on
bail? A judge who decides this should also consider the risk of
recidivism of the person to be released. Wouldn’t it be an
advantage to be able to assess this risk objectively and reliably?
This was the idea behind the COMPAS system developed in the US. The
system makes an individual prediction of the chance of recidivism for
imprisoned offenders, based on a wide range of personal data. The
result is a risk score between 1 and 10, where 10 corresponds to a
very high risk of recidivism. This system has been used for many
years in various U.S. states to support decision making of judges –
more than one million prisoners have already been evaluated using
COMPAS. The advantages are obvious: the system produces an objective
risk prediction that has been developed and validated on the basis of
thousands of cases.
In May 2016, however, the journalists’ association ProPublica published
the results of research suggesting that this software systematically
discriminates against black people and overestimates their risk
(Angwin et al. 2016): 45 percent of black offenders who did not
reoffend after their release were identified as high-risk. In the
corresponding group of whites, however, only 23 percent were
attributed a high risk by the algorithm. This means that the
probability of being falsely assigned a high risk of recidivism is
roughly twice as high for a black person as for a white person.
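The comparison above is a comparison of false positive rates: among people who did not reoffend, the share that was nevertheless labeled high-risk. The following is a minimal sketch of that arithmetic. The function name and the counts (per 100 non-reoffenders) are illustrative assumptions chosen only to reproduce the 45 and 23 percent quoted above; they are not the ProPublica data.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of non-reoffenders who were still labeled high-risk."""
    non_reoffenders = false_positives + true_negatives
    if non_reoffenders == 0:
        raise ValueError("group contains no non-reoffenders")
    return false_positives / non_reoffenders


# Hypothetical counts per 100 non-reoffenders, matching the quoted percentages.
fpr_black = false_positive_rate(false_positives=45, true_negatives=55)
fpr_white = false_positive_rate(false_positives=23, true_negatives=77)

# Ratio of the two error rates: how much more likely a false high-risk label is.
ratio = fpr_black / fpr_white

print(f"FPR (black): {fpr_black:.2f}")
print(f"FPR (white): {fpr_white:.2f}")
print(f"Ratio:       {ratio:.2f}")
```

With these numbers the ratio comes out just under 2, which is what "twice as high" refers to.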
By Nico Ebert (ZHAW)
The original version of this post was published in German on Privacy Bits and in English on vetri.global
In a lecture for the Fair Data Forum, I dealt with the question “What value does data protection have for individuals and what are they willing to pay for it?”
The three data privacy types
As always, there is no single “individual”: everyone has different data protection preferences and thus attributes a different value to having personal data safeguarded. Therefore, there are different “typologies” for classifying individuals. For example, Westin distinguishes between data protection fundamentalists, data protection pragmatists and completely unconcerned individuals. Sheehan (2002) selected 889 people in the USA and classified them using a questionnaire. Conclusion: 16% of the respondents were completely unconcerned about data protection, 81% were classified as pragmatists, and 3% as fundamentalists.
Reviewed by Thoralf Mildenberger (ZHAW)
- Paul D. Ellis, The Essential Guide to Effect Sizes. Statistical Power, Meta-Analysis and the Interpretation of Research Results. Cambridge University Press, Cambridge 2010. Link to book on publisher’s website.
In the last few years, statistical hypothesis testing – with the p-value still being THE standard for reporting results in many fields of science – has increasingly been criticized. Many researchers have even called for abandoning the “NHST” (Null Hypothesis Significance Testing) approach altogether. I think this is going too far, as many problems are due to misapplication of the techniques and – perhaps even more importantly – misinterpretation of the results. There is also no consensus on how to replace hypothesis testing with a better methodology – some of the more moderate critics suggest using confidence intervals, but while these are often more informative, they are essentially equivalent to hypothesis tests and share some of the problems. This makes it all the more important to highlight difficulties in the correct application and interpretation of statistical methodology.
By Lukas Tuggener (ZHAW)
Reposted from https://medium.com/@ltuggener/who-is-scared-of-the-big-bad-robot-dc203e2cd7c4
There is currently much talk about recent developments in AI and how they are going to affect the way we live our lives. While fears of superintelligent robots that want to end all human life on earth are mostly held by laymen, there are other concerns that are also very common among insiders of the field. Most AI experts agree that the short- to mid-term impact of AI developments will mostly revolve around automating complex tasks, rather than “artificially intelligent beings”. The well-known AI researcher Andrew Ng put it this way:
By Thilo Stadelmann (ZHAW)
Reposted from https://dublin.zhaw.ch/~stdm/?p=350#more-350
I recently came across the notion of “type A” and “type B” data scientists. While “type A” is basically a trained statistician who has broadened his field towards modern use cases (“data science for people”), “type B” (B for “build”, “data science for software”) has done the same starting from his roots in programming and contributes more strongly to code and systems in the backend.
Frankly, I haven’t come across a more practically useless distinction since the inception of the term “data science”. Data science is the name for a new discipline that is in itself interdisciplinary [see e.g. here – but beware of German text]. The whole point of interdisciplinarity, and by extension of data science, is for a proponent to think outside the box of his or her original discipline (which might be statistics, computer science, physics, economics or something completely different), and to acquire skills in the neighboring disciplines in order to tackle problems outside of intellectual silos. Encouraging practitioners to stay in their silos, as this A/B typology suggests, is counterproductive at best, fatal at worst.
Drew Conway’s data science Venn diagram is used by many (including me) to give a first impression of what data science is all about. And rightly so: I, for example, like it for its simplicity and “coolness”.
In more in-depth discussions, when moving from mere buzz to concrete skills and project possibilities, we at the Datalab have had good experiences with the following “skill set map”: