A common narrative in practice sounds something like this: “People claim data protection is important to them, but in reality they give away everything on the internet anyway.” There are also scientific studies that seem to confirm this again and again: we are generally careless with our own and other people’s personal data, and we consider data protection important but neglect it in everyday life. For example, a “pizza experiment” with 3,000 students at a US university in 2017 concluded that a free pizza was enough of an incentive to reveal the email addresses of three fellow students (Athey et al. 2017).
Many Internet users inside and outside the European Union are very familiar with cookie banners: they pop up on websites, they are often annoying, and it is tedious to engage with them properly. Having to state our data sharing and protection preferences over and over again is a questionable concept in itself. But even if we accept cookie banners as a fact of life, our behavior towards them seems paradoxical at first glance.
ANNPR, the “International Workshop on Artificial Neural Networks in Pattern Recognition”, is a biennial academic conference where researchers come together to discuss the most recent advances in neural networks, deep learning and artificial intelligence as applied to pattern recognition. Pattern recognition is the field of computer science concerned with making sense of data such as images (“What do we see in the picture?”), audio data (for example, to recognize spoken words) or time-dependent inputs such as weather or stock-market data. This year’s edition was organized by Frank-Peter Schilling and Thilo Stadelmann from ZHAW’s Institute of Applied Informatics (InIT) and took place from 2 to 4 September.
We concluded a compelling interdisciplinary project on the topic of digitalization, in which we applied a selection of fundamental data science methods: web scraping, data wrangling with Elasticsearch and Kibana, data cleaning, counting, posing questions and searching for answers in the data. We would like to share some results on this blog.
The project was called “DIGITAL COMMUNICATION STRATEGIES FOR THE CULTURAL SECTOR IN THE BODENSEE REGION”. Its data analysis module dealt with the question of how digitalization was actually implemented in the Lake Constance region, using the example of cultural providers such as museums, galleries, exhibitions and theatres in the region. In this article, we use the terms Lake Constance region and Bodensee region interchangeably, since Bodensee is the German name for Lake Constance.
Each of us is confronted with countless privacy notices every day and agrees to the practices described. Most likely we do not even notice this because the privacy information is hidden in long and cumbersome privacy policies. In order to inform users more specifically with more relevant information about privacy, it is first necessary to understand which information is relevant to users at all. Marketing traditionally asks users about their needs, so why not ask users about their needs for privacy information?
Researchers have recently suggested that the specific usage context should be considered to make privacy notices more relevant to users. We therefore asked users about their needs in very specific contexts. We conducted an exploratory online survey on privacy concerns and privacy information preferences with 642 participants in Switzerland for two different contexts: loyalty cards (e.g. Cumulus, Supercard or Ikea) and fitness tracking (e.g. Fitbit, Garmin, Apple Health).
We explain here, step by step, how to reproduce the results of the approach and discuss parts of the paper. The approach was intended as a strong baseline for the task, one that deep learning approaches should beat. We did not manage to beat it, so we submitted the baseline itself and placed second in the flat problem and first in the hierarchical task (subtask B). This baseline builds on strong placements in different shared tasks, and although it is only a clever form of keyword spotting, it performs very well. Code and data can be accessed in the repository GermEval_2019.
This year, the SpinningBytes team participated in the VarDial competition, where we achieved second place in the German Dialect Identification shared task. The task’s goal was to identify which region the speaker of a given sentence is from, based on the dialect he or she speaks. Dialect identification is an important NLP task; for instance, it can be used for automatic processing in a speech-to-text context, where identifying the dialect makes it possible to load a specialized model. In this blog post, we give a step-by-step walkthrough of how to create the model in Python and compare it to previous years’ approaches.
Kurt Stockinger was invited to contribute a blog post to ACM SIGMOD, the leading worldwide community for database research. The post discusses recent technological advances in natural language interfaces to databases. The ultimate goal is to talk to a database (almost) as one would to a human.
The full blog post can be found at the following ACM SIGMOD link: