Events

The Potential of Synthetic Data

The 3rd meeting of the expert group “Privacy Technologies for Data Collaboration” took place online on the afternoon of September 8, 2021, with 14 participants. A summary of the event follows:

Matthias Templ, an expert from ZHAW in the areas of data anonymization and synthetic data, presented the concept of synthetic data. According to the McGraw-Hill Dictionary of Scientific and Technical Terms, synthetic data is “any production data applicable to a given situation that are not obtained by direct measurement”. Synthetic data is generated from datasets that often contain personal data and therefore should not be shared with third parties. The major properties of the synthetic dataset nevertheless match those of the original dataset, so it can be used for similar purposes, such as learning about distributions.

Matthias explained that creating synthetic data first requires a good understanding of the original dataset (e.g., personal data about a population). This includes understanding its generation process and its inherent distributions (including marginal distributions). These distributions are then rebuilt with one or more models (e.g., neural networks, decision trees), and the models are used to generate the synthetic dataset. Matthias has developed and published an R library to accomplish this task. He also presented some real-world examples in which synthetic data had been applied. After the presentation, the participants discussed the potential of synthetic data. Another discussion point was which modelling techniques are required for which level of complexity in the original dataset (e.g., datasets with only a few features require less complex techniques).
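The workflow described above (learn the distributions, rebuild them with a model, then sample from the model) can be illustrated with a minimal Python sketch. It is not the method of the R library mentioned in the talk. The toy records, the variable names "region" and "income", the helper functions fit_model and generate, and the choice of a simple normal model per region are all assumptions made purely for illustration.

```python
import random
import statistics
from collections import Counter, defaultdict


def fit_model(records):
    """Estimate the marginal distribution of 'region' and, for each region,
    a normal model (mean, standard deviation) of 'income'."""
    counts = Counter(r["region"] for r in records)
    total = sum(counts.values())
    marginal = {region: n / total for region, n in counts.items()}

    incomes = defaultdict(list)
    for r in records:
        incomes[r["region"]].append(r["income"])
    conditional = {
        region: (
            statistics.mean(values),
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for region, values in incomes.items()
    }
    return marginal, conditional


def generate(model, n, seed=0):
    """Draw n synthetic records: first a region from the marginal
    distribution, then an income from that region's normal model."""
    marginal, conditional = model
    rng = random.Random(seed)
    regions = list(marginal)
    weights = [marginal[r] for r in regions]
    synthetic = []
    for _ in range(n):
        region = rng.choices(regions, weights=weights, k=1)[0]
        mean, sd = conditional[region]
        synthetic.append({"region": region, "income": rng.gauss(mean, sd)})
    return synthetic


# Made-up toy "original" dataset (hypothetical values, not real data).
original = [
    {"region": "north", "income": 52000.0},
    {"region": "north", "income": 61000.0},
    {"region": "north", "income": 58000.0},
    {"region": "south", "income": 43000.0},
    {"region": "south", "income": 47000.0},
]

model = fit_model(original)
synthetic = generate(model, n=1000)
```

No synthetic record is copied from the original, yet the share of each region and the typical income per region stay close to the original values. A dataset with more features or more complex dependencies would call for richer models, which is the trade-off the participants discussed.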

In the second half of the meeting the participants discussed the potential benefits of the “Data Collaboration Canvas”. The Data Collaboration Canvas is a graphical workshop tool and has been developed with the help of the expert group. It is aimed at organizations that want to explore the potential of data innovation with other organizations at an early stage to create mutual added value. It offers a simple, visual structuring aid, e.g., in workshops, to identify common potentials and hurdles of collaboration. The canvas can not only be used to identify data collaboration opportunities between organizations such as companies but also within an organization (e.g., opportunities between different divisions or departments). Participants applied the canvas in two different use cases and discussed usability and comprehensibility of the canvas afterwards.

The next event will take place on November 23, 2021.
