Generating Synthetic Data for Statistical Disclosure Control

Course Information

Course Summary

This short course gives a detailed overview of the synthetic data approach and covers all of its important aspects. After a short introduction to data confidentiality in general and synthetic data in particular, the course discusses the different approaches to generating synthetic datasets in detail. It examines possible modelling strategies and evaluations of analytical validity, and presents measures for quantifying the remaining risk of disclosure.

To give participants hands-on experience, the course includes practical sessions using R, in which they generate and evaluate synthetic data based on real data examples.

Target Audience

The course aims to summarize the state of the art in synthetic data. The main focus is on practical implementation rather than on motivating the underlying statistical theory. Participants may be academic researchers or practitioners from statistical agencies working in the area of data confidentiality and data access. Basic knowledge of R is expected. Some background in Bayesian statistics is helpful but not required.

Further information can be found here

Course Code

ADRCE-ADRCE-training037 Drechsler

Course Dates

2nd May 2017 – 3rd May 2017

Course Leader

Dr Jörg Drechsler

Course Description

Course Outline:

The course covers:

  • the fully synthetic data approach
  • the partially synthetic data approach
  • modelling strategies for generating synthetic data
  • data utility evaluations
  • disclosure risk assessment

Learning Outcomes:

By the end of the course participants will:

  • have a practical understanding of the concept of synthetic data 
  • be able to judge in which situations the approach could be useful
  • know how to generate synthetic data from their own data
  • have a number of tools available to evaluate the analytical validity of the synthetic datasets
  • know how to assess the disclosure risk of the generated data
