
Generating Synthetic Data for Statistical Disclosure Control


Course Information

Course Summary

This short course will provide a detailed overview of synthetic data for statistical disclosure control, covering the key aspects of the approach. Starting with a short introduction to data confidentiality in general and synthetic data in particular, the workshop will discuss the different approaches to generating synthetic datasets in detail. It will then assess possible modelling strategies and ways to evaluate the analytical validity of the synthetic data, and present measures for quantifying the remaining risk of disclosure.

To give participants hands-on experience, the course will include practical sessions using R in which participants generate and evaluate synthetic data based on real data examples.

Target Audience

The course is intended to summarize the state of the art in synthetic data. The main focus will be on practical implementation rather than on motivating the underlying statistical theory. Participants may be academic researchers or practitioners from statistical agencies working in the area of data confidentiality and data access. Basic knowledge of R is expected. Some background in Bayesian statistics is helpful but not required.

Course Code

ADRCE-ADRCE-training037 Drechsler

Course Dates

16th October 2017 – 17th October 2017

Places Available

Course Leader

Dr Jörg Drechsler

Course Description

Course Outline:

The course covers:

  • the fully synthetic data approach
  • the partially synthetic data approach
  • modelling strategies for generating synthetic data
  • data utility evaluations
  • disclosure risk assessment

Learning Outcomes:

By the end of the course participants will:

  • have a practical understanding of the concept of synthetic data 
  • be able to judge in which situations the approach could be useful
  • know how to generate synthetic data from their own data
  • have a number of tools available to evaluate the analytical validity of the synthetic datasets
  • know how to assess the disclosure risk of the generated data


Delegates will need to bring their own laptops with the latest version of R installed. It would also be helpful to install the most recent version of the synthpop package before the course, either from CRAN (https://CRAN.R-project.org/package=synthpop) or by opening an R session and typing install.packages("synthpop").