Day one provides a general introduction to combining multiple administrative and survey datasets for statistical purposes. A total-error framework is presented for integrated statistical data, which provides a systematic overview of the origin and nature of the various potential errors. The most typical data configurations are illustrated and the relevant statistical methods reviewed.
Day two covers a handful of selected statistical methods. Training will be given on the techniques of data fusion, or statistical matching, by which joint statistical data is created from separate marginal observations. The participants will be introduced to several imputation or adjustment techniques, in the presence of constraints arising from overlapping data sources.
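As a rough illustration of the statistical matching idea mentioned above, the sketch below fuses two small files that share only one variable. It uses nearest-neighbour donor matching, which is one simple technique among many and not necessarily the one taught on the course. All data values and the names `file_a`, `file_b`, `nearest_donor` and `fuse` are hypothetical, and the result is only meaningful if the two non-shared variables can be assumed independent given the shared one.

```python
# Hypothetical inputs: file A observes (age, income), file B observes (age, visits).
# No unit appears in both files; "age" is the only shared variable.
file_a = [
    {"age": 25, "income": 21000},
    {"age": 40, "income": 35000},
    {"age": 63, "income": 28000},
]
file_b = [
    {"age": 27, "visits": 1},
    {"age": 41, "visits": 3},
    {"age": 60, "visits": 6},
]


def nearest_donor(recipient, donors, match_var):
    """Return the donor record closest to the recipient on the shared variable."""
    return min(donors, key=lambda d: abs(d[match_var] - recipient[match_var]))


def fuse(recipients, donors, match_var, target):
    """Copy the donor's target value onto each recipient record."""
    fused = []
    for r in recipients:
        donor = nearest_donor(r, donors, match_var)
        fused.append({**r, target: donor[target]})
    return fused


fused = fuse(file_a, file_b, "age", "visits")
for row in fused:
    print(row)
```

Each record in the fused file now carries all three variables, but the joint relationship between income and visits is an artefact of the matching assumption rather than something observed.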
This course is ideal for social and medical researchers with interests in combining or analysing data from multiple sources, and for staff at National Statistical Institutes (or similar organisations) who are involved in the design, management and quality assurance of statistical processes based on data from multiple sources, including censuses, administrative data and sample surveys.
An understanding of the following is required: central concepts of statistical uncertainty (such as bias, variance and confidence intervals) and distributions, basic knowledge of data cleaning and imputation, and basic experience of R for statistical computing. Methodological training, knowledge and experience will be helpful.
Further course details can be found here.
Podcasts for some of our previous courses can be found here.
This short course will provide a detailed overview of the topic, covering all important aspects relevant for the synthetic data approach. Starting with a short introduction to data confidentiality in general and synthetic data in particular, the workshop will discuss the different approaches to generating synthetic datasets in detail. Possible modelling strategies and analytical validity evaluations will be assessed and potential measures to quantify the remaining risk of disclosure will be presented.
To give participants hands-on experience, the course will include practical sessions using R, in which they generate and evaluate synthetic data based on real data examples.
The course intends to summarize the state of the art in synthetic data. The main focus will be on practical implementation rather than on the motivation of the underlying statistical theory. Participants may be academic researchers or practitioners from statistical agencies working in the area of data confidentiality and data access. Basic knowledge of R is expected. Some background in Bayesian statistics is helpful but not obligatory.
Further information can be found here
This short course is designed to give participants a practical introduction to data linkage and is aimed at researchers either intending to use data linkage themselves or to analyse linked data. Examples of the uses of data linkage, data preparation, methods for linkage (including deterministic and probabilistic approaches) and issues for the analysis of linked data are covered. The main focus of this course will be health data, although the concepts will apply to many other areas. This course includes a practical example involving data to be linked, to enable participants to put theory into practice.
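To make the distinction between deterministic and probabilistic linkage concrete, here is a minimal sketch. The deterministic rule requires exact agreement on every field. The probabilistic rule sums log-ratio agreement weights in the style of the Fellegi-Sunter model, which is a common formulation but is an assumption here, not something the course description specifies. The records, the m- and u-probabilities in `M_U` and the threshold are all invented for illustration.

```python
import math

# Hypothetical records from two sources; every value is made up.
records_a = [
    {"id": "a1", "surname": "smith", "dob": "1980-02-01", "postcode": "SO17"},
    {"id": "a2", "surname": "jones", "dob": "1975-11-30", "postcode": "N1"},
]
records_b = [
    {"id": "b1", "surname": "smith", "dob": "1980-02-01", "postcode": "SO17"},
    {"id": "b2", "surname": "jones", "dob": "1975-11-30", "postcode": "E2"},
    {"id": "b3", "surname": "brown", "dob": "1990-06-15", "postcode": "N1"},
]

FIELDS = ("surname", "dob", "postcode")

# Assumed probabilities per field: m = P(agree | same person),
# u = P(agree | different people). In practice these are estimated.
M_U = {"surname": (0.95, 0.01), "dob": (0.98, 0.005), "postcode": (0.90, 0.05)}


def deterministic_links(a, b):
    """Link pairs that agree exactly on every field."""
    return [
        (ra["id"], rb["id"])
        for ra in a
        for rb in b
        if all(ra[f] == rb[f] for f in FIELDS)
    ]


def match_weight(ra, rb):
    """Sum of per-field log2 likelihood ratios for agreement or disagreement."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if ra[field] == rb[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_links(a, b, threshold):
    """Link pairs whose total weight reaches the (assumed) threshold."""
    return [
        (ra["id"], rb["id"])
        for ra in a
        for rb in b
        if match_weight(ra, rb) >= threshold
    ]


print(deterministic_links(records_a, records_b))
print(probabilistic_links(records_a, records_b, threshold=5.0))
```

In this toy example the second person has changed postcode between sources, so the exact-match rule misses the pair while the weighted rule still links it, because agreement on surname and date of birth outweighs the postcode disagreement.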
The course is aimed at researchers who need to gain an understanding of data linkage techniques. The course provides an introduction to data linkage theory and methods for those who might be using linked data in their own work. Participants may be academic researchers in the social and health sciences or may work in government, survey agencies, and official statistics, for charities or the private sector.
Training podcast available here
Further course details can be found here
Database systems are increasingly being used for working with medical data and enable the rapid querying of complex data in health and social care. This short course will introduce the theory behind the relational data model and enable participants to gain an understanding of how data can be modelled and stored in a relational database system and which data types are used.
The course is suitable for epidemiologists, medical statisticians and other researchers working with electronic health records data.
Some experience of working with a statistical package is expected. Participants will have to sign confidentiality and data use agreements at the start of the course.
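The relational ideas described above can be sketched with a small in-memory SQLite database, using Python's standard `sqlite3` module. The `patient` and `admission` tables, their columns and all rows are hypothetical and are not taken from the course materials. The sketch shows typed columns, a primary key, a foreign key tying each admission to a patient, and a join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one row per patient, many admissions per patient.
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_date TEXT NOT NULL,              -- ISO 8601 date stored as text
    sex        TEXT CHECK (sex IN ('F', 'M'))
);
CREATE TABLE admission (
    admission_id        INTEGER PRIMARY KEY,
    patient_id          INTEGER NOT NULL REFERENCES patient(patient_id),
    admitted_on         TEXT NOT NULL,
    length_of_stay_days INTEGER
);
""")

conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "1950-03-12", "F"), (2, "1984-07-01", "M")],
)
conn.executemany(
    "INSERT INTO admission VALUES (?, ?, ?, ?)",
    [(10, 1, "2020-01-05", 3), (11, 1, "2021-06-20", 7), (12, 2, "2021-09-14", 1)],
)

# Join the two tables and summarise admissions per patient.
rows = conn.execute("""
    SELECT p.patient_id,
           COUNT(*)                   AS n_admissions,
           SUM(a.length_of_stay_days) AS total_days
    FROM patient AS p
    JOIN admission AS a ON a.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.patient_id
""").fetchall()
print(rows)
```

Keeping patients and admissions in separate tables avoids repeating patient details on every admission row, and the foreign key stops an admission being recorded for a patient who does not exist.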