PDPC publishes proposed guide on synthetic data generation
30 July 2024
On 15 July 2024, the Personal Data Protection Commission (“PDPC”) launched the Proposed Guide on Synthetic Data Generation (“Guide”) to help organisations make sense of synthetic data (“SD”) by explaining what SD is, how it can be used, and best practices in creating SD. Jointly developed with the Agency for Science, Technology and Research and supported by the Info-communications Media Development Authority (“IMDA”), the Guide will be offered as a resource within IMDA’s Privacy Enhancing Technology (PET) Sandbox. The Guide includes a checklist of good practices to adopt when generating SD in order to guard against the risk of re-identification.
SD commonly refers to artificial data that has been generated using a purpose-built mathematical model (including artificial intelligence (“AI”)/machine learning (ML) models) or algorithm.
Generating SD is a form of Privacy Enhancing Technology (“PET”) that is gaining traction. It creates realistic data that can be used for AI model training in place of the original sensitive data. PETs are a suite of tools and techniques that allow the processing, analysis, and extraction of insights from data without revealing the underlying personal or commercially sensitive data.
SD has a variety of use cases, ranging from generating training datasets for AI models to data analysis and collaboration. The use of SD can not only accelerate research, innovation, collaboration, and decision-making but also mitigate concerns about cybersecurity incidents and data breaches, enabling better compliance with data protection/privacy regulations.
The Guide focuses on SD generation for structured data. While SD is generally fictitious data that may not be considered personal data on its own, it is not inherently risk-free because re-identification remains possible. As such, the Guide proposes good practices that organisations may adopt when generating SD to minimise such risks for a set of common use case archetypes. The Guide also sets out risk assessments and considerations for generating SD, as well as governance controls, contractual processes, and technical measures to mitigate residual risks.
Reference materials
The following materials are available on the PDPC website www.pdpc.gov.sg and the IMDA website www.imda.gov.sg: