Evaluating Privacy Metrics for Synthetic Tabular Data
Date
2024-08-22
Authors
Advisor
He, Xi
Publisher
University of Waterloo
Abstract
This paper addresses the challenge of evaluating privacy risks in synthetic tabular data by examining black-box privacy metrics that do not require detailed knowledge of the data generation process. We consider two classes of attack: black-box and white-box. Using six datasets from the UCI Machine Learning Repository, we evaluate these metrics across several synthetic data generation models, including diffusion models such as TabDDPM and traditional models such as PrivBayes. Our findings show that DOMIAS exhibits limited sensitivity across datasets and configurations, whereas Distance to Closest Record (DCR) is an effective measure of similarity between synthetic and real data and offers useful insight into privacy preservation. We also introduce the Step-wise Error Comparing Membership Inference (SECMI) attack, which compares prediction errors at each generation step to infer membership status. The study concludes that diffusion models such as TabDDPM generally achieve a better balance of utility and privacy than traditional models. These results highlight the need for robust, adaptable privacy metrics that can reliably assess privacy risk in synthetic data before it is applied across domains.
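To make the DCR metric mentioned in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes all features are numeric and already scaled to comparable ranges, and it uses Euclidean distance; the function name `distance_to_closest_record` and the toy rows are hypothetical, and the thesis may use a different distance or preprocessing. For each synthetic row, the sketch reports the distance to its nearest real row; values near zero suggest the synthetic row closely copies a real record.

```python
import math


def distance_to_closest_record(synthetic_rows, real_rows):
    """Return, for each synthetic row, the Euclidean distance to its nearest real row.

    Assumption: rows are equal-length sequences of numeric, pre-scaled features.
    """
    return [
        min(math.dist(synthetic_row, real_row) for real_row in real_rows)
        for synthetic_row in synthetic_rows
    ]


# Toy data (hypothetical): three real rows and three synthetic rows.
real = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (4.0, 3.0)]

dcr = distance_to_closest_record(synthetic, real)
print(dcr)  # the first entry is 0.0 because that synthetic row duplicates a real row
```

In practice the per-row distances are usually summarized (for example by their median or a low percentile) and compared against the same statistic computed on a held-out real set, so that a small DCR can be distinguished from ordinary closeness between records of the same distribution.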