Evaluating Privacy Metrics for Synthetic Tabular Data

Mushi, Wang

Files

Wang_Mushi.pdf (551.73 KB)

Date

2024-08-22

Authors

Mushi, Wang

Advisor

He, Xi

Publisher

University of Waterloo

Abstract

This paper addresses the challenge of evaluating privacy risks in synthetic tabular data by examining black-box privacy metrics that do not require detailed knowledge of the data generation process. We focus on two classes of attack: black-box and white-box. Using six datasets from the UCI Machine Learning Repository, we evaluate the effectiveness of these metrics across various synthetic data generation models, including diffusion models such as TabDDPM and traditional models such as PrivBayes. Our findings show that DOMIAS exhibits limited sensitivity across datasets and configurations, whereas Distance to Closest Record (DCR) proves to be an effective measure of similarity between synthetic and real data, offering significant insights into privacy preservation. We also introduce the Step-wise Error Comparing Membership Inference (SECMI) attack, which assesses prediction errors at each generation step to infer membership status. The study concludes that diffusion models, such as TabDDPM, generally achieve a better balance of utility and privacy than traditional models. These results highlight the need for robust, adaptable privacy metrics that can reliably assess privacy risks in synthetic data and so support its safe application across domains.
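As a rough illustration of the DCR idea mentioned in the abstract, the sketch below computes, for each synthetic row, the distance to its nearest real row. It is not the thesis's implementation: the function name `distance_to_closest_record`, the use of Euclidean distance, the assumption that all features are numeric and already scaled, and the toy arrays are all assumptions made here for illustration.

```python
import numpy as np


def distance_to_closest_record(synthetic, real):
    """Return, for each synthetic row, the Euclidean distance to its nearest real row.

    Hypothetical helper: assumes numeric, comparably scaled features.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    real = np.asarray(real, dtype=float)
    # Broadcast to pairwise differences of shape (n_synthetic, n_real, n_features).
    diffs = synthetic[:, None, :] - real[None, :, :]
    # Pairwise Euclidean distances, shape (n_synthetic, n_real).
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep only the closest real record for each synthetic row.
    return dists.min(axis=1)


# Toy data (made up for this example): the first synthetic row copies a real row.
real = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]])
synthetic = np.array([[0.0, 0.0], [1.0, 2.0], [10.0, 5.0]])

dcr = distance_to_closest_record(synthetic, real)
print(dcr)
```

In this toy case the first synthetic row has a distance of zero because it reproduces a real record exactly, which is the kind of near-copy a DCR-style check is meant to surface. Larger distances suggest less direct copying, though how to interpret or threshold them depends on the dataset and the preprocessing used.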

URI

https://hdl.handle.net/10012/20851

Collections

Theses
Computer Science
