IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity

Authors

  • Wenjie Wang ShanghaiTech University
  • Pengfei Tang Emory University
  • Jian Lou ZJU-Hangzhou Global Scientific and Technological Innovation Center
  • Yuanming Shao ShanghaiTech University
  • Lance Waller Emory University
  • Yi-an Ko Emory University
  • Li Xiong Emory University

DOI:

https://doi.org/10.1609/aaai.v38i14.29491

Keywords:

ML: Deep Generative Models & Autoencoders, ML: Privacy, ML: Time-Series/Data Streams

Abstract

Integrating electronic health records (EHR) into machine learning-driven clinical research and hospital applications is important, as it harnesses extensive, high-quality patient data to improve outcome prediction and treatment personalization. However, because of privacy and security concerns, the secondary use of EHR data, primarily for research purposes, is consistently governed and regulated, which constrains researchers' access to it. Generating synthetic EHR data with deep learning methods is a viable and promising way to mitigate these privacy concerns: it offers a supplementary resource for downstream applications while sidestepping the confidentiality risks associated with real patient data. Although prior efforts have addressed EHR data synthesis, significant challenges remain: capturing the heterogeneity of real EHR, which includes both temporal and non-temporal features; handling missing values and irregular measurements; and ensuring the privacy of the real data used for model training. Existing works in this domain have addressed only one or two of these challenges. In this work, we propose IGAMT, an innovative framework for generating privacy-preserving synthetic EHR data that maintains high quality despite heterogeneous features, missing values, and irregular measurements, while also balancing the privacy-utility trade-off. Extensive experiments demonstrate that IGAMT significantly outperforms baseline architectures in visual resemblance and achieves comparable performance in downstream applications. Ablation case studies further demonstrate the effectiveness of the techniques applied in IGAMT.
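The abstract names three properties of real EHR data that a generator must handle: non-temporal and temporal features side by side, missing values, and irregular measurement times. As a minimal sketch of what such a record can look like as model input, the Python below encodes one patient's temporal measurements as zero-filled values, an observation mask, and per-feature time gaps. The `PatientRecord` class and `encode_temporal` function are hypothetical names chosen for illustration, and the mask-plus-time-gap encoding follows the GRU-D-style convention common for irregular clinical series; it is an assumption, not IGAMT's own preprocessing, which this abstract does not describe.

```python
import math
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical container for one patient's heterogeneous EHR (illustration only)."""
    static: dict[str, float]   # non-temporal features, e.g. age at admission
    times: list[float]         # irregular measurement timestamps, in hours
    values: list[list[float]]  # one row per timestamp; math.nan marks a missing value


def encode_temporal(rec: PatientRecord):
    """Encode the temporal part as (filled values, observation mask, time gaps).

    mask[t][j]  = 1.0 if feature j was measured at step t, else 0.0
    delta[t][j] = time elapsed since feature j was last measured before step t
                  (GRU-D-style recursion; 0.0 at the first step)
    Missing entries are filled with 0.0; the mask preserves the missingness signal.
    """
    n_feat = len(rec.values[0]) if rec.values else 0
    filled, mask, delta = [], [], []
    for t, row in enumerate(rec.values):
        m_row = [0.0 if math.isnan(v) else 1.0 for v in row]
        f_row = [0.0 if math.isnan(v) else v for v in row]
        if t == 0:
            d_row = [0.0] * n_feat
        else:
            gap = rec.times[t] - rec.times[t - 1]
            # If feature j was missing at the previous step, keep accumulating its gap.
            d_row = [gap + (delta[t - 1][j] if mask[t - 1][j] == 0.0 else 0.0)
                     for j in range(n_feat)]
        filled.append(f_row)
        mask.append(m_row)
        delta.append(d_row)
    return filled, mask, delta


# Usage: two lab features measured at irregular times (0 h, 2 h, 5 h).
rec = PatientRecord(
    static={"age": 64.0},
    times=[0.0, 2.0, 5.0],
    values=[[7.1, math.nan], [math.nan, 140.0], [7.3, math.nan]],
)
filled, mask, delta = encode_temporal(rec)
```

Keeping the mask and time gaps as explicit channels lets a model distinguish "measured as zero" from "not measured" and see how stale each value is, which is why this style of encoding is a common starting point for irregular EHR time series.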

Published

2024-03-24

How to Cite

Wang, W., Tang, P., Lou, J., Shao, Y., Waller, L., Ko, Y.-an, & Xiong, L. (2024). IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity. Proceedings of the AAAI Conference on Artificial Intelligence, 38(14), 15634-15643. https://doi.org/10.1609/aaai.v38i14.29491

Issue

Vol. 38 No. 14 (2024)

Section

AAAI Technical Track on Machine Learning V