Privacy preserving synthetic health data

Privacy preserving synthetic health data. Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., & Bennett, K. P. In ESANN 2019 - Proceedings, 27th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages 465–470, Bruges, Belgium, April, 2019.

We examine the feasibility of using synthetic medical data generated by GANs in the classroom, to teach data science in health informatics. We present an end-to-end methodology to retain instructional utility, while preserving privacy to a level, which meets regulatory requirements: (1) a GAN is trained by a certified medical-data security-aware agent, inside a secure environment; (2) the final GAN model is used outside of the secure environment by external users (instructors or researchers) to generate synthetic data. This second step facilitates data handling for external users, by avoiding de-identification, which may require special user training, be costly, and/or cause loss of data fidelity. We benchmark our proposed GAN versus various baseline methods using a novel set of metrics. At equal levels of privacy and utility, GANs provide small footprint models, meeting the desired specifications of our application domain. Data, code, and a challenge that we organized for educational purposes are available.

@inproceedings{yale_privacy_2019,
	address = {Bruges, Belgium},
	title = {Privacy preserving synthetic health data},
	isbn = {978-2-87587-065-0},
	url = {https://hal.inria.fr/hal-02160496},
	abstract = {We examine the feasibility of using synthetic medical data generated by GANs in the classroom, to teach data science in health informatics. We present an end-to-end methodology to retain instructional utility, while preserving privacy to a level, which meets regulatory requirements: (1) a GAN is trained by a certified medical-data security-aware agent, inside a secure environment; (2) the final GAN model is used outside of the secure environment by external users (instructors or researchers) to generate synthetic data. This second step facilitates data handling for external users, by avoiding de-identification, which may require special user training, be costly, and/or cause loss of data fidelity. We benchmark our proposed GAN versus various baseline methods using a novel set of metrics. At equal levels of privacy and utility, GANs provide small footprint models, meeting the desired specifications of our application domain. Data, code, and a challenge that we organized for educational purposes are available.},
	booktitle = {{ESANN} 2019 - {Proceedings}, 27th {European} {Symposium} on {Artificial} {Neural} {Networks}, {Computational} {Intelligence} and {Machine} {Learning}},
	author = {Yale, Andrew and Dash, Saloni and Dutta, Ritik and Guyon, Isabelle and Pavao, Adrien and Bennett, Kristin P.},
	month = apr,
	year = {2019},
	pages = {465--470},
}
