GAIN: Missing data imputation using generative adversarial nets. Yoon, J., Jordon, J., & van der Schaar, M. In 35th International Conference on Machine Learning, ICML 2018, volume 13, pages 9042–9051, 2018. International Machine Learning Society (IMLS). arXiv: 1806.02920
We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
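The abstract's data flow can be sketched as follows. This is a minimal illustration of the masking, imputation, and hint-vector steps it describes, not the paper's implementation: the neural generator is replaced by a placeholder mean-fill function, and the function names and the hint rate of 0.9 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gain_forward(x, m, generator, hint_rate=0.9, rng=rng):
    """One GAIN-style forward pass (data flow only; the generator is a stand-in).

    x: data vector (values at missing positions are ignored)
    m: mask vector, 1 = observed component, 0 = missing component
    """
    z = rng.uniform(0, 0.01, size=x.shape)   # noise placed in missing slots
    x_tilde = m * x + (1 - m) * z            # generator input
    g = generator(x_tilde, m)                # generator's imputed sample
    x_hat = m * x + (1 - m) * g              # completed vector: observed parts kept
    b = (rng.uniform(size=m.shape) < hint_rate).astype(float)
    h = b * m + 0.5 * (1 - b)                # hint: reveals mask where b=1, 0.5 elsewhere
    return x_hat, h

def mean_generator(x_tilde, m):
    """Toy stand-in for G: fill every slot with the mean of observed entries."""
    mean = (m * x_tilde).sum() / max(m.sum(), 1.0)
    return np.full_like(x_tilde, mean)

x = np.array([1.0, 2.0, 0.0, 4.0])
m = np.array([1.0, 1.0, 0.0, 1.0])   # third component is missing
x_hat, h = gain_forward(x, m, mean_generator)
```

In the paper's setup, the discriminator receives `x_hat` together with `h` and predicts, per component, whether it was observed or imputed; the hint leaves D genuinely uncertain only about the unrevealed components, which is what forces G toward the true conditional distribution.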
@inproceedings{yoon_gain_2018,
	title = {{GAIN}: {Missing} data imputation using generative adversarial nets},
	volume = {13},
	isbn = {978-1-5108-6796-3},
	abstract = {We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.},
	booktitle = {35th {International} {Conference} on {Machine} {Learning}, {ICML} 2018},
	publisher = {International Machine Learning Society (IMLS)},
	author = {Yoon, Jinsung and Jordon, James and van der Schaar, Mihaela},
	year = {2018},
	eprint = {1806.02920},
	archiveprefix = {arXiv},
	pages = {9042--9051},
}