GAIN: Missing data imputation using generative adversarial nets. Yoon, J., Jordon, J., & Van Der Schaar, M. In 35th International Conference on Machine Learning, ICML 2018, volume 13, pages 9042–9051, 2018. International Machine Learning Society (IMLS). arXiv: 1806.02920
We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.
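The data flow described in the abstract can be sketched in NumPy as below. This is a minimal, untrained illustration, not the authors' implementation. Several details are assumptions: missing entries are filled with small uniform noise before entering G, G receives the concatenation of data and mask, and the hint takes the form H = B⊙M + 0.5(1−B) with B drawn elementwise as Bernoulli(p). The paper's exact sampling of B and its loss weighting may differ. All function names (`make_mask`, `generator_input`, `combine`, `hint`, `d_loss`, `g_loss`), the weight matrices, and the `alpha` value are hypothetical.

```python
import numpy as np

def make_mask(x):
    """M[i, j] = 1 where x[i, j] is observed, 0 where it is missing (NaN)."""
    return (~np.isnan(x)).astype(float)

def generator_input(x, m, rng):
    """Assumption: missing entries get uniform noise in [0, 0.01); G also sees the mask."""
    z = rng.uniform(0.0, 0.01, size=x.shape)
    x_tilde = m * np.nan_to_num(x) + (1.0 - m) * z
    return np.concatenate([x_tilde, m], axis=1)

def combine(x, m, g_out):
    """Completed vector: keep observed values, use G's output only where data is missing."""
    return m * np.nan_to_num(x) + (1.0 - m) * g_out

def hint(m, p, rng):
    """Assumed hint H = B*M + 0.5*(1 - B): where B = 1, D is told the true mask entry;
    where B = 0, D only sees 0.5 and must judge that component itself."""
    b = (rng.uniform(size=m.shape) < p).astype(float)
    return b * m + 0.5 * (1.0 - b)

def d_loss(m, d_prob, eps=1e-8):
    """Per-component cross-entropy: D predicts which entries were observed (M = 1).
    Simplification: averaged over all components."""
    return -np.mean(m * np.log(d_prob + eps) + (1 - m) * np.log(1 - d_prob + eps))

def g_loss(x, m, g_out, d_prob, alpha=10.0, eps=1e-8):
    """Adversarial term on imputed entries plus a reconstruction term on observed
    entries; alpha is a hypothetical weight, not a value from the paper."""
    adv = -np.mean((1 - m) * np.log(d_prob + eps))
    rec = np.sum(m * (np.nan_to_num(x) - g_out) ** 2) / max(np.sum(m), 1.0)
    return adv + alpha * rec

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = np.array([[0.1, np.nan, 0.3],
              [np.nan, 0.5, 0.6]])  # toy data, assumed scaled to [0, 1]
m = make_mask(x)
d = x.shape[1]
w_g = rng.normal(scale=0.1, size=(2 * d, d))  # hypothetical untrained G weights
w_d = rng.normal(scale=0.1, size=(2 * d, d))  # hypothetical untrained D weights

g_out = sigmoid(generator_input(x, m, rng) @ w_g)  # G proposes values for every entry
x_hat = combine(x, m, g_out)                       # completed vector
h = hint(m, p=0.9, rng=rng)                        # partial information about M
d_prob = sigmoid(np.concatenate([x_hat, h], axis=1) @ w_d)  # D: P(entry observed)
print(x_hat)
print(d_loss(m, d_prob), g_loss(x, m, g_out, d_prob))
```

Training would alternate gradient steps that decrease `d_loss` for D and `g_loss` for G, which needs an autodiff framework and is omitted here. The hint matters because, with `p = 0`, D sees only 0.5 everywhere, while with `p = 1` the hint equals M and D's task becomes trivial. Intermediate values leave D uncertain about some components, which is where it must judge imputation quality.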
@inproceedings{yoon_gain_2018,
	title = {{GAIN}: {Missing} data imputation using generative adversarial nets},
	volume = {13},
	isbn = {978-1-5108-6796-3},
	abstract = {We propose a novel method for imputing missing data by adapting the well-known Generative Adversarial Nets (GAN) framework. Accordingly, we call our method Generative Adversarial Imputation Nets (GAIN). The generator (G) observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator (D) then takes a completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that D forces G to learn the desired distribution, we provide D with some additional information in the form of a hint vector. The hint reveals to D partial information about the missingness of the original sample, which is used by D to focus its attention on the imputation quality of particular components. This hint ensures that G does in fact learn to generate according to the true data distribution. We tested our method on various datasets and found that GAIN significantly outperforms state-of-the-art imputation methods.},
	booktitle = {35th {International} {Conference} on {Machine} {Learning}, {ICML} 2018},
	publisher = {International Machine Learning Society (IMLS)},
	author = {Yoon, Jinsung and Jordon, James and Van Der Schaar, Mihaela},
	year = {2018},
	eprint = {1806.02920},
	archiveprefix = {arXiv},
	pages = {9042--9051},
}