Treatment of Missing Values for Multivariate Statistical Analysis of Gel-Based Proteomics Data. Pedreschi, R., Hertog, M. L. A. T. M., Carpentier, S. C., Lammertyn, J., Robben, J., Noben, J.-P., Panis, B., Swennen, R., & Nicolai, B. M. Proteomics, 8:1371–1383, 2008.
The presence of missing values in gel-based proteomics data represents a real challenge if an objective statistical analysis is pursued. Different methods to handle missing values were evaluated, and their influence on the selection of important proteins through multivariate techniques is discussed. The evaluated methods consisted of handling missing values directly during the multivariate analysis with the nonlinear estimation by iterative partial least squares (NIPALS) algorithm, or imputing them with either k-nearest neighbor or Bayesian principal component analysis (BPCA) before carrying out the multivariate analysis. These techniques were applied to data obtained from gels stained with classical postrunning dyes and from DIGE gels. Before applying the multivariate techniques, the normality and homoscedasticity assumptions on which parametric tests are based were tested in order to perform a sound statistical analysis. Of the three methods tested for handling missing values in our datasets, BPCA imputation proved to be the most consistent.
@article{Pedreschi:2008aa,
	Abstract = {The presence of missing values in gel-based proteomics data represents a real challenge if an objective statistical analysis is pursued. Different methods to handle missing values were evaluated, and their influence on the selection of important proteins through multivariate techniques is discussed. The evaluated methods consisted of handling missing values directly during the multivariate analysis with the nonlinear estimation by iterative partial least squares (NIPALS) algorithm, or imputing them with either k-nearest neighbor or Bayesian principal component analysis (BPCA) before carrying out the multivariate analysis. These techniques were applied to data obtained from gels stained with classical postrunning dyes and from DIGE gels. Before applying the multivariate techniques, the normality and homoscedasticity assumptions on which parametric tests are based were tested in order to perform a sound statistical analysis. Of the three methods tested for handling missing values in our datasets, BPCA imputation proved to be the most consistent.},
	Author = {Pedreschi, Romina and Hertog, Maarten L. A. T. M. and Carpentier, Sebastien C. and Lammertyn, Jeroen and Robben, Johan and Noben, Jean-Paul and Panis, Bart and Swennen, Rony and Nicolai, Bart M.},
	Date-Added = {2008-08-05 15:48:16 -0400},
	Date-Modified = {2008-08-05 15:54:38 -0400},
	Doi = {10.1002/pmic.200700975},
	Journal = {Proteomics},
	Keywords = {impute; imputation; missing values},
	Pages = {1371--1383},
	Title = {Treatment of Missing Values for Multivariate Statistical Analysis of Gel-Based Proteomics Data},
	Volume = {8},
	Year = {2008},
	Bdsk-Url-1 = {http://dx.doi.org/10.1002/pmic.200700975}}
