Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach. McTeer, M., Henderson, R., Anstee, Q. M., & Missier, P. Mathematics, 2024.

Aims: Overlapping asymmetric data sets are where a large cohort of observations have a small amount of information recorded, and within this group there exists a smaller cohort which have extensive further information available. Missing imputation is unwise if cohort size differs substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst considering the larger.
Methods: Through considering traditionally once penalized P-Spline approximations, we create a second penalty term through observing discrepancies in the marginal value of covariates that exist in both cohorts. Our now twice penalized P-Spline is designed to firstly prevent over/under-fitting of the smaller cohort and secondly to consider the larger cohort.
Results: Through a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit upon a continuous and binary response, respectively, against existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an over 65% improvement over existing methods.
Conclusions: We propose a twice penalized P-Spline method which can vastly improve the model fit of overlapping asymmetric data sets upon a common predictive endpoint, without the need for missing data imputation.
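The abstract does not state the form of the second penalty, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a single covariate shared by both cohorts, a standard once penalized P-Spline (equally spaced B-spline basis with a difference penalty on the coefficients, weight `lam1`), and a hypothetical second quadratic penalty (weight `lam2`) that pulls the small-cohort curve toward a marginal curve `m` estimated from the larger cohort on a grid. All function names, the simulated data, and the penalty weights are assumptions made for illustration.

```python
import numpy as np


def bspline_basis(x, lo, hi, n_seg=10, degree=3):
    """Equally spaced B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    # Degree 0: indicator of each half-open knot interval.
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, : -d - 1]) / (d * h)
        right = (knots[None, d + 1 :] - x[:, None]) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B  # shape (len(x), n_seg + degree)


def fit_penalized(B, y, lam1, G=None, m=None, lam2=0.0, order=2):
    """Minimise ||y - B a||^2 + lam1 ||D a||^2 + lam2 ||G a - m||^2 over a.

    The first penalty is the usual P-Spline difference penalty.
    The second penalty is a HYPOTHETICAL stand-in for the paper's second term.
    """
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)
    lhs = B.T @ B + lam1 * (D.T @ D)
    rhs = B.T @ y
    if lam2 > 0:
        lhs = lhs + lam2 * (G.T @ G)
        rhs = rhs + lam2 * (G.T @ m)
    return np.linalg.solve(lhs, rhs)


def truth(x):
    return np.sin(2 * np.pi * x)


# Simulated data (assumed): a large cohort and a much smaller one.
rng = np.random.default_rng(0)
x_large = rng.uniform(0, 1, 2000)
y_large = truth(x_large) + rng.normal(0, 0.3, x_large.size)
x_small = rng.uniform(0, 1, 30)
y_small = truth(x_small) + rng.normal(0, 0.3, x_small.size)

# Marginal curve from the large cohort, evaluated on a grid.
grid = np.linspace(0, 1, 50)
G = bspline_basis(grid, 0, 1)
a_large = fit_penalized(bspline_basis(x_large, 0, 1), y_large, lam1=1.0)
m = G @ a_large

# Small-cohort fits: once penalized versus twice penalized.
B_small = bspline_basis(x_small, 0, 1)
a_once = fit_penalized(B_small, y_small, lam1=1.0)
a_twice = fit_penalized(B_small, y_small, lam1=1.0, G=G, m=m, lam2=5.0)


def rmse(a):
    return np.sqrt(np.mean((G @ a - truth(grid)) ** 2))


print("RMSE once penalized: ", rmse(a_once))
print("RMSE twice penalized:", rmse(a_twice))
```

Because both penalties are quadratic in the coefficients, the fit reduces to a single linear solve. In practice `lam1` and `lam2` would need tuning (the abstract mentions penalty parameter tuning), and a binary response would require an iteratively reweighted version of the same solve rather than this least-squares form.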
@Article{math12050777,
AUTHOR = {McTeer, Matthew and Henderson, Robin and Anstee, Quentin M. and Missier, Paolo},
TITLE = {Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach},
JOURNAL = {Mathematics},
VOLUME = {12},
YEAR = {2024},
NUMBER = {5},
ARTICLE-NUMBER = {777},
URL = {https://www.mdpi.com/2227-7390/12/5/777},
ISSN = {2227-7390},
ABSTRACT = {Aims: Overlapping asymmetric data sets are where a large cohort of observations have a small amount of information recorded, and within this group there exists a smaller cohort which have extensive further information available. Missing imputation is unwise if cohort size differs substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst considering the larger. Methods: Through considering traditionally once penalized P-Spline approximations, we create a second penalty term through observing discrepancies in the marginal value of covariates that exist in both cohorts. Our now twice penalized P-Spline is designed to firstly prevent over/under-fitting of the smaller cohort and secondly to consider the larger cohort. Results: Through a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit upon a continuous and binary response, respectively, against existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an over 65% improvement over existing methods. Conclusions: We propose a twice penalized P-Spline method which can vastly improve the model fit of overlapping asymmetric data sets upon a common predictive endpoint, without the need for missing data imputation.},
DOI = {10.3390/math12050777}
}