A comparison of statistical and machine-learning techniques in evaluating the association between dietary patterns and 10-year cardiometabolic risk (2002–2012): the ATTICA study. Panaretos, D., Koloverou, E., Dimopoulos, A. C., Kouli, G., Vamvakari, M., Tzavelas, G., Pitsavos, C., & Panagiotakos, D. B. British Journal of Nutrition, 120(3):326–334, August, 2018.
Abstract: Statistical methods are usually applied in examining diet–disease associations, whereas factor analysis is commonly used for dietary pattern recognition. Recently, machine learning (ML) has been also proposed as an alternative technique in health classification. In this work, the predictive accuracy of statistical v. ML methodologies as regards the association of dietary patterns on CVD risk was tested. During 2001–2002, 3042 men and women (45 (SD 14) years) were enrolled in the ATTICA study. In 2011–2012, the 10-year CVD follow-up was performed among 2020 participants. Item Response Theory was applied to create a metric of combined 10-year cardiometabolic risk, the ‘Cardiometabolic Health Score’, that incorporated incidence of CVD, diabetes, hypertension and hypercholesterolaemia. Factor analysis was performed to extract dietary patterns, on the basis of either foods or nutrients consumed; linear regression analysis was used to assess their association with the cardiometabolic score. Two ML techniques (k-nearest-neighbor’s algorithm and random-forests decision tree) were applied to evaluate participants’ health based on dietary information. Factor analysis revealed five and three factors from foods and nutrients, respectively, explaining 54 and 65 % of the total variation in intake. Nutrient and food pattern regression models showed similar accuracy in correctly classifying an individual according to the cardiometabolic risk (R² = 9·6 % and R² = 8·3 %, respectively). ML techniques were superior compared with linear regression in correct classification of the individuals according to the Health Score (accuracy approximately 38 v. 6 %, respectively), whereas the two ML methods showed equal classification ability. Conclusively, ML methods could be a valuable tool in the field of nutritional epidemiology, leading to more accurate disease-risk evaluation.
@article{panaretos_comparison_2018,
	title = {A comparison of statistical and machine-learning techniques in evaluating the association between dietary patterns and 10-year cardiometabolic risk (2002–2012): the {ATTICA} study},
	volume = {120},
	issn = {0007-1145, 1475-2662},
	shorttitle = {A comparison of statistical and machine-learning techniques in evaluating the association between dietary patterns and 10-year cardiometabolic risk (2002–2012)},
	url = {https://www.cambridge.org/core/product/identifier/S0007114518001150/type/journal_article},
	doi = {10.1017/S0007114518001150},
	abstract = {Statistical methods are usually applied in examining diet–disease associations, whereas factor analysis is commonly used for dietary pattern recognition. Recently, machine learning (ML) has been also proposed as an alternative technique in health classification. In this work, the predictive accuracy of statistical v. ML methodologies as regards the association of dietary patterns on CVD risk was tested. During 2001–2002, 3042 men and women (45 (SD 14) years) were enrolled in the ATTICA study. In 2011–2012, the 10-year CVD follow-up was performed among 2020 participants. Item Response Theory was applied to create a metric of combined 10-year cardiometabolic risk, the ‘Cardiometabolic Health Score’, that incorporated incidence of CVD, diabetes, hypertension and hypercholesterolaemia. Factor analysis was performed to extract dietary patterns, on the basis of either foods or nutrients consumed; linear regression analysis was used to assess their association with the cardiometabolic score. Two ML techniques (k-nearest-neighbor’s algorithm and random-forests decision tree) were applied to evaluate participants’ health based on dietary information. Factor analysis revealed five and three factors from foods and nutrients, respectively, explaining 54 and 65 \% of the total variation in intake. Nutrient and food pattern regression models showed similar accuracy in correctly classifying an individual according to the cardiometabolic risk ($R^2$ = 9·6 \% and $R^2$ = 8·3 \%, respectively). ML techniques were superior compared with linear regression in correct classification of the individuals according to the Health Score (accuracy approximately 38 v. 6 \%, respectively), whereas the two ML methods showed equal classification ability. Conclusively, ML methods could be a valuable tool in the field of nutritional epidemiology, leading to more accurate disease-risk evaluation.},
	language = {en},
	number = {3},
	urldate = {2022-11-21},
	journal = {British Journal of Nutrition},
	author = {Panaretos, Dimitris and Koloverou, Efi and Dimopoulos, Alexandros C. and Kouli, Georgia-Maria and Vamvakari, Malvina and Tzavelas, George and Pitsavos, Christos and Panagiotakos, Demosthenes B.},
	month = aug,
	year = {2018},
	keywords = {ATTICA study},
	pages = {326--334},
}
