Secondary Use of Healthcare Structured Data: The Challenge of Domain-Knowledge Based Extraction of Features

Secondary Use of Healthcare Structured Data: The Challenge of Domain-Knowledge Based Extraction of Features. Chazard, E., Ficheur, G., Caron, A., Lamer, A., Labreuche, J., Cuggia, M., Genin, M., Bouzille, G., & Duhamel, A. Studies in Health Technology and Informatics, 255:15–19, 2018.

Secondary use of clinical structured data plays an important role in healthcare research. It was first described by Fayyad as "knowledge discovery in databases". Feature extraction is an important phase but has received little attention. The objectives of this paper are: 1) to propose an updated representation of data reuse in healthcare, 2) to illustrate methods and objectives of feature extraction, and 3) to discuss the place of domain-specific knowledge. MATERIAL AND METHODS: an updated representation is proposed. Then, a case study consists of automatically identifying acute renal failure and discovering risk factors, by secondary use of structured data. Finally, a literature review published by Meystre et al. is analyzed. RESULTS: 1) we propose a description of data reuse in 5 phases. Phase 1 is data preprocessing (cleansing, linkage, terminological alignment, unit conversions, deidentification), which makes it possible to construct a data warehouse. Phase 2 is feature extraction. Phase 3 is statistical and graphical mining. Phase 4 consists of expert filtering and reorganization of statistical results. Phase 5 is decision making. 2) The case study illustrates how time-dependent features can be extracted from laboratory results and drug administrations, using domain-specific knowledge. 3) Among the 200 papers cited by Meystre et al., the first and last authors were affiliated with health institutions in 74% of cases (68% for methodological papers, and 79% for applied papers). DISCUSSION: feature extraction has a major impact on the success of data reuse. Specific knowledge-based reasoning plays an important role in feature extraction, which requires tight collaboration between computer scientists, statisticians, and health professionals.

@article{chazard_secondary_2018,
	title = {Secondary {Use} of {Healthcare} {Structured} {Data}: {The} {Challenge} of {Domain}-{Knowledge} {Based} {Extraction} of {Features}},
	volume = {255},
	issn = {0926-9630},
	shorttitle = {Secondary {Use} of {Healthcare} {Structured} {Data}},
	abstract = {Secondary use of clinical structured data plays an important role in healthcare research. It was first described by Fayyad as "knowledge discovery in databases". Feature extraction is an important phase but has received little attention. The objectives of this paper are: 1) to propose an updated representation of data reuse in healthcare, 2) to illustrate methods and objectives of feature extraction, and 3) to discuss the place of domain-specific knowledge.
MATERIAL AND METHODS: an updated representation is proposed. Then, a case study consists of automatically identifying acute renal failure and discovering risk factors, by secondary use of structured data. Finally, a literature review published by Meystre et al. is analyzed.
RESULTS: 1) we propose a description of data reuse in 5 phases. Phase 1 is data preprocessing (cleansing, linkage, terminological alignment, unit conversions, deidentification), which makes it possible to construct a data warehouse. Phase 2 is feature extraction. Phase 3 is statistical and graphical mining. Phase 4 consists of expert filtering and reorganization of statistical results. Phase 5 is decision making. 2) The case study illustrates how time-dependent features can be extracted from laboratory results and drug administrations, using domain-specific knowledge. 3) Among the 200 papers cited by Meystre et al., the first and last authors were affiliated with health institutions in 74\% of cases (68\% for methodological papers, and 79\% for applied papers).
DISCUSSION: feature extraction has a major impact on the success of data reuse. Specific knowledge-based reasoning plays an important role in feature extraction, which requires tight collaboration between computer scientists, statisticians, and health professionals.},
	language = {eng},
	journal = {Studies in Health Technology and Informatics},
	author = {Chazard, Emmanuel and Ficheur, Grégoire and Caron, Alexandre and Lamer, Antoine and Labreuche, Julien and Cuggia, Marc and Genin, Michaël and Bouzille, Guillaume and Duhamel, Alain},
	year = {2018},
	pmid = {30306898},
	keywords = {Data reuse, data transformation, feature extraction},
	pages = {15--19},
}

