Proposal and evaluation of FASDIM, a Fast And Simple De-Identification Method for unstructured free-text clinical records. Chazard, E., Mouret, C., Ficheur, G., Schaffar, A., Beuscart, J.-B., & Beuscart, R. International Journal of Medical Informatics, 83(4):303–312, April 2014. doi:10.1016/j.ijmedinf.2013.11.005

PURPOSE: Free-text medical records provide rich information about patients, but they often need to be de-identified by removing the Protected Health Information (PHI) whenever identifying the patient is not necessary. Pattern-matching techniques require predefined dictionaries, and machine-learning techniques require an extensive training set. Methods exist for French, but they either yield weak results or are not freely available. The objective is to define and evaluate FASDIM, a Fast And Simple De-Identification Method for French free-text medical records.
METHODS: FASDIM removes all words that are not in a list of authorized words, and all numbers except those that match a list of protection patterns. Both lists are extended over successive iterations of the method. For the evaluation, the workload is estimated during the de-identification of the records. The efficiency of the de-identification is assessed by independent medical experts on 508 discharge letters that are randomly selected and de-identified by FASDIM. Finally, the letters are encoded before and after de-identification according to 3 terminologies (ATC, ICD10, CCAM), and the codes are compared.
RESULTS: The list of authorized words is built progressively: 12 h for the first 7000 letters, then 16 additional hours for 20,000 additional letters. The Recall (proportion of PHI that is removed) is 98.1%, the Precision (proportion of PHI among the removed tokens) is 79.6%, and the F-measure (harmonic mean of the two) is 87.9%. On average, 30.6 terminology codes are encoded per letter, and 99.02% of these codes are preserved despite the de-identification.
CONCLUSION: FASDIM achieves good results in French and is freely available. It is easy to implement and does not require any predefined dictionary.
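The filtering rule described under METHODS can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the case-insensitive lookup, keeping punctuation, the `***` placeholder, and the names `deidentify` and `unknown_words` are all hypothetical. Protection patterns are assumed to be regular expressions matched against the raw text, so that a number is kept only if it lies inside a match (for example a dose such as `500 mg`).

```python
import re

# Hypothetical marker written in place of every removed token.
PLACEHOLDER = "***"

# A token is a run of letters (accented letters included), a number
# (optionally with a decimal part), or any other single non-space character.
TOKEN_RE = re.compile(r"(?P<word>[^\W\d_]+)|(?P<num>\d+(?:[.,]\d+)?)|(?P<other>\S)")


def deidentify(text, authorized_words, protection_patterns):
    """Keep authorized words and protected numbers; mask every other word or number."""
    # Character spans covered by any protection pattern, e.g. r"\d+ mg".
    protected = [m.span() for p in protection_patterns for m in re.finditer(p, text)]

    def is_protected(start, end):
        return any(s <= start and end <= e for s, e in protected)

    out = []
    last = 0
    for m in TOKEN_RE.finditer(text):
        out.append(text[last:m.start()])  # copy the whitespace between tokens
        tok = m.group()
        if m.lastgroup == "word":
            keep = tok.lower() in authorized_words  # assumption: case-insensitive
        elif m.lastgroup == "num":
            keep = is_protected(m.start(), m.end())
        else:
            keep = True  # assumption: punctuation is never PHI in this sketch
        out.append(tok if keep else PLACEHOLDER)
        last = m.end()
    out.append(text[last:])  # trailing whitespace, if any
    return "".join(out)


def unknown_words(texts, authorized_words):
    """Words that would be removed, listed for a human reviewer to triage."""
    found = set()
    for text in texts:
        for m in TOKEN_RE.finditer(text):
            if m.lastgroup == "word" and m.group().lower() not in authorized_words:
                found.add(m.group().lower())
    return sorted(found)
```

For example, `deidentify("Mme Dupont reçoit amoxicilline 500 mg le 12/03.", {"reçoit", "amoxicilline", "mg", "le"}, [r"\d+ mg"])` returns `"*** *** reçoit amoxicilline 500 mg le ***/***."`. In the iterative workflow the abstract describes, a reviewer would inspect the output of `unknown_words` on a batch of letters and add the non-identifying words to the authorized list before the next pass.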
@article{chazard_proposal_2014,
title = {Proposal and evaluation of {FASDIM}, a {Fast} {And} {Simple} {De}-{Identification} {Method} for unstructured free-text clinical records},
volume = {83},
copyright = {All rights reserved},
issn = {1872-8243},
url = {http://www.chazard.org/emmanuel/pdf_articles/paper_2014_ijmi_fasdim.pdf},
doi = {10.1016/j.ijmedinf.2013.11.005},
abstract = {PURPOSE: Free-text medical records provide rich information about patients, but they often need to be de-identified by removing the Protected Health Information (PHI) whenever identifying the patient is not necessary. Pattern-matching techniques require predefined dictionaries, and machine-learning techniques require an extensive training set. Methods exist for French, but they either yield weak results or are not freely available. The objective is to define and evaluate FASDIM, a Fast And Simple De-Identification Method for French free-text medical records.
METHODS: FASDIM removes all words that are not in a list of authorized words, and all numbers except those that match a list of protection patterns. Both lists are extended over successive iterations of the method. For the evaluation, the workload is estimated during the de-identification of the records. The efficiency of the de-identification is assessed by independent medical experts on 508 discharge letters that are randomly selected and de-identified by FASDIM. Finally, the letters are encoded before and after de-identification according to 3 terminologies (ATC, ICD10, CCAM), and the codes are compared.
RESULTS: The list of authorized words is built progressively: 12 h for the first 7000 letters, then 16 additional hours for 20,000 additional letters. The Recall (proportion of PHI that is removed) is 98.1\%, the Precision (proportion of PHI among the removed tokens) is 79.6\%, and the F-measure (harmonic mean of the two) is 87.9\%. On average, 30.6 terminology codes are encoded per letter, and 99.02\% of these codes are preserved despite the de-identification.
CONCLUSION: FASDIM achieves good results in French and is freely available. It is easy to implement and does not require any predefined dictionary.},
language = {eng},
number = {4},
journal = {International Journal of Medical Informatics},
author = {Chazard, Emmanuel and Mouret, Capucine and Ficheur, Grégoire and Schaffar, Aurélien and Beuscart, Jean-Baptiste and Beuscart, Régis},
month = apr,
year = {2014},
pmid = {24370391},
keywords = {Anonymization, Confidentiality, De-identification, Free text, Natural language processing},
pages = {303--312},
}
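The F-measure reported under RESULTS is the harmonic mean of precision P and recall R, F = 2PR/(P + R). The short Python check below recomputes it from the two published percentages; `f_measure` is a hypothetical helper name, and the zero guard is an added convention rather than something stated in the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # convention: F = 0 when both are zero
    return 2 * precision * recall / (precision + recall)


recall = 0.981     # proportion of PHI tokens that were removed
precision = 0.796  # proportion of removed tokens that were PHI
print(f"F-measure: {f_measure(precision, recall):.1%}")  # F-measure: 87.9%
```

Because the harmonic mean is pulled toward the lower value, the result (87.9%) lies closer to the precision than to the recall.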