Studies in Health Technology and Informatics, 210:419–423, 2015.
INTRODUCTION: Diagnoses and medical procedures collected under the French information system are recorded in a nationwide database, the "PMSI national database", which is available for analysis. Data quality in this database depends directly on coding quality, which can be poor. Among the methods proposed for exploiting health databases, data mining techniques are particularly interesting. Our objective is to build sequential rules for predicting missing diagnoses by data mining of the PMSI national database. METHOD: Our working sample was constructed from the national database for the years 2007 to 2010. The information retained for rule construction was medical diagnoses and medical procedures. Rules were selected using a statistical filter, and the selected rules were validated by case review based on medical letters, which made it possible to estimate the improvement in diagnosis recoding. RESULTS: The working sample comprised 59,170 inpatient stays. The predicted ICD codes were E11 (non-insulin-dependent diabetes mellitus), I48 (atrial fibrillation and flutter) and I50 (heart failure). We validated three sequential rules with a substantial improvement in positive predictive value: {E11, I10, DZQM006} => {E11}; {E11, I10, I48} => {E11}; {I48, I69} => {I48}. DISCUSSION: Using data mining, we were able to extract three simple, reliable and effective sequential rules, with a substantial improvement in diagnosis recoding. The results of our study indicate that data mining methods offer an opportunity to improve the data quality of the national database.
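To make the notion of a sequential rule and its positive predictive value concrete, the following is a minimal Python sketch. It is not the authors' method: the abstract does not specify the statistical filter or the exact matching semantics, so the sketch assumes that a rule "fires" at a stay when every antecedent code has appeared in that patient's earlier stays, and counts a hit when the consequent code is recorded in that stay. The function name `rule_stats` and the toy patient histories are hypothetical.

```python
from typing import List, Set, Tuple

# A patient history is an ordered list of stays; each stay is a set of codes
# (ICD diagnoses such as "E11" or procedure codes such as "DZQM006").
History = List[Set[str]]


def rule_stats(antecedent: Set[str], consequent: str,
               patients: List[History]) -> Tuple[int, int, float]:
    """Return (matched, hits, ppv) for the rule antecedent => consequent.

    Assumption (not from the paper): the rule fires at stay i when all
    antecedent codes occur somewhere in stays 0..i-1 of the same patient.
    """
    matched = 0  # stays at which the rule fires
    hits = 0     # fired stays in which the consequent is actually coded
    for stays in patients:
        seen: Set[str] = set()  # codes accumulated over earlier stays
        for i, stay in enumerate(stays):
            if i > 0 and antecedent <= seen:
                matched += 1
                if consequent in stay:
                    hits += 1
            seen |= stay
    ppv = hits / matched if matched else 0.0
    return matched, hits, ppv


# Toy data, invented purely for illustration.
toy_patients: List[History] = [
    [{"E11", "I10", "DZQM006"}, {"E11", "I10"}],
    [{"E11", "I10", "DZQM006"}, {"I10"}],
    [{"I10"}, {"E11"}],
    [{"E11", "I10"}, {"DZQM006"}, {"E11"}],
]

matched, hits, ppv = rule_stats({"E11", "I10", "DZQM006"}, "E11", toy_patients)
print(f"fired at {matched} stays, consequent coded in {hits}, PPV = {ppv:.2f}")
```

In this framing, a stay where the rule fires but the consequent is not coded is a candidate for review, which is roughly the role the case review against medical letters plays in the study.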