2014.
Over 25 million people, or nearly 8.3% of the United States population, have diabetes. Diabetes is associated with a wide range of complications, from heart disease and strokes to blindness and kidney disease. Combined with electronic medical record systems, an implementation of a smart predictor could prompt high-risk patients to obtain diabetes testing in cases where the physician had not thought to recommend it. Based on the Kaggle Practice Fusion Diabetes Classification challenge, we aim to build a model that determines whether a patient is at risk for Type II diabetes given the patient's electronic health records. The original competition assumes that the algorithm has access to each patient's full medical record and that all patients share a standard database (e.g., the exact same tests taken and the same recorded variables). In contrast, we are interested in a model that takes only part of the medical record as input. For instance, if a patient has not undergone the full battery of tests that every patient in the original training dataset had, or if information is missing from a patient's medical record, we still want to classify the patient and output whether the patient has diabetes based on the reduced amount of information. In our project, the algorithm is scored by its false positive and false negative rates, which together give the total error rate in diagnosing diabetes on test data sets.
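The scoring described above can be illustrated with a minimal sketch. It assumes binary labels (1 for diabetes, 0 for no diabetes) and assumes both rates are measured as fractions of the whole test set, so that they add up to the total error rate; the abstract does not state the exact definition used. The function name `error_rates` and the example labels are hypothetical and do not come from the project.

```python
def error_rates(y_true, y_pred):
    """Return false positive, false negative, and total error fractions.

    Assumption: both rates are fractions of ALL test cases, so their sum
    equals the overall misclassification rate. Labels: 1 = diabetes, 0 = none.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    # False positive: predicted diabetes for a patient who does not have it.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # False negative: missed a patient who does have diabetes.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    return {
        "false_positive": fp / n,
        "false_negative": fn / n,
        "total_error": (fp + fn) / n,
    }


# Made-up labels for eight hypothetical test patients.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
rates = error_rates(y_true, y_pred)
print(rates)
```

Note that under the more common convention, where the false positive rate is taken over actual negatives and the false negative rate over actual positives, the two rates do not sum to the total error rate.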