Comparative Accuracy of Diagnosis by Collective Intelligence of Multiple Physicians vs Individual Physicians

Comparative Accuracy of Diagnosis by Collective Intelligence of Multiple Physicians vs Individual Physicians. Barnett, M. L., Boddupalli, D., Nundy, S., & Bates, D. W. JAMA Netw Open, 2(3):e190096-e190096, March, 2019.

Importance: The traditional approach of diagnosis by individual physicians has a high rate of misdiagnosis. Pooling multiple physicians' diagnoses (collective intelligence) is a promising approach to reducing misdiagnoses, but its accuracy in clinical cases is unknown to date.

Objective: To assess how the diagnostic accuracy of groups of physicians and trainees compares with the diagnostic accuracy of individual physicians.

Design, Setting, and Participants: Cross-sectional study using data from the Human Diagnosis Project (Human Dx), a multicountry data set of ranked differential diagnoses by individual physicians, graduate trainees, and medical students (users) solving user-submitted, structured clinical cases. From May 7, 2014, to October 5, 2016, groups of 2 to 9 randomly selected physicians solved individual cases. Data analysis was performed from March 16, 2017, to July 30, 2018.

Main Outcomes and Measures: The primary outcome was diagnostic accuracy, assessed as a correct diagnosis in the top 3 ranked diagnoses for an individual; for groups, the top 3 diagnoses were a collective differential generated using a weighted combination of user diagnoses with a variety of approaches. A version of the McNemar test was used to account for clustering across repeated solvers to compare diagnostic accuracy.

Results: Of the 2069 users solving 1572 cases from the Human Dx data set, 1228 (59.4%) were residents or fellows, 431 (20.8%) were attending physicians, and 410 (19.8%) were medical students. Collective intelligence was associated with increasing diagnostic accuracy, from 62.5% (95% CI, 60.1%-64.9%) for individual physicians up to 85.6% (95% CI, 83.9%-87.4%) for groups of 9 (23.0% difference; 95% CI, 14.9%-31.2%; P < .001). The range of improvement varied by the specifications used for combining groups' diagnoses, but groups consistently outperformed individuals regardless of approach. Absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom from an increase of 17.3% (95% CI, 6.4%-28.2%; P = .002) for abdominal pain to 29.8% (95% CI, 3.7%-55.8%; P = .02) for fever. Groups from 2 users (77.7% accuracy; 95% CI, 70.1%-84.6%) to 9 users (85.5% accuracy; 95% CI, 75.1%-95.9%) outperformed individual specialists in their subspecialty (66.3% accuracy; 95% CI, 59.1%-73.5%; P < .001 vs groups of 2 and 9).

Conclusions and Relevance: A collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases. Given the few proven strategies to address misdiagnosis, this technique merits further study in clinical settings.
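
The pooling step described under Main Outcomes and Measures can be illustrated with a minimal sketch. The abstract only says that group differentials were built from "a weighted combination of user diagnoses with a variety of approaches", so the 1/rank weighting below is an assumption chosen for illustration, not the authors' specification. The function names and the example diagnoses are hypothetical and are not taken from the Human Dx data set.

```python
from collections import defaultdict


def collective_differential(differentials, top_k=3):
    """Pool several ranked differentials into one ranked list.

    Assumption: each solver gives a diagnosis weight 1/rank, where
    rank 1 is that solver's leading diagnosis. The abstract does not
    state the weights actually used in the study.
    """
    scores = defaultdict(float)
    for ranked in differentials:
        for rank, diagnosis in enumerate(ranked, start=1):
            scores[diagnosis] += 1.0 / rank
    # Highest pooled score first; ties broken alphabetically so the
    # result is deterministic.
    ordered = sorted(scores, key=lambda dx: (-scores[dx], dx))
    return ordered[:top_k]


def is_correct(ranked, truth, top_k=3):
    """Top-k accuracy criterion: the true diagnosis is in the first k."""
    return truth in ranked[:top_k]


# Hypothetical ranked differentials from three solvers for one case.
solvers = [
    ["pneumonia", "bronchitis", "asthma"],
    ["pulmonary embolism", "pneumonia", "pleurisy"],
    ["bronchitis", "pulmonary embolism", "heart failure"],
]
truth = "pulmonary embolism"

group = collective_differential(solvers)
individual_hits = [is_correct(ranked, truth) for ranked in solvers]

print(group)            # pooled top-3 differential
print(individual_hits)  # per-solver top-3 correctness
print(is_correct(group, truth))
```

In this made-up case one of the three solvers misses the diagnosis in their own top 3, while the pooled differential still contains it. Other weighting schemes (equal weights, or weights that decay faster with rank) fit the same structure by changing the single line that updates `scores`.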

@article{bar19com,
  title = {Comparative {{Accuracy}} of {{Diagnosis}} by {{Collective Intelligence}} of {{Multiple Physicians}} vs {{Individual Physicians}}},
  volume = {2},
  abstract = {Importance: The traditional approach of diagnosis by individual physicians has a high rate of misdiagnosis. Pooling multiple physicians' diagnoses (collective intelligence) is a promising approach to reducing misdiagnoses, but its accuracy in clinical cases is unknown to date. Objective: To assess how the diagnostic accuracy of groups of physicians and trainees compares with the diagnostic accuracy of individual physicians. Design, Setting, and Participants: Cross-sectional study using data from the Human Diagnosis Project (Human Dx), a multicountry data set of ranked differential diagnoses by individual physicians, graduate trainees, and medical students (users) solving user-submitted, structured clinical cases. From May 7, 2014, to October 5, 2016, groups of 2 to 9 randomly selected physicians solved individual cases. Data analysis was performed from March 16, 2017, to July 30, 2018. Main Outcomes and Measures: The primary outcome was diagnostic accuracy, assessed as a correct diagnosis in the top 3 ranked diagnoses for an individual; for groups, the top 3 diagnoses were a collective differential generated using a weighted combination of user diagnoses with a variety of approaches. A version of the McNemar test was used to account for clustering across repeated solvers to compare diagnostic accuracy. Results: Of the 2069 users solving 1572 cases from the Human Dx data set, 1228 (59.4\%) were residents or fellows, 431 (20.8\%) were attending physicians, and 410 (19.8\%) were medical students. Collective intelligence was associated with increasing diagnostic accuracy, from 62.5\% (95\% CI, 60.1\%-64.9\%) for individual physicians up to 85.6\% (95\% CI, 83.9\%-87.4\%) for groups of 9 (23.0\% difference; 95\% CI, 14.9\%-31.2\%; \emph{P} {$<$} .001). The range of improvement varied by the specifications used for combining groups' diagnoses, but groups consistently outperformed individuals regardless of approach. Absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom from an increase of 17.3\% (95\% CI, 6.4\%-28.2\%; \emph{P} = .002) for abdominal pain to 29.8\% (95\% CI, 3.7\%-55.8\%; \emph{P} = .02) for fever. Groups from 2 users (77.7\% accuracy; 95\% CI, 70.1\%-84.6\%) to 9 users (85.5\% accuracy; 95\% CI, 75.1\%-95.9\%) outperformed individual specialists in their subspecialty (66.3\% accuracy; 95\% CI, 59.1\%-73.5\%; \emph{P} {$<$} .001 vs groups of 2 and 9). Conclusions and Relevance: A collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases. Given the few proven strategies to address misdiagnosis, this technique merits further study in clinical settings.},
  language = {en},
  number = {3},
  journal = {JAMA Netw Open},
  doi = {10.1001/jamanetworkopen.2019.0096},
  author = {Barnett, Michael L. and Boddupalli, Dhruv and Nundy, Shantanu and Bates, David W.},
  month = mar,
  year = {2019},
  keywords = {teaching-mds,predictive-accuracy,diagnosis,accuracy,diagnostic-accuracy},
  pages = {e190096-e190096}
}

