Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text. Mitra, A., Druhl, E., Goodwin, R., & Yu, H. June, 2024. arXiv:2406.06056 [cs]
Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 62.5% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under-resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06% and uncovers areas for future refinements.
@misc{mitra_synth-sbdh_2024,
title = {Synth-{SBDH}: {A} {Synthetic} {Dataset} of {Social} and {Behavioral} {Determinants} of {Health} for {Clinical} {Text}},
shorttitle = {Synth-{SBDH}},
url = {http://arxiv.org/abs/2406.06056},
abstract = {Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 62.5\% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under-resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06\% and uncovers areas for future refinements.},
urldate = {2024-09-03},
publisher = {arXiv},
author = {Mitra, Avijit and Druhl, Emily and Goodwin, Raelene and Yu, Hong},
month = jun,
year = {2024},
note = {arXiv:2406.06056 [cs]},
keywords = {Computer Science - Computation and Language},
}