Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text. Mitra, A., Druhl, E., Goodwin, R., & Yu, H. June, 2024. arXiv:2406.06056 [cs]
Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 62.5% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under-resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06% and uncovers areas for future refinements.
@misc{mitra_synth-sbdh_2024,
	title = {Synth-{SBDH}: {A} {Synthetic} {Dataset} of {Social} and {Behavioral} {Determinants} of {Health} for {Clinical} {Text}},
	shorttitle = {Synth-{SBDH}},
	url = {http://arxiv.org/abs/2406.06056},
	abstract = {Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 62.5\% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under-resource constraints. Human evaluation demonstrates a Human-LLM alignment of 71.06\% and uncovers areas for future refinements.},
	urldate = {2024-09-03},
	publisher = {arXiv},
	author = {Mitra, Avijit and Druhl, Emily and Goodwin, Raelene and Yu, Hong},
	month = jun,
	year = {2024},
	note = {arXiv:2406.06056 [cs]},
	keywords = {Computer Science - Computation and Language},
}
