BioInstruct: instruction tuning of large language models for biomedical natural language processing

BioInstruct: instruction tuning of large language models for biomedical natural language processing. Tran, H., Yang, Z., Yao, Z., & Yu, H. Journal of the American Medical Informatics Association, June, 2024.

We aim to enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles. We created BioInstruct, comprising 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether categories (eg, QA, IE, and generation) of instructions impact model performance. Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on average accuracy metric, 5.7% in IE on average F1 metric, and 96% in Generation tasks on average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with or even surpassed other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with the observations of multi-task learning, suggesting synergies between 2 tasks. The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
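The seed-sampling step mentioned in the abstract (3 seed samples drawn at random from 80 human-curated instructions) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' code: the prompt wording, the function name `build_generation_prompt`, and the placeholder seed pool are all hypothetical, and the call to GPT-4 itself is left out.

```python
import random


def build_generation_prompt(seed_pool, k=3, rng=None):
    """Draw k seed instructions at random and format them as few-shot examples.

    The prompt template here is an assumption made for illustration only.
    """
    rng = rng or random.Random()
    seeds = rng.sample(seed_pool, k)  # sampling without replacement
    lines = ["Write a new biomedical instruction in the style of these examples:"]
    for i, seed in enumerate(seeds, start=1):
        lines.append(f"Example {i}: {seed}")
    lines.append(f"Example {k + 1}:")  # slot the language model is asked to fill
    return seeds, "\n".join(lines)


# Hypothetical stand-in for the 80 human-curated seed instructions.
seed_pool = [f"placeholder seed instruction {n}" for n in range(1, 81)]

seeds, prompt = build_generation_prompt(seed_pool, k=3, rng=random.Random(0))
print(prompt)
```

Repeating this draw with fresh random seeds yields a different few-shot context for each generated instruction, which is one plausible way to encourage variety across a dataset of this size.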

@article{tran_bioinstruct_2024,
	title = {{BioInstruct}: instruction tuning of large language models for biomedical natural language processing},
	issn = {1527-974X},
	shorttitle = {{BioInstruct}},
	url = {https://doi.org/10.1093/jamia/ocae122},
	doi = {10.1093/jamia/ocae122},
	abstract = {We aim to enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles. We created BioInstruct, comprising 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether categories (eg, QA, IE, and generation) of instructions impact model performance. Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3\% in QA on average accuracy metric, 5.7\% in IE on average F1 metric, and 96\% in Generation tasks on average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with or even surpassed other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. Our findings align with the observations of multi-task learning, suggesting synergies between 2 tasks. The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.},
	urldate = {2024-06-12},
	journal = {Journal of the American Medical Informatics Association},
	author = {Tran, Hieu and Yang, Zhichao and Yao, Zonghai and Yu, Hong},
	month = jun,
	year = {2024},
	pages = {ocae122},
}
