Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing. Yao, Z., Cao, Y., Yang, Z., & Yu, H. AMIA Summits on Translational Science Proceedings, 2023:592–601, June 2023.
Abstract: Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. The fill-in-the-blank problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors, such as prompt-based probing biases, make the LAMA benchmark unreliable and unstable, and this problem is more prominent in BioLAMA: the severely long-tailed vocabulary distribution and large-N-M relations keep the performance gap between LAMA and BioLAMA notable. To address these issues, we introduced context variance into prompt generation and proposed a new rank-change-based evaluation metric. Unlike previous known-unknown evaluation criteria, we proposed the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, we showed that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA friendlier to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from mere "read and copy".
@article{yao_context_2023,
title = {Context {Variance} {Evaluation} of {Pretrained} {Language} {Models} for {Prompt}-based {Biomedical} {Knowledge} {Probing}},
volume = {2023},
issn = {2153-4063},
url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283095/},
abstract = {Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs’ knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduced context variance into the prompt generation and proposed a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we proposed the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, we showed that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".},
urldate = {2023-11-14},
journal = {AMIA Summits on Translational Science Proceedings},
author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Yu, Hong},
month = jun,
year = {2023},
pmid = {37350903},
pmcid = {PMC10283095},
pages = {592--601},
}
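Illustration: the abstract describes cloze-style knowledge probing scored with Top-k accuracy. Below is a minimal sketch of that style of evaluation using the Hugging Face fill-mask pipeline; the model name, prompts, and gold objects are hypothetical placeholders and do not reproduce the BioLAMA data, the paper's context-variance prompts, or the UCM metric.

# Sketch of cloze-style probing with Top-k accuracy (illustrative only).
# Prompts and gold objects are made-up placeholders, not BioLAMA triples.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Each probe is a (cloze prompt, gold object) pair derived from a knowledge triple.
probes = [
    ("Aspirin is used to treat [MASK].", "pain"),
    ("Insulin is produced by the [MASK].", "pancreas"),
]

def top_k_accuracy(probes, k=5):
    """Fraction of probes whose gold object appears among the model's top-k fillers."""
    hits = 0
    for prompt, gold in probes:
        predictions = fill_mask(prompt, top_k=k)  # list of {'token_str', 'score', ...}
        candidates = {p["token_str"].strip().lower() for p in predictions}
        hits += int(gold.lower() in candidates)
    return hits / len(probes)

print(f"Top-5 accuracy: {top_k_accuracy(probes, k=5):.2f}")

In the paper's setting, each triple would instead be probed with multiple context-variance prompts and scored by rank changes under the Understand-Confuse-Misunderstand criteria rather than a single Top-k hit; those components are not reproduced in this sketch.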
{"_id":"uJ9TiwYNni3F4y8qj","bibbaseid":"yao-cao-yang-yu-contextvarianceevaluationofpretrainedlanguagemodelsforpromptbasedbiomedicalknowledgeprobing-2023","author_short":["Yao, Z.","Cao, Y.","Yang, Z.","Yu, H."],"bibdata":{"bibtype":"article","type":"article","title":"Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing","volume":"2023","issn":"2153-4063","url":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283095/","abstract":"Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs’ knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduced context variance into the prompt generation and proposed a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we proposed the concept of ”Misunderstand” in LAMA for the first time. Through experiments on 12 PLMs, we showed that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle ”understand” from just ”read and copy”.","urldate":"2023-11-14","journal":"AMIA Summits on Translational Science Proceedings","author":[{"propositions":[],"lastnames":["Yao"],"firstnames":["Zonghai"],"suffixes":[]},{"propositions":[],"lastnames":["Cao"],"firstnames":["Yi"],"suffixes":[]},{"propositions":[],"lastnames":["Yang"],"firstnames":["Zhichao"],"suffixes":[]},{"propositions":[],"lastnames":["Yu"],"firstnames":["Hong"],"suffixes":[]}],"month":"June","year":"2023","pmid":"37350903","pmcid":"PMC10283095","pages":"592–601","bibtex":"@article{yao_context_2023,\n\ttitle = {Context {Variance} {Evaluation} of {Pretrained} {Language} {Models} for {Prompt}-based {Biomedical} {Knowledge} {Probing}},\n\tvolume = {2023},\n\tissn = {2153-4063},\n\turl = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283095/},\n\tabstract = {Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs’ knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduced context variance into the prompt generation and proposed a new rank-change-based evaluation metric. 
Different from the previous known-unknown evaluation criteria, we proposed the concept of ”Misunderstand” in LAMA for the first time. Through experiments on 12 PLMs, we showed that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle ”understand” from just ”read and copy”.},\n\turldate = {2023-11-14},\n\tjournal = {AMIA Summits on Translational Science Proceedings},\n\tauthor = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Yu, Hong},\n\tmonth = jun,\n\tyear = {2023},\n\tpmid = {37350903},\n\tpmcid = {PMC10283095},\n\tpages = {592--601},\n}\n\n","author_short":["Yao, Z.","Cao, Y.","Yang, Z.","Yu, H."],"key":"yao_context_2023","id":"yao_context_2023","bibbaseid":"yao-cao-yang-yu-contextvarianceevaluationofpretrainedlanguagemodelsforpromptbasedbiomedicalknowledgeprobing-2023","role":"author","urls":{"Paper":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283095/"},"metadata":{"authorlinks":{}},"html":""},"bibtype":"article","biburl":"http://fenway.cs.uml.edu/papers/pubs-all.bib","dataSources":["TqaA9miSB65nRfS5H"],"keywords":[],"search_terms":["context","variance","evaluation","pretrained","language","models","prompt","based","biomedical","knowledge","probing","yao","cao","yang","yu"],"title":"Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing","year":2023}