Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): A Method for Populating Knowledge Bases Using Zero-Shot Learning

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): A Method for Populating Knowledge Bases Using Zero-Shot Learning. Caufield, J. H., Hegde, H., Emonet, V., Harris, N. L., Joachimiak, M. P., Matentzoglu, N., Kim, H., Moxon, S. A. T., Reese, J. T., Haendel, M. A., Robinson, P. N., & Mungall, C. J. December, 2023.

Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrary, complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language-interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
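The recursive, schema-driven extraction described in the abstract can be illustrated with a minimal Python sketch. This is not the OntoGPT implementation or its API: the toy `SCHEMA`, the `VOCAB` lookup table with its `EX:` identifiers, the prompt wording, and the `fake_llm` stub (which stands in for a real model call) are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical toy schema: class name -> {field name: (range, multivalued)}.
# Range "text" is kept verbatim, "term" is mapped to an identifier, and any
# other range names a nested class that is extracted recursively.
SCHEMA: Dict[str, Dict[str, tuple]] = {
    "Recipe": {"label": ("text", False), "ingredients": ("Ingredient", True)},
    "Ingredient": {"food": ("term", False), "amount": ("text", False)},
}

# Hypothetical label -> identifier table standing in for an ontology lookup.
VOCAB = {"flour": "EX:0001", "milk": "EX:0002"}


def build_prompt(cls: str, text: str) -> str:
    """Turn one schema class into a fill-in-the-fields prompt."""
    lines = []
    for name, (_, multi) in SCHEMA[cls].items():
        hint = "semicolon-separated list" if multi else "value"
        lines.append(f"{name}: <{hint}>")
    fields = "\n".join(lines)
    return f"Extract these fields from the text.\n{fields}\n\nText:\n{text}\n"


def parse_response(cls: str, response: str) -> Dict[str, str]:
    """Keep only 'field: value' lines whose field is declared in the schema."""
    raw = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in SCHEMA[cls]:
            raw[key.strip()] = value.strip()
    return raw


def ground(label: str) -> str:
    """Replace a label with an identifier when the vocabulary knows it."""
    return VOCAB.get(label.lower(), label)


def extract(cls: str, text: str, llm: Callable[[str], str]) -> dict:
    """Prompt for one class, then recurse into fields with nested classes."""
    raw = parse_response(cls, llm(build_prompt(cls, text)))
    result = {}
    for name, (rng, multi) in SCHEMA[cls].items():
        if name not in raw:
            continue
        if multi:
            parts = [p.strip() for p in raw[name].split(";") if p.strip()]
        else:
            parts = [raw[name]]
        if rng == "text":
            values = parts
        elif rng == "term":
            values = [ground(p) for p in parts]
        else:
            values = [extract(rng, p, llm) for p in parts]
        result[name] = values if multi else values[0]
    return result


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call, used only for this demo."""
    text = prompt.split("Text:\n", 1)[1].strip()
    if "ingredients:" in prompt:
        return "label: Pancakes\ningredients: 2 cups flour; 1 cup milk"
    amount, _, food = text.rpartition(" ")
    return f"food: {food}\namount: {amount}"


result = extract("Recipe", "Pancakes: mix 2 cups flour with 1 cup milk.", fake_llm)
print(result)
```

In this sketch each nested class triggers its own prompt on the substring returned for the parent field, which is one simple way to realize the recursion; grounding is reduced to a dictionary lookup, whereas a real system would query ontologies and vocabularies.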

@misc{caufieldStructuredPromptInterrogation2023,
  title = {Structured Prompt Interrogation and Recursive Extraction of Semantics ({{SPIRES}}): {{A}} Method for Populating Knowledge Bases Using Zero-Shot Learning},
  shorttitle = {Structured Prompt Interrogation and Recursive Extraction of Semantics ({{SPIRES}})},
  author = {Caufield, J. Harry and Hegde, Harshad and Emonet, Vincent and Harris, Nomi L. and Joachimiak, Marcin P. and Matentzoglu, Nicolas and Kim, HyeongSik and Moxon, Sierra A. T. and Reese, Justin T. and Haendel, Melissa A. and Robinson, Peter N. and Mungall, Christopher J.},
  year = {2023},
  month = dec,
  number = {arXiv:2304.02711},
  eprint = {2304.02711},
  primaryclass = {cs},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2304.02711},
  urldate = {2024-03-13},
  archiveprefix = {arxiv},
  keywords = {Computer Science - Artificial Intelligence,Computer Science - Machine Learning},
  groups = {Ontologies and AI},
  timestamp = {2024-03-13T12:53:10Z},
}

