PubMed knowledge graph 2.0: Connecting papers, patents, and clinical trials in biomedical science. Xu, J., Yu, C., Xu, J., Ding, Y., Torvik, V. I., Kang, J., Sung, M., & Song, M. October, 2024. arXiv:2410.07969

Papers, patents, and clinical trials are indispensable types of scientific literature in biomedicine, crucial for knowledge sharing and dissemination. However, these documents are often stored in disparate databases with varying management standards and data formats, making it challenging to form systematic, fine-grained connections among them. To address this issue, we introduce PKG2.0, a comprehensive knowledge graph dataset encompassing over 36 million papers, 1.3 million patents, and 0.48 million clinical trials in the biomedical field. PKG2.0 integrates these previously dispersed resources through various links, including biomedical entities, author networks, citation relationships, and research projects. Fine-grained biomedical entity extraction, high-performance author name disambiguation, and multi-source citation integration have played a crucial role in the construction of the PKG dataset. Additionally, project data from the NIH Exporter enriches the dataset with metadata of NIH-funded projects and their scholarly outputs. Data validation demonstrates that PKG2.0 excels in key tasks such as author disambiguation and biomedical entity recognition. This dataset provides valuable resources for biomedical researchers, bibliometric scholars, and those engaged in literature mining.

@misc{xu_pubmed_2024,
title = {{PubMed} knowledge graph 2.0: {Connecting} papers, patents, and clinical trials in biomedical science},
shorttitle = {{PubMed} knowledge graph 2.0},
url = {http://arxiv.org/abs/2410.07969},
abstract = {Papers, patents, and clinical trials are indispensable types of scientific literature in biomedicine, crucial for knowledge sharing and dissemination. However, these documents are often stored in disparate databases with varying management standards and data formats, making it challenging to form systematic, fine-grained connections among them. To address this issue, we introduce PKG2.0, a comprehensive knowledge graph dataset encompassing over 36 million papers, 1.3 million patents, and 0.48 million clinical trials in the biomedical field. PKG2.0 integrates these previously dispersed resources through various links, including biomedical entities, author networks, citation relationships, and research projects. Fine-grained biomedical entity extraction, high-performance author name disambiguation, and multi-source citation integration have played a crucial role in the construction of the PKG dataset. Additionally, project data from the NIH Exporter enriches the dataset with metadata of NIH-funded projects and their scholarly outputs. Data validation demonstrates that PKG2.0 excels in key tasks such as author disambiguation and biomedical entity recognition. This dataset provides valuable resources for biomedical researchers, bibliometric scholars, and those engaged in literature mining.},
urldate = {2024-10-18},
publisher = {arXiv},
author = {Xu, Jian and Yu, Chao and Xu, Jiawei and Ding, Ying and Torvik, Vetle I. and Kang, Jaewoo and Sung, Mujeen and Song, Min},
month = oct,
year = {2024},
note = {arXiv:2410.07969},
keywords = {Computer Science - Digital Libraries},
}