Biomedical dataset recommendation. Wang, X., van Harmelen, F., & Huang, Z. In Quix, C., Hammoudi, S., & van der Aalst, W., editors, Proceedings of the 10th International Conference on Data Science, Technology and Applications, DATA 2021, pages 192–199, 2021. SciTePress. 10th International Conference on Data Science, Technology and Applications, DATA 2021 ; Conference date: 06-07-2021 Through 08-07-2021
Copyright © 2021 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved. Dataset search is a special application of information retrieval, which aims to help scientists with finding the datasets they want. Current dataset search engines are query-driven, which implies that the results are limited by the ability of the user to formulate the appropriate query. In this paper we aim to solve this limitation by framing dataset search as a recommendation task: given a dataset by the user, the search engine recommends similar datasets. We solve this dataset recommendation task using a similarity approach. We provide a simple benchmark task to evaluate different approaches for this dataset recommendation task. We also evaluate the recommendation task with several similarity approaches in the biomedical domain. We benchmark 8 different similarity metrics between datasets, including both ontology-based techniques and techniques from machine learning. Our results show that the task of recommending scientific datasets based on meta-data as it occurs in realistic dataset collections is a hard task. None of the ontology-based methods manage to perform well on this task, and are outscored by the majority of the machine-learning methods. Of these ML methods only one of the approaches performs reasonably well, and even then only reaches 70% accuracy.
@inproceedings{xu2021dr,
title = "Biomedical dataset recommendation",
abstract = "Copyright {\textcopyright} 2021 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved. Dataset search is a special application of information retrieval, which aims to help scientists with finding the datasets they want. Current dataset search engines are query-driven, which implies that the results are limited by the ability of the user to formulate the appropriate query. In this paper we aim to solve this limitation by framing dataset search as a recommendation task: given a dataset by the user, the search engine recommends similar datasets. We solve this dataset recommendation task using a similarity approach. We provide a simple benchmark task to evaluate different approaches for this dataset recommendation task. We also evaluate the recommendation task with several similarity approaches in the biomedical domain. We benchmark 8 different similarity metrics between datasets, including both ontology-based techniques and techniques from machine learning. Our results show that the task of recommending scientific datasets based on meta-data as it occurs in realistic dataset collections is a hard task. None of the ontology-based methods manage to perform well on this task, and are outscored by the majority of the machine-learning methods. Of these ML methods only one of the approaches performs reasonably well, and even then only reaches 70\% accuracy.",
author = "X. Wang and {van Harmelen}, F. and Z. Huang",
year = "2021",
doi = "10.5220/0010521801920199",
language = "English",
pages = "192--199",
editor = "C. Quix and S. Hammoudi and {van der Aalst}, W.",
booktitle = "Proceedings of the 10th International Conference on Data Science, Technology and Applications, DATA 2021",
publisher = "SciTePress",
note = "10th International Conference on Data Science, Technology and Applications, DATA 2021 ; Conference date: 06-07-2021 Through 08-07-2021",
}
