MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven. Velazquez Rodriguez, C. & De Roover, C. In Proceedings of the 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM2020), of Proceedings - 20th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2020, pages 243–247, September 2020. IEEE.

Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts including their source code, byte code, and documentation. To help developers cope with this information, several websites overlay configurable views on the ecosystem. For instance, views in which similar libraries are grouped into categories or views showing all libraries that have been tagged with tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging of libraries in a multi-label setting. This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performances. Finally, we propose directions to follow in this area.
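The abstract mentions feature vectors obtained from class and method names and a multi-label setting, but does not spell out the encoding. The following is a minimal sketch of one plausible way to do this, not the authors' implementation: identifiers are split into word tokens, counted into a bag-of-words vector, and each library's tag set becomes a binary indicator vector. The library names, identifiers, tags, and all function names (`split_identifier`, `build_vocabulary`, `vectorize`, `binarize_tags`) are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase word tokens."""
    # An acronym run (e.g. "HTTP") or a capitalised/lowercase word (e.g. "Client").
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [part.lower() for part in parts]


def build_vocabulary(libraries):
    """Map every token seen in any library to a fixed column index."""
    tokens = sorted({
        token
        for names in libraries.values()
        for name in names
        for token in split_identifier(name)
    })
    return {token: index for index, token in enumerate(tokens)}


def vectorize(names, vocabulary):
    """Turn a library's class and method names into a token-count vector."""
    counts = Counter(token for name in names for token in split_identifier(name))
    return [counts.get(token, 0) for token in vocabulary]


def binarize_tags(tags, all_tags):
    """Encode a set of tags as a 0/1 indicator vector (one column per tag)."""
    return [1 if tag in tags else 0 for tag in all_tags]


# Hypothetical toy data: identifiers as they might be read from byte code.
libraries = {
    "json-lib": ["JsonParser", "parseObject", "writeJson"],
    "http-lib": ["HttpClient", "sendRequest", "parseResponse"],
}
library_tags = {
    "json-lib": {"json", "parser"},
    "http-lib": {"http", "client"},
}

vocabulary = build_vocabulary(libraries)
all_tags = sorted({tag for tags in library_tags.values() for tag in tags})

# X holds one feature vector per library, Y one multi-label target per library.
X = {lib: vectorize(names, vocabulary) for lib, names in libraries.items()}
Y = {lib: binarize_tags(tags, all_tags) for lib, tags in library_tags.items()}
```

Matrices shaped like `X` and `Y` are the usual input to a multi-label classifier, where each column of `Y` is a tag that may or may not apply to a library independently of the others.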
@inproceedings{464c8ac5b52a4038a6cb7b556a9150a7,
title = "MUTAMA: An Automated Multi-label Tagging Approach for Software Libraries on Maven",
abstract = "Recent studies show that the Maven ecosystem alone already contains over 2 million library artefacts including their source code, byte code, and documentation. To help developers cope with this information, several websites overlay configurable views on the ecosystem. For instance, views in which similar libraries are grouped into categories or views showing all libraries that have been tagged with tags corresponding to coarse-grained library features. The MVNRepository overlay website offers both category-based and tag-based views. Unfortunately, several libraries have not been categorised or are missing relevant tags. Some initial approaches to the automated categorisation of Maven libraries have already been proposed. However, no such approach exists for the problem of tagging of libraries in a multi-label setting. This paper proposes MUTAMA, a multi-label classification approach to the Maven library tagging problem based on information extracted from the byte code of each library. We analysed 4088 randomly selected libraries from the Maven software ecosystem. MUTAMA trains and deploys five multi-label classifiers using feature vectors obtained from class and method names of the tagged libraries. Our results indicate that classifiers based on ensemble methods achieve the best performances. Finally, we propose directions to follow in this area.",
keywords = "multi-label classification, libraries, software ecosystems, machine learning, software engineering",
author = "{Velazquez Rodriguez}, {Camilo Ernesto} and {De Roover}, Coen",
year = "2020",
month = "9",
doi = "10.1109/SCAM51674.2020.00034",
language = "English",
isbn = "978-1-7281-9248-2",
series = "Proceedings - 20th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2020",
publisher = "IEEE",
pages = "243--247",
booktitle = "Proceedings of the 20th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM2020)",
}