Subject metadata enrichment using statistical topic models

Subject metadata enrichment using statistical topic models. Newman, D., Hagedorn, K., Chemudugunta, C., & Smyth, P. Pages 366–375. ACM Press, 2007.

Creating a collection of metadata records from disparate and diverse sources often results in uneven, unreliable and variable quality subject metadata. Having uniform, consistent and enriched subject metadata allows users to more easily discover material, browse the collection, and limit keyword search results by subject. We demonstrate how statistical topic models are useful for subject metadata enrichment. We describe some of the challenges of metadata enrichment on a huge scale (10 million metadata records from 700 repositories in the OAIster Digital Library) when the metadata is highly heterogeneous (metadata about images and text, and both cultural heritage material and scientific literature). We show how to improve the quality of the enriched metadata, using both manual and statistical modeling techniques. Finally, we discuss some of the challenges of the production environment, and demonstrate the value of the enriched metadata in a prototype portal.

@inproceedings{newman_subject_2007,
	title = {Subject metadata enrichment using statistical topic models},
	isbn = {978-1-59593-644-8},
	url = {http://dx.doi.org/10.1145/1255175.1255248},
	doi = {10.1145/1255175.1255248},
	abstract = {Creating a collection of metadata records from disparate and diverse sources often results in uneven, unreliable and variable quality subject metadata. Having uniform, consistent and enriched subject metadata allows users to more easily discover material, browse the collection, and limit keyword search results by subject. We demonstrate how statistical topic models are useful for subject metadata enrichment. We describe some of the challenges of metadata enrichment on a huge scale (10 million metadata records from 700 repositories in the OAIster Digital Library) when the metadata is highly heterogeneous (metadata about images and text, and both cultural heritage material and scientific literature). We show how to improve the quality of the enriched metadata, using both manual and statistical modeling techniques. Finally, we discuss some of the challenges of the production environment, and demonstrate the value of the enriched metadata in a prototype portal.},
	urldate = {2017-12-26},
	publisher = {ACM Press},
	author = {Newman, David and Hagedorn, Kat and Chemudugunta, Chaitanya and Smyth, Padhraic},
	year = {2007},
	keywords = {clustering, digital-libraries, metadata--aggregation, oai-pmh, topic-modeling},
	pages = {366--375}
}