Archival description and linked data: a preliminary study of opportunities and implementation challenges

Archival description and linked data: a preliminary study of opportunities and implementation challenges. Cazzanelli, M., Castillo, M. M., Soria-Barreto, M., Ochoa-Gaona, S., Sepúlveda-Lozada, A., Patiño-Espinosa, S. G., Jiménez-Pérez, N. C., Rodiles-Hernández, R., & Gracy, K. F. Archival Science, 15(3):239–294, September, 2015. 🏷️ /unread


This paper presents the results of a study to investigate how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and schema.org; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Description of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records by using a combination of hand counts and the natural language processing (NLP) tool, OpenCalais. 
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and inadequacies of current encoding standards to support semantic tagging of potential access points embedded in free text areas of archival surrogates.

@article{cazzanelli2015,
	title = {Archival description and linked data: a preliminary study of opportunities and implementation challenges},
	volume = {15},
	issn = {1389-0166},
	shorttitle = {Archival description and linked data: a preliminary study of opportunities and implementation challenges},
	url = {http://link.springer.com/10.1007/s10502-014-9216-2},
	doi = {10.1007/s10502-014-9216-2},
	abstract = {This paper presents the results of a study to investigate how archives can connect their collections to related data sources through the use of Semantic Web technologies, specifically Linked Data. Questions explored included (a) What types of data currently available in archival surrogates such as Encoded Archival Description (EAD) finding aids and Machine-Readable Cataloging (MARC) records may be useful if converted to Linked Data? (b) For those potentially useful data points identified in archival surrogates, how might one align data structures found in those surrogates to the data structures of other relevant internal or external information sources? (c) What features of current standards and data structures present impediments or challenges that must be overcome in order to achieve interoperability among disparate data sources? To answer these questions, the researcher identified metadata elements of potential use as Linked Data in archival surrogates, as well as metadata element sets and vocabularies of data sets that could serve as pathways to relevant external data sources. Data sets chosen for the study included DBpedia and schema.org; metadata element sets examined included Friend of a Friend (FOAF), GeoNames, and Linking Open Description of Events (LODE). The researcher then aligned tags found in the EAD encoding standard to related classes and properties found in these Linked Data sources and metadata element sets. To investigate the third question about impediments to incorporating Linked Data in archival descriptions, the researcher analyzed the locations and frequencies at which controlled and uncontrolled access points (personal and family name, corporate name, geographic name, and genre/form entities) appeared in a sample of MARC and EAD archival descriptive records by using a combination of hand counts and the natural language processing (NLP) tool, OpenCalais. 
The results of the location and frequency analysis, combined with the results of the alignment process, helped the researcher identify several critical challenges currently impeding interoperability among archival information systems and relevant Linked Data sources, including differences in granularity between archival and other data source vocabularies, and inadequacies of current encoding standards to support semantic tagging of potential access points embedded in free text areas of archival surrogates.},
	language = {en},
	number = {3},
	urldate = {2020-12-16},
	journal = {Archival Science},
	author = {Cazzanelli, Matteo and Castillo, María Mercedes and Soria-Barreto, Miriam and Ochoa-Gaona, Susana and Sepúlveda-Lozada, Alejandra and Patiño-Espinosa, Sandra Gisele and Jiménez-Pérez, Nelly C. and Rodiles-Hernández, Rocío and Gracy, Karen F.},
	month = sep,
	year = {2015},
	note = {🏷️ /unread},
	keywords = {/unread},
	pages = {239--294},
}
