Image Analytics in Web Archives. Müller-Budack, E., Pustu-Iren, K., Diering, S., Springstein, M., & Ewerth, R. In Gomes, D., Demidova, E., Winters, J., & Risse, T., editors, The Past Web: Exploring Web Archives, pages 141–151. Springer International Publishing, Cham, 2021.
The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.
@incollection{muller-budack_image_2021,
	address = {Cham},
	title = {Image {Analytics} in {Web} {Archives}},
	isbn = {978-3-030-63291-5},
	url = {https://doi.org/10.1007/978-3-030-63291-5_11},
	abstract = {The multimedia content published on the World Wide Web is constantly growing and contains valuable information in various domains. The Internet Archive initiative has gathered billions of time-versioned web pages since the mid-nineties, but unfortunately, they are rarely provided with appropriate metadata. This lack of structured data limits the exploration of the archives, and automated solutions are required to enable semantic search. While many approaches exploit the textual content of news in the Internet Archive to detect named entities and their relations, visual information is generally disregarded. In this chapter, we present an approach that leverages deep learning techniques for the identification of public personalities in the images of news articles stored in the Internet Archive. In addition, we elaborate on how this approach can be extended to enable detection of other entity types such as locations or events. The approach complements named entity recognition and linking tools for text and allows researchers and analysts to track the media coverage and relations of persons more precisely. We have analysed more than one million images from news articles in the Internet Archive and demonstrated the feasibility of the approach with two use cases in different domains: politics and entertainment.},
	language = {en},
	urldate = {2021-07-14},
	booktitle = {The {Past} {Web}: {Exploring} {Web} {Archives}},
	publisher = {Springer International Publishing},
	author = {Müller-Budack, Eric and Pustu-Iren, Kader and Diering, Sebastian and Springstein, Matthias and Ewerth, Ralph},
	editor = {Gomes, Daniel and Demidova, Elena and Winters, Jane and Risse, Thomas},
	year = {2021},
	doi = {10.1007/978-3-030-63291-5_11},
	pages = {141--151},
}
