Methodologies to evaluate recommender systems. Michiels, L. Ph.D. Thesis, University of Antwerp, Antwerp, 2024.
In the current digital landscape, recommender systems play a pivotal role in shaping users' online experiences by providing personalized recommendations for relevant products, news articles, media content, and more. Their pervasive use makes the thorough evaluation of these systems of paramount importance. This dissertation addresses two key challenges in the evaluation of recommender systems. Part II of the dissertation focuses on improving methodologies for offline evaluation. Offline evaluation is a prevalent method for assessing recommendation algorithms in both academia and industry. Despite its widespread use, offline evaluations often suffer from methodological flaws that undermine their validity and real-world impact. This dissertation makes three key contributions to improving the reliability, internal and ecological validity, replicability, reproducibility, and reusability of offline evaluations. First, it presents an extensive review of the current state of practice and knowledge in offline evaluation, proposing a comprehensive set of better practices to address the reliability, replicability, and validity of offline evaluations. Next, it introduces RecPack, an open-source experimentation toolkit designed to facilitate reliable, reproducible, and reusable offline evaluations. Finally, it presents RecPack Tests, a test suite designed to ensure the correctness of recommendation algorithm implementations, thereby enhancing the reliability of offline evaluations. Part III of the dissertation examines the measurement of filter bubbles and serendipity. Both concepts have garnered significant attention due to concerns about the potential negative impacts of recommender systems on users of online platforms. One concern is that personalized content, especially on news and media platforms, may lock users into prior beliefs, contributing to increased polarization in society. Another concern is that exposing users only to content they have previously expressed interest in may lead to boredom and eliminate surprise, preventing them from experiencing serendipity. This research makes three contributions to the study of filter bubbles and serendipity. First, it proposes an operational definition of technological filter bubbles, clarifying the ambiguity surrounding the concept. Second, it introduces a regression model for measuring their presence and strength in news recommendations, providing practitioners with the tools to rigorously study filter bubbles and gather real-world evidence of their (non-)existence. Finally, it proposes a feature repository for serendipity in recommender systems, offering a framework for evaluating how system design can influence users' experiences of serendipity in online information environments. In summary, the findings and tools developed in this dissertation advance the theoretical understanding of recommender system evaluation while offering practical tools for industry practitioners and researchers.
@phdthesis{michiels_methodologies_2024,
	address = {Antwerp},
	title = {Methodologies to evaluate recommender systems},
	url = {https://hdl.handle.net/10067/2080040151162165141},
	language = {en},
	urldate = {2024-09-25},
	school = {University of Antwerp},
	author = {Michiels, Lien},
	year = {2024},
	doi = {10.63028/10067/2080040151162165141},
}
