Optimal Dataset Size for Recommender Systems: Evaluating Algorithms' Performance via Downsampling. Arabzadeh, A. Master's thesis, University of Siegen, February 2025. arXiv:2502.08845 [cs]
The analysis reveals that algorithm performance under different downsampling portions is influenced by factors such as dataset characteristics, algorithm complexity, and the specific downsampling configuration (scenario dependent). In particular, some algorithms, which generally showed lower absolute nDCG@10 scores compared to those that performed better, exhibited lower sensitivity to the amount of training data provided, demonstrating greater potential to achieve optimal efficiency in lower downsampling portions. For instance, on average, these algorithms retained ∼81% of their full-size performance when using only 50% of the training set. In certain configurations of the downsampling method, where the focus was on progressively involving more users while keeping the test set fixed in size, they even demonstrated higher nDCG@10 scores than when using the original full-size dataset. These findings underscore the feasibility of balancing sustainability and effectiveness, providing practical insights for designing energy-efficient recommender systems and advancing sustainable AI practices.
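The evaluation loop described in the abstract is easy to sketch. The Python snippet below is a minimal illustration, not code from the thesis: `ndcg_at_k` is the standard log2-discounted nDCG metric, and `downsample_users` mimics the configuration in which progressively more users are included in training while the held-out test set stays fixed. All function names and the numeric scores at the end are hypothetical placeholders.

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked items."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(ranked_relevances, k=10):
    """nDCG@k: DCG of the model's ranking over DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

def downsample_users(train, portions=(0.25, 0.5, 0.75, 1.0), seed=42):
    """Yield training subsets that progressively involve more users;
    the held-out test set is kept fixed in size outside this function."""
    rng = np.random.default_rng(seed)
    users = rng.permutation(sorted(train))  # reproducible user order
    for p in portions:
        kept = users[: max(1, int(p * len(users)))]
        yield p, {u: train[u] for u in kept}

# Toy usage: relevance labels of the top-10 items a model ranked for one user.
print(f"nDCG@10 = {ndcg_at_k([1, 0, 1, 1, 0, 0, 1, 0, 0, 0]):.3f}")

# "Retention" as discussed above: the score at a 50% training portion divided
# by the full-size score. Both numbers here are made-up placeholders, chosen
# only so the ratio lands near the ~81% figure reported in the abstract.
score_half, score_full = 0.42, 0.52
print(f"retained {score_half / score_full:.0%} of full-size performance")
```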
@mastersthesis{arabzadeh_optimal_2025,
	title = {Optimal {Dataset} {Size} for {Recommender} {Systems}: {Evaluating} {Algorithms}' {Performance} via {Downsampling}},
	shorttitle = {Optimal {Dataset} {Size} for {Recommender} {Systems}},
	url = {http://arxiv.org/abs/2502.08845},
	abstract = {The analysis reveals that algorithm performance under different downsampling portions is influenced by factors such as dataset characteristics, algorithm complexity, and the specific downsampling configuration (scenario dependent). In particular, some algorithms, which generally showed lower absolute nDCG@10 scores compared to those that performed better, exhibited lower sensitivity to the amount of training data provided, demonstrating greater potential to achieve optimal efficiency in lower downsampling portions. For instance, on average, these algorithms retained ∼81\% of their full-size performance when using only 50\% of the training set. In certain configurations of the downsampling method, where the focus was on progressively involving more users while keeping the test set fixed in size, they even demonstrated higher nDCG@10 scores than when using the original full-size dataset. These findings underscore the feasibility of balancing sustainability and effectiveness, providing practical insights for designing energy-efficient recommender systems and advancing sustainable AI practices.},
	language = {en},
	urldate = {2025-04-15},
	school = {University of Siegen},
	author = {Arabzadeh, Ardalan},
	month = feb,
	year = {2025},
	note = {arXiv:2502.08845 [cs]},
}
