The effect of random seeds for data splitting on recommendation accuracy. Wegmeth, L., Vente, T., Purucker, L., & Beel, J. In Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2023), September 2023.

Abstract: The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values k, and compare the results. In the extreme case with k = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for k = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of k for both holdout and cross-validation.
@inproceedings{wegmeth_effect_2023,
title = {The effect of random seeds for data splitting on recommendation accuracy},
	abstract = {The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values $k$, and compare the results. In the extreme case with $k = 1$, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to $\sim$6.3\% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for $k = 1$, the accuracy of algorithms evaluated with cross-validation deviates only up to $\sim$2.3\% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of $k$ for both holdout and cross-validation.},
language = {en},
booktitle = {Perspectives on the {Evaluation} of {Recommender} {Systems} {Workshop} ({PERSPECTIVES} 2023)},
author = {Wegmeth, Lukas and Vente, Tobias and Purucker, Lennart and Beel, Joeran},
month = sep,
year = {2023},
keywords = {to-read},
}
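
The following minimal sketch (plain NumPy, not the authors' code) illustrates the evaluation protocol the abstract describes: score a recommender under 20 different data-split seeds, once with a single holdout split and once averaged over 5-fold cross-validation, and report the largest relative deviation from the mean score. The toy interaction data, the popularity-baseline recommender, and the HitRate@k metric are simplifying assumptions chosen for brevity; the numbers it prints are illustrative only and do not reproduce the paper's results.

# Illustrative sketch: spread of a recommendation metric across data-split seeds,
# for a single holdout split vs. a 5-fold cross-validation average per seed.
# Toy data and popularity baseline are assumptions, not the paper's setup.
import numpy as np

rng_data = np.random.default_rng(0)

# Toy implicit-feedback data: (user, item) interaction pairs.
n_users, n_items, n_interactions = 200, 50, 4000
interactions = np.column_stack([
    rng_data.integers(0, n_users, n_interactions),
    rng_data.integers(0, n_items, n_interactions),
])

def hit_rate_at_k(train, test, k):
    """Popularity recommender: recommend the k most popular training items to
    every user; count a hit if a held-out interaction's item is among them."""
    top_k = np.argsort(np.bincount(train[:, 1], minlength=n_items))[::-1][:k]
    return np.isin(test[:, 1], top_k).mean()

def holdout(seed, k=1, test_frac=0.2):
    """Single random train/test split controlled by `seed`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(interactions))
    cut = int(len(idx) * (1 - test_frac))
    return hit_rate_at_k(interactions[idx[:cut]], interactions[idx[cut:]], k)

def cross_validation(seed, k=1, folds=5):
    """5-fold cross-validation: average the metric over all folds for one seed."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(interactions))
    scores = []
    for fold in np.array_split(idx, folds):
        train_mask = np.ones(len(interactions), dtype=bool)
        train_mask[fold] = False
        scores.append(hit_rate_at_k(interactions[train_mask],
                                    interactions[~train_mask], k))
    return float(np.mean(scores))

def relative_deviation(scores):
    """Largest deviation from the mean, relative to the mean."""
    scores = np.asarray(scores)
    return np.abs(scores - scores.mean()).max() / scores.mean()

seeds = range(20)
for name, evaluate in [("single holdout", holdout),
                       ("5-fold cross-validation", cross_validation)]:
    scores = [evaluate(s) for s in seeds]
    print(f"{name:25s} max relative deviation from mean: "
          f"{100 * relative_deviation(scores):.2f}%")

With this toy setup the cross-validated scores typically cluster more tightly around their mean across seeds than the single-holdout scores, which is the mitigation the abstract advocates: repeating the split (or averaging over folds) dampens the influence of any one data-split random seed.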
{"_id":"XnsMLBftYYdAn6ivq","bibbaseid":"wegmeth-vente-purucker-beel-theeffectofrandomseedsfordatasplittingonrecommendationaccuracy-2023","author_short":["Wegmeth, L.","Vente, T.","Purucker, L.","Beel, J."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"The effect of random seeds for data splitting on recommendation accuracy","abstract":"The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values 𝑘, and compare the results. In the extreme case with 𝑘 = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for 𝑘 = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of 𝑘 for both holdout and cross-validation.","language":"en","booktitle":"Perspectives on the Evaluation of Recommender Systems Workshop (PERSPECTIVES 2023)","author":[{"propositions":[],"lastnames":["Wegmeth"],"firstnames":["Lukas"],"suffixes":[]},{"propositions":[],"lastnames":["Vente"],"firstnames":["Tobias"],"suffixes":[]},{"propositions":[],"lastnames":["Purucker"],"firstnames":["Lennart"],"suffixes":[]},{"propositions":[],"lastnames":["Beel"],"firstnames":["Joeran"],"suffixes":[]}],"month":"September","year":"2023","keywords":"to-read","bibtex":"@inproceedings{wegmeth_effect_2023,\n\ttitle = {The effect of random seeds for data splitting on recommendation accuracy},\n\tabstract = {The evaluation of recommender system algorithms depends on randomness, e.g., during randomly splitting data into training and testing data. We suspect that failing to account for randomness in this scenario may lead to misrepresenting the predictive accuracy of recommendation algorithms. To understand the community’s view of the importance of randomness, we conducted a paper study on 39 full papers published at the ACM RecSys 2022 conference. We found that the authors of 26 papers used some variation of a holdout split that requires a random seed. However, only five papers explicitly repeated experiments and averaged their results over different random seeds. 
This potentially problematic research practice motivated us to analyze the effect of data split random seeds on recommendation accuracy. Therefore, we train three common algorithms on nine public data sets with 20 data split random seeds, evaluate them on two ranking metrics with three different ranking cutoff values 𝑘, and compare the results. In the extreme case with 𝑘 = 1, we show that depending on the data split random seed, the accuracy with traditional recommendation algorithms deviates by up to ∼6.3\\% from the mean accuracy achieved on the data set. Hence, we show that an algorithm may significantly over- or under-perform when maliciously or negligently selecting a random seed for splitting the data. To showcase a mitigation strategy and better research practice, we compare holdout to cross-validation and show that, again, for 𝑘 = 1, the accuracy of algorithms evaluated with cross-validation deviates only up to ∼2.3\\% from the mean accuracy achieved on the data set. Furthermore, we found that the deviation becomes smaller the higher the value of 𝑘 for both holdout and cross-validation.},\n\tlanguage = {en},\n\tbooktitle = {Perspectives on the {Evaluation} of {Recommender} {Systems} {Workshop} ({PERSPECTIVES} 2023)},\n\tauthor = {Wegmeth, Lukas and Vente, Tobias and Purucker, Lennart and Beel, Joeran},\n\tmonth = sep,\n\tyear = {2023},\n\tkeywords = {to-read},\n}\n\n","author_short":["Wegmeth, L.","Vente, T.","Purucker, L.","Beel, J."],"key":"wegmeth_effect_2023","id":"wegmeth_effect_2023","bibbaseid":"wegmeth-vente-purucker-beel-theeffectofrandomseedsfordatasplittingonrecommendationaccuracy-2023","role":"author","urls":{},"keyword":["to-read"],"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://api.zotero.org/users/6655/collections/3TB3KT36/items?key=VFvZhZXIoHNBbzoLZ1IM2zgf&format=bibtex&limit=100","dataSources":["7KNAjxiv2tsagmbgY"],"keywords":["to-read"],"search_terms":["effect","random","seeds","data","splitting","recommendation","accuracy","wegmeth","vente","purucker","beel"],"title":"The effect of random seeds for data splitting on recommendation accuracy","year":2023}