The Impact of Feature Quantity on Recommendation Algorithm Performance: A Movielens-100K Case Study

Wegmeth, L. July 2022. arXiv:2207.08713 [cs.IR].


Recent model-based Recommender Systems (RecSys) algorithms emphasize the use of features, also called side information, in their design, similar to algorithms in Machine Learning (ML). In contrast, some of the most popular and traditional algorithms for RecSys solely focus on a given user-item-rating relation without including side information. The goal of this case study is to provide a performance comparison and assessment of RecSys and ML algorithms when side information is included. We chose the Movielens-100K data set since it is a standard for comparing RecSys algorithms. We compared six different feature sets with varying quantities of features which were generated from the baseline data and evaluated on a total of 19 RecSys algorithms, baseline ML algorithms, Automated Machine Learning (AutoML) pipelines, and state-of-the-art RecSys algorithms that incorporate side information. The results show that additional features benefit all algorithms we evaluated. However, the correlation between feature quantity and performance is not monotonic for AutoML and RecSys. In these categories, an analysis of feature importance revealed that the quality of features matters more than quantity. Throughout our experiments, the average performance on the feature set with the lowest number of features is about 6% worse compared to that with the highest in terms of the Root Mean Squared Error. An interesting observation is that AutoML outperforms matrix factorization-based RecSys algorithms when additional features are used. Almost all algorithms that can include side information have higher performance when using the highest quantity of features. In the other cases, the performance difference is negligible (<1%). The results show a clear positive trend for the effect of feature quantity as well as the important effects of feature quality on the evaluated algorithms.

@unpublished{wegmeth_impact_2022,
	title = {The {Impact} of {Feature} {Quantity} on {Recommendation} {Algorithm} {Performance}: {A} {Movielens}-{100K} {Case} {Study}},
	url = {http://arxiv.org/abs/2207.08713},
	abstract = {Recent model-based Recommender Systems (RecSys) algorithms emphasize on
the use of features, also called side information, in their design similar
to algorithms in Machine Learning (ML). In contrast, some of the most
popular and traditional algorithms for RecSys solely focus on a given
user-item-rating relation without including side information. The goal of
this case study is to provide a performance comparison and assessment of
RecSys and ML algorithms when side information is included. We chose the
Movielens-100K data set since it is a standard for comparing RecSys
algorithms. We compared six different feature sets with varying quantities
of features which were generated from the baseline data and evaluated on a
total of 19 RecSys algorithms, baseline ML algorithms, Automated Machine
Learning (AutoML) pipelines, and state-of-the-art RecSys algorithms that
incorporate side information. The results show that additional features
benefit all algorithms we evaluated. However, the correlation between
feature quantity and performance is not monotonous for AutoML and RecSys.
In these categories, an analysis of feature importance revealed that the
quality of features matters more than quantity. Throughout our
experiments, the average performance on the feature set with the lowest
number of features is about 6\% worse compared to that with the highest in
terms of the Root Mean Squared Error. An interesting observation is that
AutoML outperforms matrix factorization-based RecSys algorithms when
additional features are used. Almost all algorithms that can include side
information have higher performance when using the highest quantity of
features. In the other cases, the performance difference is negligible
({\textless}1\%). The results show a clear positive trend for the effect of feature
quantity as well as the important effects of feature quality on the
evaluated algorithms.},
	author = {Wegmeth, Lukas},
	month = jul,
	year = {2022},
	note = {arXiv:2207.08713 [cs.IR]},
	eprint = {2207.08713},
	archivePrefix = {arXiv},
	primaryClass = {cs.IR},
}

