Active learning in recommender systems for predicting vulnerabilities in software

Active learning in recommender systems for predicting vulnerabilities in software. Stijger, E. Master's thesis, Utrecht University, Utrecht, NL, 2024. Accepted: 2024-01-06T00:01:00Z


Due to a rapid advancement of digital technology and growing reliance on the internet, cybersecurity has become a paramount issue for individuals, organizations, and governments. To address this challenge, penetration testing has emerged as a critical tool to ensure the security of computer systems and networks. The reconnaissance phase of penetration testing plays a crucial role in identifying vulnerabilities in a system by gathering relevant information. Although various tools are available to automate this process, most of them are limited to identifying reported vulnerabilities, and they do not provide suggestions or predictions about vulnerabilities. Therefore, this research aims to investigate the application of recommender systems to predict common vulnerabilities during the reconnaissance phase. The main objective of this research is to investigate how active learning affects the performance of a recommender system to identify vulnerabilities in software products. Item-Based k-NN Collaborative Filtering, a recommender system, can improve the identification of potential vulnerabilities and the effectiveness of penetration testing by analyzing information from similar data points. This research involves a comprehensive data preprocessing phase, which utilizes data from the National Vulnerability Database (NVD). Several recommender systems are built using this data, which enables the prediction of potential vulnerabilities during the reconnaissance phase of penetration testing. The performances of these recommender systems are evaluated, and the top-performing recommender system implements active learning to enhance its performance. The findings of this research demonstrate that Item-Based k-NN Collaborative Filtering outperforms other recommender systems in terms of overall performance when it comes to identifying software vulnerabilities. Furthermore, when compared to Item-Based k-NN Collaborative Filtering prior to active learning or with active learning and a random sampling technique, Item-Based k-NN Collaborative Filtering with active learning incorporating a 4- or 10-batch sampling technique with 20 or 40 items added yields a statistically significant improvement in the precision score. This indicates that a greater proportion of the predicted vulnerabilities are correct. Item-Based k-NN Collaborative Filtering with active learning and a single-batch sampling strategy only results in a statistically significant improvement in precision, compared to Item-Based k-NN Collaborative Filtering prior to active learning or with active learning and a random sampling technique, when 20 items are added instead of 40. Furthermore, only Item-Based k-NN Collaborative Filtering with a 10-batch sampling strategy adding 20 items demonstrated a statistically significant improvement in nDCG scores compared to Item-Based k-NN Collaborative Filtering prior to active learning. This implies a more accurate ranking of the vulnerabilities. However, this could potentially be a type I error. From these findings, it can be concluded that introducing active learning in Item-Based k-NN Collaborative Filtering, using the approaches outlined, leads to significant improvement in precision score but not necessarily in nDCG score. Considering this conclusion, it is advised to use Item-Based k-NN Collaborative Filtering with active learning to predict vulnerabilities in software products and enhance the reconnaissance phase of penetration testing. This can be achieved by incorporating a single-batch sampling technique with 20 items added or a 4- or 10-batch sampling technique with 20 or 40 items added. The insights gained from this research can help individuals, organizations, and governments strengthen their cybersecurity defences and protect against potential cyber threats.
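
The abstract reads a higher precision as "a greater proportion of the predicted vulnerabilities are correct" and a higher nDCG as "a more accurate ranking". The following is a minimal sketch of how these two metrics are commonly computed for a top-k recommendation list, assuming binary relevance. It is not taken from the thesis: the function names, the cutoff k, and the item labels (vuln_a and so on) are hypothetical, and the thesis may define or parameterize the metrics differently.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k predicted items that are truly relevant.

    Assumption: if fewer than k items are predicted, divide by the
    number actually predicted.
    """
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places every relevant item at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: two of four predicted vulnerabilities are correct.
relevant = {"vuln_a", "vuln_c"}
ranking_1 = ["vuln_a", "vuln_b", "vuln_c", "vuln_d"]
ranking_2 = ["vuln_a", "vuln_c", "vuln_b", "vuln_d"]  # same items, better order

p1 = precision_at_k(ranking_1, relevant, 4)
p2 = precision_at_k(ranking_2, relevant, 4)
n1 = ndcg_at_k(ranking_1, relevant, 4)
n2 = ndcg_at_k(ranking_2, relevant, 4)
```

Both rankings contain the same correct items, so their precision is identical, while only the second ranking reaches the maximum nDCG. This illustrates why an improvement in precision need not come with an improvement in nDCG, and vice versa.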

@mastersthesis{stijger_active_2024,
	address = {Utrecht, NL},
	title = {Active learning in recommender systems for predicting vulnerabilities in software},
	copyright = {CC-BY-NC-ND},
	url = {https://studenttheses.uu.nl/handle/20.500.12932/45783},
	abstract = {Due to a rapid advancement of digital technology and growing reliance on the internet, cybersecurity
has become a paramount issue for individuals, organizations, and governments. To address this
challenge, penetration testing has emerged as a critical tool to ensure the security of computer
systems and networks. The reconnaissance phase of penetration testing plays a crucial role in
identifying vulnerabilities in a system by gathering relevant information. Although various tools are
available to automate this process, most of them are limited to identifying reported vulnerabilities,
and they do not provide suggestions or predictions about vulnerabilities. Therefore, this research
aims to investigate the application of recommender systems to predict common vulnerabilities
during the reconnaissance phase. The main objective of this research is to investigate how active
learning affects the performance of a recommender system to identify vulnerabilities in software
products.
Item-Based k-NN Collaborative Filtering, a recommender system, can improve the identification of
potential vulnerabilities and the effectiveness of penetration testing by analyzing information from
similar data points. This research involves a comprehensive data preprocessing phase, which utilizes
data from the National Vulnerability Database (NVD). Several recommender systems are built using
this data, which enables the prediction of potential vulnerabilities during the reconnaissance phase
of penetration testing. The performances of these recommender systems are evaluated, and the top-performing recommender system implements active learning to enhance its performance.
The findings of this research demonstrate that Item-Based k-NN Collaborative Filtering outperforms
other recommender systems in terms of overall performance when it comes to identifying software
vulnerabilities. Furthermore, when compared to Item-Based k-NN Collaborative Filtering prior
to active learning or with active learning and a random sampling technique, Item-Based k-NN
Collaborative Filtering with active learning incorporating a 4- or 10-batch sampling technique with
20 or 40 items added yields a statistically significant improvement in the precision score. This
indicates that a greater proportion of the predicted vulnerabilities are correct. Item-Based k-NN
Collaborative Filtering with active learning and a single-batch sampling strategy only results in
a statistically significant improvement in precision, compared to Item-Based k-NN Collaborative
Filtering prior to active learning or with active learning and a random sampling technique, when 20
items are added instead of 40.
Furthermore, only Item-Based k-NN Collaborative Filtering with a 10-batch sampling strategy
adding 20 items demonstrated a statistically significant improvement in nDCG scores compared to
Item-Based k-NN Collaborative Filtering prior to active learning. This implies a more accurate
ranking of the vulnerabilities. However, this could potentially be a type I error.
From these findings, it can be concluded that introducing active learning in Item-Based k-NN
Collaborative Filtering, using the approaches outlined, leads to significant improvement in precision
score but not necessarily in nDCG score.
Considering this conclusion, it is advised to use Item-Based k-NN Collaborative Filtering with
active learning to predict vulnerabilities in software products and enhance the reconnaissance phase
of penetration testing. This can be achieved by incorporating a single-batch sampling technique
with 20 items added or a 4- or 10-batch sampling technique with 20 or 40 items added.
The insights gained from this research can help individuals, organizations, and governments strengthen
their cybersecurity defences and protect against potential cyber threats.},
	language = {EN},
	urldate = {2024-10-11},
	school = {Utrecht University},
	author = {Stijger, Elise},
	year = {2024},
	note = {Accepted: 2024-01-06T00:01:00Z},
}
