Towards Making Systems Forget with Machine Unlearning. Cao, Y. & Yang, J. In 2015 IEEE Symposium on Security and Privacy (SP), May 2015. IEEE.
Abstract: Today’s systems produce a wealth of data every day, and the data further generates more data, i.e., the derived data, forming a complex data propagation network, defined as the data’s lineage. There are many reasons for users and administrators to forget certain data including the data’s lineage. From the privacy perspective, a system may leak private information of certain users, and those users unhappy about privacy leaks naturally want to forget their data and its lineage. From the security perspective, an anomaly detection system can be polluted by adversaries through injecting manually crafted data into the training set. Therefore, we envision forgetting systems, capable of completely forgetting certain data and its lineage. In this paper, we focus on making learning systems forget, the process of which is defined as machine unlearning or unlearning. To perform unlearning upon a learning system, we present general unlearning criteria, i.e., converting a learning system or part of it into a summation form of the statistical query learning model, and updating all the summations to achieve unlearning. Then, we integrate our unlearning criteria into an unlearning architecture that interacts with all the components of a learning system, such as sample clustering and feature selection. To demonstrate our unlearning criteria and architecture, we select four real-world learning systems, including an item-item recommendation system, an online social network spam filter, and a malware detection system. These systems are first exposed to an adversarial environment, e.g., if the system is potentially vulnerable to training data pollution, we first pollute the training data set and show that the detection rate drops significantly. Then, we apply our unlearning technique upon those affected systems, either polluted or leaking private information. Our results show that after unlearning, the detection rate of a polluted system increases back to the one before pollution, and a system leaking a particular user’s private information completely forgets that information.
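The summation-form idea is concrete enough to sketch. Below is a minimal Python illustration, not the authors' implementation: a toy naive Bayes spam filter whose only state is a handful of summations over the training data, so learning a sample adds its contribution to those counters and unlearning subtracts it exactly, with no retraining from scratch. All names (SummationNB, learn, unlearn) and the toy data are hypothetical.

import math
from collections import defaultdict

class SummationNB:
    # Naive Bayes kept in "summation form": the model depends on its
    # training data only through per-class sample counts and per-class,
    # per-feature counts. Learning and unlearning a sample each touch
    # only the summations that sample contributed to.
    def __init__(self):
        self.total = 0                       # total training samples seen
        self.class_count = defaultdict(int)  # sum over samples of 1[y == c]
        self.feature_count = defaultdict(lambda: defaultdict(int))  # sum of 1[f in x, y == c]

    def learn(self, features, label):
        # Add this sample's contribution to every summation it touches.
        self.total += 1
        self.class_count[label] += 1
        for f in features:
            self.feature_count[label][f] += 1

    def unlearn(self, features, label):
        # Unlearning = subtracting the same contributions, leaving the
        # model exactly as if the sample had never been trained on.
        self.total -= 1
        self.class_count[label] -= 1
        for f in features:
            self.feature_count[label][f] -= 1

    def score(self, features, label):
        # Log-posterior up to a constant, with add-one smoothing.
        n_c = self.class_count[label]
        s = math.log((n_c + 1) / (self.total + len(self.class_count)))
        for f in features:
            s += math.log((self.feature_count[label][f] + 1) / (n_c + 2))
        return s

    def predict(self, features):
        return max(self.class_count, key=lambda c: self.score(features, c))

if __name__ == "__main__":
    nb = SummationNB()
    for x, y in [({"cheap", "pills"}, "spam"), ({"meeting", "notes"}, "ham"),
                 ({"cheap", "offer"}, "spam"), ({"project", "notes"}, "ham")]:
        nb.learn(x, y)

    # Training-data pollution: an adversary injects mislabeled samples.
    poison = [({"cheap", "pills"}, "ham")] * 5
    for x, y in poison:
        nb.learn(x, y)
    print("polluted model says:", nb.predict({"cheap", "pills"}))  # "ham"

    # Forget the polluted samples by reverting their summations.
    for x, y in poison:
        nb.unlearn(x, y)
    print("after unlearning:", nb.predict({"cheap", "pills"}))     # "spam"

This sketch covers only a learner that is already in summation form; per the abstract, components that are not (e.g., feature selection or sample clustering) are first converted into equivalent summations before unlearning is applied.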
@inproceedings{cao_towards_2015,
title = {Towards {Making} {Systems} {Forget} with {Machine} {Unlearning}},
url = {http://www.ieee-security.org/TC/SP2015/papers-archived/6949a463.pdf},
abstract = {Today’s systems produce a wealth of data every day, and the data further
generates more data, i.e., the derived data, forming a complex data
propagation network, defined as the data’s lineage. There are many reasons
for users and administrators to forget certain data including the data’s
lineage. From the privacy perspective, a system may leak private
information of certain users, and those users unhappy about privacy leaks
naturally want to forget their data and its lineage. From the security
perspective, an anomaly detection system can be polluted by adversaries
through injecting manually crafted data into the training set. Therefore,
we envision forgetting systems, capable of completely forgetting certain
data and its lineage. In this paper, we focus on making learning systems
forget, the process of which is defined as machine unlearning or
unlearning. To perform unlearning upon a learning system, we present
general unlearning criteria, i.e., converting a learning system or part of
it into a summation form of the statistical query learning model, and
updating all the
summations to achieve unlearning. Then, we integrate our unlearning
criteria into an unlearning architecture that interacts with all the
components of a learning system, such as sample clustering and feature
selection. To demonstrate our unlearning criteria and architecture, we
select four real-world learning systems, including an item-item
recommendation system, an online social network spam filter, and a malware
detection system. These systems are first exposed to an adversarial
environment, e.g., if the system is potentially vulnerable to training
data pollution, we first pollute the training data set and show that the
detection rate drops significantly. Then, we apply our unlearning
technique upon those affected systems, either polluted or leaking private
information. Our results show that after unlearning, the detection rate of
a polluted system increases back to the one before pollution, and a system
leaking a particular user’s private information completely forgets that
information.},
booktitle = {2015 {IEEE} {Symposium} on {Security} and {Privacy}},
publisher = {IEEE},
author = {Cao, Yinzhi and Yang, Junfeng},
month = may,
year = {2015},
}
{"_id":"NZcPYufsDHNbF99H4","bibbaseid":"cao-yang-towardsmakingsystemsforgetwithmachineunlearning-2015","downloads":0,"creationDate":"2019-02-15T15:14:58.210Z","title":"Towards Making Systems Forget with Machine Unlearning","author_short":["Cao, Y.","Yang, J."],"year":2015,"bibtype":"inproceedings","biburl":"https://api.zotero.org/users/6655/collections/TJPPJ92X/items?key=VFvZhZXIoHNBbzoLZ1IM2zgf&format=bibtex&limit=100","bibdata":{"bibtype":"inproceedings","type":"inproceedings","title":"Towards Making Systems Forget with Machine Unlearning","url":"http://www.ieee-security.org/TC/SP2015/papers-archived/6949a463.pdf","abstract":"Today’s systems produce a wealth of data every day, and the data further generates more data, i.e., the derived data, forming into a complex data propagation network, defined as the data’s lineage. There are many reasons for users and administrators to forget certain data including the data’s lineage. From the privacy perspective, a system may leak private information of certain users, and those users unhappy about privacy leaks naturally want to forget their data and its lineage. From the security perspective, an anomaly detection system can be polluted by adversaries through injecting manually crafted data into the training set. Therefore, we envision forgetting systems, capable of completely forgetting certain data and its lineage. In this paper, we focus on making learning systems forget, the process of which is defined as machine unlearning or unlearning. To perform unlearning upon learning system, we present general unlearning criteria, i.e., converting a learning system or part of it into a summation form of statistical query learning model, and updating all the summations to achieve unlearning. Then, we integrate our unlearning criteria into an unlearning architecture that interacts with all the components of a learning system, such as sample clustering and feature selection. To demonstrate our unlearning criteria and architecture, we select four real-world learning systems, including an item-item recommendation system, an online social network spam filter, and a malware detection system. These systems are first exposed to an adversarial environment, e.g., if the system is potentially vulnerable to training data pollution, we first pollute the training data set and show that the detection rate drops significantly. Then, we apply our unlearning technique upon those affected systems, either polluted or leaking private information. Our results show that after unlearning, the detection rate of a polluted system increases back to the one before pollution, and a system leaking a particular user’s private information completely forgets that information.","publisher":"IEEE","author":[{"propositions":[],"lastnames":["Cao"],"firstnames":["Yinzhi"],"suffixes":[]},{"propositions":[],"lastnames":["Yang"],"firstnames":["Junfeng"],"suffixes":[]}],"month":"May","year":"2015","bibtex":"@inproceedings{cao_towards_2015,\n\ttitle = {Towards {Making} {Systems} {Forget} with {Machine} {Unlearning}},\n\turl = {http://www.ieee-security.org/TC/SP2015/papers-archived/6949a463.pdf},\n\tabstract = {Today’s systems produce a wealth of data every day, and the data further\ngenerates more data, i.e., the derived data, forming into a complex data\npropagation network, defined as the data’s lineage. There are many reasons\nfor users and administrators to forget certain data including the data’s\nlineage. 
From the privacy perspective, a system may leak private\ninformation of certain users, and those users unhappy about privacy leaks\nnaturally want to forget their data and its lineage. From the security\nperspective, an anomaly detection system can be polluted by adversaries\nthrough injecting manually crafted data into the training set. Therefore,\nwe envision forgetting systems, capable of completely forgetting certain\ndata and its lineage. In this paper, we focus on making learning systems\nforget, the process of which is defined as machine unlearning or\nunlearning. To perform unlearning upon learning system, we present general\nunlearning criteria, i.e., converting a learning system or part of it into\na summation form of statistical query learning model, and updating all the\nsummations to achieve unlearning. Then, we integrate our unlearning\ncriteria into an unlearning architecture that interacts with all the\ncomponents of a learning system, such as sample clustering and feature\nselection. To demonstrate our unlearning criteria and architecture, we\nselect four real-world learning systems, including an item-item\nrecommendation system, an online social network spam filter, and a malware\ndetection system. These systems are first exposed to an adversarial\nenvironment, e.g., if the system is potentially vulnerable to training\ndata pollution, we first pollute the training data set and show that the\ndetection rate drops significantly. Then, we apply our unlearning\ntechnique upon those affected systems, either polluted or leaking private\ninformation. Our results show that after unlearning, the detection rate of\na polluted system increases back to the one before pollution, and a system\nleaking a particular user’s private information completely forgets that\ninformation.},\n\tpublisher = {IEEE},\n\tauthor = {Cao, Yinzhi and Yang, Junfeng},\n\tmonth = may,\n\tyear = {2015},\n}\n\n","author_short":["Cao, Y.","Yang, J."],"key":"cao_towards_2015","id":"cao_towards_2015","bibbaseid":"cao-yang-towardsmakingsystemsforgetwithmachineunlearning-2015","role":"author","urls":{"Paper":"http://www.ieee-security.org/TC/SP2015/papers-archived/6949a463.pdf"},"metadata":{"authorlinks":{}},"downloads":0},"search_terms":["towards","making","systems","forget","machine","unlearning","cao","yang"],"keywords":[],"authorIDs":[],"dataSources":["5Dp4QphkvpvNA33zi","jfoasiDDpStqkkoZB","BiuuFc45aHCgJqDLY"]}