Axiomatic Attribution for Deep Networks

Axiomatic Attribution for Deep Networks. Sundararajan, M., Taly, A., & Yan, Q. arXiv:1703.01365 [cs], June, 2017. arXiv: 1703.01365


We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
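The abstract says the method "just needs a few calls to the standard gradient operator". A minimal sketch of that idea follows, assuming the usual formulation in which gradients are averaged along the straight line from a baseline input to the actual input and then scaled by the input difference. The names `integrated_gradients`, `numerical_gradient`, and `toy_model` are hypothetical, the central-difference gradient stands in for a framework's autodiff operator, and the all-zeros baseline is only an illustrative choice.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector,
                       eps: float = 1e-6) -> List[float]:
    """Central-difference gradient; a stand-in for an autodiff gradient call."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f: Callable[[Vector], float], x: Vector,
                         baseline: Vector, steps: int = 50,
                         grad_fn: Optional[Callable[[Vector], List[float]]] = None
                         ) -> List[float]:
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    grad = grad_fn or (lambda p: numerical_gradient(f, p))
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i, gi in enumerate(g):
            avg_grad[i] += gi / steps
    # Scale the averaged gradient by the displacement from the baseline.
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


def toy_model(x: Vector) -> float:
    """Hypothetical model: a sigmoid over a fixed linear combination."""
    z = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed baseline, chosen for illustration
attributions = integrated_gradients(toy_model, x, baseline, steps=200)
gap = sum(attributions) - (toy_model(x) - toy_model(baseline))
print(attributions, gap)
```

In this sketch the attributions should sum approximately to the difference between the model output at the input and at the baseline, so `gap` is a quick sanity check on the number of steps. With a real network, `grad_fn` would be replaced by the framework's own gradient call.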

@article{sundararajan_axiomatic_2017,
	title = {Axiomatic {Attribution} for {Deep} {Networks}},
	url = {http://arxiv.org/abs/1703.01365},
	abstract = {We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.},
	language = {en},
	urldate = {2022-03-02},
	journal = {arXiv:1703.01365 [cs]},
	author = {Sundararajan, Mukund and Taly, Ankur and Yan, Qiqi},
	month = jun,
	year = {2017},
	note = {arXiv: 1703.01365},
	keywords = {Computer Science - Machine Learning},
}

