Dropout as a Regularizer of Interaction Effects. Lengerich, B., Xing, E. P., & Caruana, R. In Proceedings of the Twenty Fifth International Conference on Artificial Intelligence and Statistics, 2022.
Abstract: We examine Dropout through the perspective of interactions: effects that require multiple variables. Given $N$ variables, there are ${N \choose k}$ possible sets of $k$ variables ($N$ univariate effects, $\mathcal{O}(N^2)$ pairwise interactions, $\mathcal{O}(N^3)$ 3-way interactions); we can thus imagine that models with large representational capacity could be dominated by high-order interactions. In this paper, we show that Dropout contributes a regularization effect which helps neural networks (NNs) explore functions of lower-order interactions before considering functions of higher-order interactions. Dropout imposes this regularization by reducing the effective learning rate of higher-order interactions. As a result, Dropout encourages models to learn lower-order functions of additive components. This understanding of Dropout has implications for choosing Dropout rates: higher Dropout rates should be used when we need stronger regularization against interactions. This perspective also issues caution against using Dropout to measure term salience because Dropout regularizes against high-order interactions. Finally, this view of Dropout as a regularizer of interactions provides insight into the varying effectiveness of Dropout across architectures and datasets. We also compare Dropout to weight decay and early stopping and find that it is difficult to obtain the same regularization with these alternatives.
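The abstract's central mechanism, that Dropout scales down the effective learning rate of higher-order interaction terms, can be sketched with a short simulation. This is my own illustration, not code from the paper: a $k$-way multiplicative interaction term is zeroed unless all $k$ of its inputs survive the dropout mask, so with keep probability $q = 1 - p$ its expected contribution shrinks geometrically as $q^k$ (all variable names below are mine).

```python
import random

# Sketch (illustration only, not the paper's code): estimate the probability
# that a k-way interaction term survives a dropout mask with keep prob q.
# A product x1*...*xk is nonzero only if every input is kept, so its expected
# contribution -- and hence the effective learning rate of that term -- is
# scaled by q**k. Higher-order terms are damped hardest.
random.seed(0)
q = 0.8          # keep probability (dropout rate p = 0.2)
trials = 200_000
est = {}
for k in (1, 2, 3):
    kept = sum(all(random.random() < q for _ in range(k)) for _ in range(trials))
    est[k] = kept / trials
    print(f"k={k}: empirical survival {est[k]:.3f} vs q**k = {q**k:.3f}")
```

The geometric gap between orders (0.8 vs. 0.64 vs. 0.512 here) is why, under this view, raising the dropout rate regularizes interactions more strongly.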
@InProceedings{lengerich2022dropout,
title={Dropout as a Regularizer of Interaction Effects},
author={Lengerich, Benjamin and Xing, Eric P. and Caruana, Rich},
year={2022},
informal_venue = {AISTATS},
booktitle = {Proceedings of the Twenty Fifth International Conference on Artificial Intelligence and Statistics},
url_paper = {https://proceedings.mlr.press/v151/lengerich22a.html},
url_preprint = {https://arxiv.org/abs/2007.00823},
abstract = {We examine Dropout through the perspective of interactions: effects that require multiple variables. Given $N$ variables, there are ${N \choose k}$ possible sets of $k$ variables ($N$ univariate effects, $\mathcal{O}(N^2)$ pairwise interactions, $\mathcal{O}(N^3)$ 3-way interactions); we can thus imagine that models with large representational capacity could be dominated by high-order interactions. In this paper, we show that Dropout contributes a regularization effect which helps neural networks (NNs) explore functions of lower-order interactions before considering functions of higher-order interactions. Dropout imposes this regularization by reducing the effective learning rate of higher-order interactions. As a result, Dropout encourages models to learn lower-order functions of additive components. This understanding of Dropout has implications for choosing Dropout rates: higher Dropout rates should be used when we need stronger regularization against interactions. This perspective also issues caution against using Dropout to measure term salience because Dropout regularizes against high-order interactions. Finally, this view of Dropout as a regularizer of interactions provides insight into the varying effectiveness of Dropout across architectures and datasets. We also compare Dropout to weight decay and early stopping and find that it is difficult to obtain the same regularization with these alternatives.},
keywords = {Deep Learning, Theory}
}