Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel. Tsai, Y. H., Bai, S., Yamada, M., Morency, L., & Salakhutdinov, R. In EMNLP, 2019.
@inproceedings{DBLP:conf/emnlp/TsaiBYMS19,
  author    = {Yao{-}Hung Hubert Tsai and
               Shaojie Bai and
               Makoto Yamada and
               Louis{-}Philippe Morency and
               Ruslan Salakhutdinov},
  title     = {Transformer Dissection: An Unified Understanding for Transformer's
               Attention via the Lens of Kernel},
  booktitle = {EMNLP},
  year      = {2019},
  url       = {https://doi.org/10.18653/v1/D19-1443},
  doi       = {10.18653/v1/D19-1443},
  timestamp = {Thu, 07 Apr 2022 09:14:07 +0200},
  biburl    = {https://dblp.org/rec/conf/emnlp/TsaiBYMS19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}