Mixtape: Breaking the Softmax Bottleneck Efficiently. Yang, Z., Luong, T., Salakhutdinov, R. R., & Le, Q. V. In Advances in Neural Information Processing Systems, pages 15922-15930, 2019.
Abstract: The softmax bottleneck has been shown to limit the expressiveness of neural language models. Mixture of Softmaxes (MoS) is an effective approach to address such a theoretical limitation, but is expensive compared to softmax in terms of both memory and time. We propose Mixtape, an output layer that breaks the softmax bottleneck more efficiently with three novel techniques: logit space vector gating, sigmoid tree decomposition, and gate sharing. On four benchmarks including language modeling and machine translation, the …
@InProceedings{Yang2019,
author = {Yang, Zhilin and Luong, Thang and Salakhutdinov, Russ R and Le, Quoc V},
editor = {},
title = {Mixtape: Breaking the Softmax Bottleneck Efficiently},
booktitle = {Advances in Neural Information Processing Systems},
publisher = {},
address = {},
pages = {15922-15930},
year = {2019},
abstract = {The softmax bottleneck has been shown to limit the expressiveness of neural language models. Mixture of Softmaxes (MoS) is an effective approach to address such a theoretical limitation, but is expensive compared to softmax in terms of both memory and time. We propose Mixtape, an output layer that breaks the softmax bottleneck more efficiently with three novel techniques---logit space vector gating, sigmoid tree decomposition, and gate sharing. On four benchmarks including language modeling and machine translation, the …},
keywords = {}}