SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization. Singh, N., Data, D., George, J., & Diggavi, S. IEEE Journal on Selected Areas in Information Theory, 2(3):954-969, Sep., 2021.
In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov’s momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide convergence guarantees of our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art which does not consider momentum updates.
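To make the abstract's description concrete, the following is a minimal per-node sketch of one SQuARM-SGD round in Python. It is an illustration under assumptions, not the authors' implementation: top-k sparsification and uniform stochastic quantization stand in for the paper's compression operators, and grad_fn, the local step count H, and the trigger threshold are hypothetical placeholders.

# Illustrative sketch of one node's SQuARM-SGD round (not the authors' reference code).
# Assumes parameters are a flat numpy vector; top-k sparsification and uniform
# stochastic quantization are stand-in compressors; grad_fn, H, k, and the
# trigger threshold are hypothetical placeholders.
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def quantize(v, levels=16):
    """Uniform stochastic quantization of v onto `levels` levels per sign."""
    scale = np.max(np.abs(v)) + 1e-12
    u = np.abs(v) / scale * levels
    low = np.floor(u)
    q = low + (np.random.rand(*v.shape) < (u - low))  # round up with prob. equal to the fraction
    return np.sign(v) * q * scale / levels

def local_round(x, m, grad_fn, lr=0.01, beta=0.9, H=5, k=10, trigger=1e-3):
    """Run H local SGD steps with Nesterov's momentum, then return a
    sparsified-and-quantized update to send to neighbors only if it
    exceeds the locally computable trigger threshold."""
    x_start = x.copy()
    for _ in range(H):
        g = grad_fn(x + beta * m)        # stochastic gradient at the look-ahead point
        m = beta * m - lr * g            # momentum update
        x = x + m                        # local parameter update
    delta = x - x_start                  # change accumulated over the H local steps
    if np.linalg.norm(delta) > trigger:  # triggering criterion: communicate only large changes
        msg = quantize(topk_sparsify(delta, k))   # sparsify, then quantize
    else:
        msg = None                       # skip communication this round
    return x, m, msg

In the full decentralized algorithm, each node would average the received compressed updates from its neighbors (weighted by the mixing matrix) into its local model; the sketch above only covers the local-computation and compression side described in the abstract.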
@ARTICLE{9513259,
author={Singh, Navjot and Data, Deepesh and George, Jemin and Diggavi, Suhas},
journal={IEEE Journal on Selected Areas in Information Theory},
title={SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization},
year={2021},
volume={2},
number={3},
pages={954-969},
abstract={In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov’s momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide convergence guarantees of our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art which does not consider momentum updates.},
keywords={},
doi={10.1109/JSAIT.2021.3103920},
ISSN={2641-8770},
month={Sep.},
tags = {journal,CEDL,DML},
type = {2},
url_arxiv = {https://arxiv.org/abs/2005.07041},
}