High performance LDA through collective model communication optimization. Zhang, B., Peng, B., & Qiu, J. Procedia Computer Science, 80:86-97, Elsevier, 2016.
© The Authors. Published by Elsevier B.V. LDA is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge is the scaling issue in parallelization owing to the fact that the model size is huge and parallel workers need to communicate the model continually. We identify three important features of the model in parallel LDA computation: 1. The volume of model parameters required for local computation is high; 2. The time complexity of local computation is proportional to the required model size; 3. The model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced iteration computation time as the model size shrinks during the model convergence. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, lgs and rtt. We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations favoring asynchronous communication methods in the field, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that lgs can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while rtt can run up to 3.9 times faster compared with Petuum LDA when achieving similar model likelihood.
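To make the collective-communication idea concrete, the following is a minimal, single-process Java sketch of the ring-rotation pattern that motivates rtt-style model communication: each worker owns one model partition, and partitions are shifted around a ring so every worker touches every partition exactly once per pass, rather than fetching parameters asynchronously from a shared server. All class, variable, and method names here are illustrative assumptions, not the Harp-LDA API.

```java
import java.util.Arrays;

// Sketch of ring-style model rotation among workers (assumed names, toy data).
public class ModelRotationSketch {
    public static void main(String[] args) {
        int numWorkers = 4;
        int wordsPerPartition = 3;
        int numTopics = 2;

        // partitions[p][w][k]: topic counts for word w in model partition p.
        int[][][] partitions = new int[numWorkers][wordsPerPartition][numTopics];

        // One full pass: after numWorkers steps, every worker has held
        // every partition exactly once.
        for (int step = 0; step < numWorkers; step++) {
            for (int worker = 0; worker < numWorkers; worker++) {
                // Partition currently held by this worker at this step.
                int held = (worker + step) % numWorkers;
                // Stand-in for local Gibbs-sampling updates restricted to
                // the held partition: just increment a topic counter.
                for (int[] wordCounts : partitions[held]) {
                    wordCounts[worker % numTopics] += 1;
                }
            }
            // Rotation is expressed by the (worker + step) indexing; a real
            // distributed implementation would send each partition to the
            // next worker in the ring at this point.
        }

        System.out.println(Arrays.deepToString(partitions));
    }
}
```

Because each worker only ever computes on the partition it currently holds, communication is a predictable, synchronized shift rather than many asynchronous fetches, which is the property the paper exploits to speed up model updates.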
@article{Zhang2016a,
 title = {High performance {LDA} through collective model communication optimization},
 author = {Zhang, Bingjing and Peng, Bo and Qiu, Judy},
 journal = {Procedia Computer Science},
 volume = {80},
 pages = {86-97},
 year = {2016},
 publisher = {Elsevier},
 doi = {10.1016/j.procs.2016.05.300},
 abstract = {© The Authors. Published by Elsevier B.V. LDA is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge is the scaling issue in parallelization owing to the fact that the model size is huge and parallel workers need to communicate the model continually. We identify three important features of the model in parallel LDA computation: 1. The volume of model parameters required for local computation is high; 2. The time complexity of local computation is proportional to the required model size; 3. The model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced iteration computation time as the model size shrinks during the model convergence. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, lgs and rtt. We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations favoring asynchronous communication methods in the field, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that lgs can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while rtt can run up to 3.9 times faster compared with Petuum LDA when achieving similar model likelihood.}
}
