generated by bibbase.org
  2024 (1)
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels. Jangda, A.; Maleki, S.; Dehnavi, M. M.; Musuvathi, M.; and Saarikivi, O. In Grosser, T.; Dubach, C.; Steuwer, M.; Xue, J.; Ottoni, G.; and Pereira, F. M. Q., editor(s), IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2024, Edinburgh, United Kingdom, March 2-6, 2024, pages 93–105, 2024. IEEE
  2023 (7)
Development of a knowledge-sharing parallel computing approach for calibrating distributed watershed hydrologic models. Asgari, M.; Yang, W.; Lindsay, J. B.; Shao, H.; Liu, Y.; de Queiroga Miranda, R.; and Dehnavi, M. M. Environ. Model. Softw., 164: 105708. 2023.
Register Tiling for Unstructured Sparsity in Neural Network Inference. Wilkinson, L.; Cheshmi, K.; and Dehnavi, M. M. Proc. ACM Program. Lang., 7(PLDI): 1995–2020. 2023.
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates. Mozaffari, M.; Li, S.; Zhang, Z.; and Dehnavi, M. M. In Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; and Levine, S., editor(s), Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, 2023.
Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. Cheshmi, K.; Strout, M.; and Dehnavi, M. M. In Arnold, D.; Badia, R. M.; and Mohror, K. M., editor(s), Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023, Denver, CO, USA, November 12-17, 2023, pages 89:1–89:15, 2023. ACM
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, Montreal, QC, Canada, 25 February 2023 - 1 March 2023. Dehnavi, M. M.; Kulkarni, M.; and Krishnamoorthy, S., editors. ACM. 2023.
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels. Jangda, A.; Maleki, S.; Dehnavi, M. M.; Musuvathi, M.; and Saarikivi, O. CoRR, abs/2305.13450. 2023.
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates. Mozaffari, M.; Li, S.; Zhang, Z.; and Dehnavi, M. M. CoRR, abs/2306.01685. 2023.
  2022 (7)
A review of parallel computing applications in calibrating watershed hydrologic models. Asgari, M.; Yang, W.; Lindsay, J. B.; Tolson, B. A.; and Dehnavi, M. M. Environ. Model. Softw., 151: 105370. 2022.
Randomized Gossiping With Effective Resistance Weights: Performance Guarantees and Applications. Can, B.; Soori, S.; Aybat, N. S.; Dehnavi, M. M.; and Gürbüzbalaban, M. IEEE Trans. Control. Netw. Syst., 9(2): 524–536. 2022.
Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization. Liu, B.; Laird, A.; Tsang, W. H.; Mahjour, B.; and Dehnavi, M. M. In Klöckner, A.; and Moreira, J., editor(s), Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, PACT 2022, Chicago, Illinois, October 8-12, 2022, pages 439–450, 2022. ACM
HDagg: Hybrid Aggregation of Loop-carried Dependence Iterations in Sparse Matrix Computations. Zarebavani, B.; Cheshmi, K.; Liu, B.; Strout, M. M.; and Dehnavi, M. M. In 2022 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022, Lyon, France, May 30 - June 3, 2022, pages 1217–1227, 2022. IEEE
Optimizing sparse computations jointly. Cheshmi, K.; Strout, M. M.; and Dehnavi, M. M. In Lee, J.; Agrawal, K.; and Spear, M. F., editor(s), PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2 - 6, 2022, pages 459–460, 2022. ACM
Vectorizing Sparse Matrix Computations with Partially-Strided Codelets. Cheshmi, K.; Cetinic, Z.; and Dehnavi, M. M. In Wolf, F.; Shende, S.; Culhane, C.; Alam, S. R.; and Jagode, H., editor(s), SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, November 13-18, 2022, pages 32:1–32:15, 2022. IEEE
HyLo: A Hybrid Low-Rank Natural Gradient Descent Method. Mu, B.; Soori, S.; Can, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. In Wolf, F.; Shende, S.; Culhane, C.; Alam, S. R.; and Jagode, H., editor(s), SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, November 13-18, 2022, pages 47:1–47:16, 2022. IEEE
  2021 (5)
L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method. Can, B.; Soori, S.; Dehnavi, M. M.; and Gürbüzbalaban, M. In 2021 60th IEEE Conference on Decision and Control (CDC), Austin, TX, USA, December 14-17, 2021, pages 2386–2393, 2021. IEEE
TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion. Soori, S.; Can, B.; Mu, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. CoRR, abs/2106.03947. 2021.
L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method. Can, B.; Soori, S.; Dehnavi, M. M.; and Gürbüzbalaban, M. CoRR, abs/2108.09365. 2021.
Composing Loop-carried Dependence with Other Loops. Cheshmi, K.; Strout, M. M.; and Dehnavi, M. M. CoRR, abs/2111.12238. 2021.
Differentiating-based Vectorization for Sparse Kernels. Cetinic, Z.; Cheshmi, K.; and Dehnavi, M. M. CoRR, abs/2111.12243. 2021.
  2020 (4)
NASOQ: numerically accurate sparsity-oriented QP solver. Cheshmi, K.; Kaufman, D. M.; Kamil, S.; and Dehnavi, M. M. ACM Trans. Graph., 39(4): 96. 2020.
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. Soori, S.; Mishchenko, K.; Mokhtari, A.; Dehnavi, M. M.; and Gürbüzbalaban, M. In Chiappa, S.; and Calandra, R., editor(s), The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, pages 1965–1976, 2020. PMLR
ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning. Soori, S.; Can, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, May 18-22, 2020, pages 429–439, 2020. IEEE
MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation. Liu, B.; Cheshmi, K.; Soori, S.; Strout, M. M.; and Dehnavi, M. M. In Gupta, R.; and Shen, X., editor(s), PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, USA, February 22-26, 2020, pages 389–402, 2020. ACM
  2019 (2)
Sparse computation data dependence simplification for efficient compiler-generated inspectors. Mohammadi, M. S.; Yuki, T.; Cheshmi, K.; Davis, E. C.; Hall, M. W.; Dehnavi, M. M.; Nandy, P.; Olschanowsky, C.; Venkat, A.; and Strout, M. M. In McKinley, K. S.; and Fisher, K., editor(s), Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, pages 594–609, 2019. ACM
ASYNC: Asynchronous Machine Learning on Distributed Systems. Soori, S.; Can, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. CoRR, abs/1907.08526. 2019.
  2018 (7)
CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms. Blanco, Z.; Liu, B.; and Dehnavi, M. M. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018, Eugene, OR, USA, August 13-16, 2018, pages 21:1–21:10, 2018. ACM
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems. Soori, S.; Devarakonda, A.; Blanco, Z.; Demmel, J.; Gürbüzbalaban, M.; and Dehnavi, M. M. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018, Eugene, OR, USA, August 13-16, 2018, pages 22:1–22:10, 2018. ACM
Sparsity-Aware Storage Format Selection. Cheshmi, K.; Cheshmi, L.; and Dehnavi, M. M. In 2018 International Conference on High Performance Computing & Simulation, HPCS 2018, Orleans, France, July 16-20, 2018, pages 1034–1037, 2018. IEEE
Extending Index-Array Properties for Data Dependence Analysis. Mohammadi, M. S.; Cheshmi, K.; Dehnavi, M. M.; Venkat, A.; Yuki, T.; and Strout, M. M. In Hall, M. W.; and Sundar, H., editor(s), Languages and Compilers for Parallel Computing - 31st International Workshop, LCPC 2018, Salt Lake City, UT, USA, October 9-11, 2018, Revised Selected Papers, volume 11882 of Lecture Notes in Computer Science, pages 78–93, 2018. Springer
ParSy: inspection and transformation of sparse matrix computations for parallelism. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA, November 11-16, 2018, pages 62:1–62:15, 2018. IEEE / ACM
Sparse Matrix Code Dependence Analysis Simplification at Compile Time. Mohammadi, M. S.; Cheshmi, K.; Gopalakrishnan, G.; Hall, M. W.; Dehnavi, M. M.; Venkat, A.; Yuki, T.; and Strout, M. M. CoRR, abs/1807.10852. 2018.
MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations. Liu, B.; Cheshmi, K.; Soori, S.; and Dehnavi, M. M. CoRR, abs/1812.07152. 2018.
  2017 (7)
Autotuning divide-and-conquer stencil computations. Natarajan, E. P.; Dehnavi, M. M.; and Leiserson, C. E. Concurr. Comput. Pract. Exp., 29(17). 2017.
A Unified Optimization Approach for Sparse Tensor Operations on GPUs. Liu, B.; Wen, C.; Sarwate, A. D.; and Dehnavi, M. M. In 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017, Honolulu, HI, USA, September 5-8, 2017, pages 47–57, 2017. IEEE Computer Society
Sympiler: transforming sparse matrix codes by decoupling symbolic analysis. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. In Mohr, B.; and Raghavan, P., editor(s), Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 12 - 17, 2017, pages 13, 2017. ACM
Power grid safety control via fine-grained multi-persona programmable logic controllers. Salles-Loustau, G.; Garcia, L.; Sun, P.; Dehnavi, M. M.; and Zonouz, S. A. In 2017 IEEE International Conference on Smart Grid Communications, SmartGridComm 2017, Dresden, Germany, October 23-27, 2017, pages 283–288, 2017. IEEE
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. CoRR, abs/1705.06575. 2017.
A Unified Optimization Approach for Sparse Tensor Operations on GPUs. Liu, B.; Wen, C.; Sarwate, A. D.; and Dehnavi, M. M. CoRR, abs/1705.09905. 2017.
Avoiding Communication in Proximal Methods for Convex Optimization Problems. Soori, S.; Devarakonda, A.; Demmel, J.; Gürbüzbalaban, M.; and Dehnavi, M. M. CoRR, abs/1710.08883. 2017.
  2015 (1)
Parallel finite element technique using Gaussian belief propagation. El-Kurdi, Y.; Dehnavi, M. M.; Gross, W. J.; and Giannacopoulos, D. Comput. Phys. Commun., 193: 38–48. 2015.
  2014 (4)
Survey on Grid Resource Allocation Mechanisms. Qureshi, M. B.; Dehnavi, M. M.; Min-Allah, N.; Qureshi, M. S.; Hussain, H.; Rentifis, I.; Tziritas, N.; Loukopoulos, T.; Khan, S. U.; Xu, C.; and Zomaya, A. Y. J. Grid Comput., 12(2): 399–441. 2014.
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil. You, Y.; Fu, H.; Song, S. L.; Dehnavi, M. M.; Gan, L.; Huang, X.; and Yang, G. Int. J. High Perform. Comput. Appl., 28(3): 301–318. 2014.
Designing a Heuristic Cross-Architecture Combination for Breadth-First Search. You, Y.; Bader, D. A.; and Dehnavi, M. M. In 43rd International Conference on Parallel Processing, ICPP 2014, Minneapolis, MN, USA, September 9-12, 2014, pages 70–79, 2014. IEEE Computer Society
MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures. You, Y.; Song, S. L.; Fu, H.; Marquez, A.; Dehnavi, M. M.; Barker, K. J.; Cameron, K. W.; Randles, A. P.; and Yang, G. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19-23, 2014, pages 809–818, 2014. IEEE Computer Society
  2013 (1)
Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units. Dehnavi, M. M.; Fernandez, D. M.; Gaudiot, J.; and Giannacopoulos, D. D. IEEE Trans. Parallel Distributed Syst., 24(9): 1852–1862. 2013.