  2020 (3)
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. Soori, S.; Mishchenko, K.; Mokhtari, A.; Dehnavi, M. M.; and Gürbüzbalaban, M. In Chiappa, S.; and Calandra, R., editor(s), The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], volume 108 of Proceedings of Machine Learning Research, pages 1965–1976, 2020. PMLR
ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning. Soori, S.; Can, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, May 18-22, 2020, pages 429–439, 2020. IEEE
MatRox: modular approach for improving data locality in hierarchical (Mat)rix App(Rox)imation. Liu, B.; Cheshmi, K.; Soori, S.; Strout, M. M.; and Dehnavi, M. M. In Gupta, R.; and Shen, X., editor(s), PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, USA, February 22-26, 2020, pages 389–402, 2020. ACM
  2019 (2)
Sparse computation data dependence simplification for efficient compiler-generated inspectors. Mohammadi, M. S.; Yuki, T.; Cheshmi, K.; Davis, E. C.; Hall, M. W.; Dehnavi, M. M.; Nandy, P.; Olschanowsky, C.; Venkat, A.; and Strout, M. M. In McKinley, K. S.; and Fisher, K., editor(s), Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, pages 594–609, 2019. ACM
ASYNC: Asynchronous Machine Learning on Distributed Systems. Soori, S.; Can, B.; Gürbüzbalaban, M.; and Dehnavi, M. M. CoRR, abs/1907.08526. 2019.
  2018 (7)
CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms. Blanco, Z.; Liu, B.; and Dehnavi, M. M. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018, Eugene, OR, USA, August 13-16, 2018, pages 21:1–21:10, 2018. ACM
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems. Soori, S.; Devarakonda, A.; Blanco, Z.; Demmel, J.; Gürbüzbalaban, M.; and Dehnavi, M. M. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018, Eugene, OR, USA, August 13-16, 2018, pages 22:1–22:10, 2018. ACM
Sparsity-Aware Storage Format Selection. Cheshmi, K.; Cheshmi, L.; and Dehnavi, M. M. In 2018 International Conference on High Performance Computing & Simulation, HPCS 2018, Orleans, France, July 16-20, 2018, pages 1034–1037, 2018. IEEE
Extending Index-Array Properties for Data Dependence Analysis. Mohammadi, M. S.; Cheshmi, K.; Dehnavi, M. M.; Venkat, A.; Yuki, T.; and Strout, M. M. In Hall, M. W.; and Sundar, H., editor(s), Languages and Compilers for Parallel Computing - 31st International Workshop, LCPC 2018, Salt Lake City, UT, USA, October 9-11, 2018, Revised Selected Papers, volume 11882 of Lecture Notes in Computer Science, pages 78–93, 2018. Springer
ParSy: inspection and transformation of sparse matrix computations for parallelism. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA, November 11-16, 2018, pages 62:1–62:15, 2018. IEEE / ACM
Sparse Matrix Code Dependence Analysis Simplification at Compile Time. Mohammadi, M. S.; Cheshmi, K.; Gopalakrishnan, G.; Hall, M. W.; Dehnavi, M. M.; Venkat, A.; Yuki, T.; and Strout, M. M. CoRR, abs/1807.10852. 2018.
MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations. Liu, B.; Cheshmi, K.; Soori, S.; and Dehnavi, M. M. CoRR, abs/1812.07152. 2018.
  2017 (6)
Autotuning divide-and-conquer stencil computations. Natarajan, E. P.; Dehnavi, M. M.; and Leiserson, C. E. Concurr. Comput. Pract. Exp., 29(17). 2017.
A Unified Optimization Approach for Sparse Tensor Operations on GPUs. Liu, B.; Wen, C.; Sarwate, A. D.; and Dehnavi, M. M. In 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017, Honolulu, HI, USA, September 5-8, 2017, pages 47–57, 2017. IEEE Computer Society
Sympiler: transforming sparse matrix codes by decoupling symbolic analysis. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. In Mohr, B.; and Raghavan, P., editor(s), Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, Denver, CO, USA, November 12 - 17, 2017, pages 13:1–13:13, 2017. ACM
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis. Cheshmi, K.; Kamil, S.; Strout, M. M.; and Dehnavi, M. M. CoRR, abs/1705.06575. 2017.
A Unified Optimization Approach for Sparse Tensor Operations on GPUs. Liu, B.; Wen, C.; Sarwate, A. D.; and Dehnavi, M. M. CoRR, abs/1705.09905. 2017.
Avoiding Communication in Proximal Methods for Convex Optimization Problems. Soori, S.; Devarakonda, A.; Demmel, J.; Gürbüzbalaban, M.; and Dehnavi, M. M. CoRR, abs/1710.08883. 2017.
  2015 (1)
Parallel finite element technique using Gaussian belief propagation. El-Kurdi, Y.; Dehnavi, M. M.; Gross, W. J.; and Giannacopoulos, D. Comput. Phys. Commun., 193: 38–48. 2015.
  2014 (4)
Survey on Grid Resource Allocation Mechanisms. Qureshi, M. B.; Dehnavi, M. M.; Min-Allah, N.; Qureshi, M. S.; Hussain, H.; Rentifis, I.; Tziritas, N.; Loukopoulos, T.; Khan, S. U.; Xu, C.; and Zomaya, A. Y. J. Grid Comput., 12(2): 399–441. 2014.
Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil. You, Y.; Fu, H.; Song, S. L.; Dehnavi, M. M.; Gan, L.; Huang, X.; and Yang, G. Int. J. High Perform. Comput. Appl., 28(3): 301–318. 2014.
Designing a Heuristic Cross-Architecture Combination for Breadth-First Search. You, Y.; Bader, D. A.; and Dehnavi, M. M. In 43rd International Conference on Parallel Processing, ICPP 2014, Minneapolis, MN, USA, September 9-12, 2014, pages 70–79, 2014. IEEE Computer Society
MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures. You, Y.; Song, S. L.; Fu, H.; Marquez, A.; Dehnavi, M. M.; Barker, K. J.; Cameron, K. W.; Randles, A. P.; and Yang, G. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19-23, 2014, pages 809–818, 2014. IEEE Computer Society
  2013 (1)
Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units. Dehnavi, M. M.; Fernandez, D. M.; Gaudiot, J.; and Giannacopoulos, D. D. IEEE Trans. Parallel Distrib. Syst., 24(9): 1852–1862. 2013.