Scheduling nested loops with the least number of processors

Scheduling nested loops with the least number of processors. Andronikos, T., Kalathas, M., Ciorba, F. M., Theodoropoulos, P., Kamenopoulos, D., & Papakonstantinou, G. In Proceedings of the 21st International Conference on Applied Informatics (AI 2003) of the International Associaltion of Science and Technology Development (IASTED), pages 713–718, February, 2003. ACTA Press.
abstract bibtex

This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd–Steinberg dithering algorithm, the Transitive Closure algorithm, and the \FSBM\ motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms \UPC\ by 5–95% depending on the test case.

@inproceedings{Andronikos:2003,
	Abstract = {This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd--Steinberg dithering algorithm, the Transitive Closure algorithm, and the \{FSBM\} motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms \{UPC\} by 5--95% depending on the test case. },
	Author = {Andronikos, Theodore and Kalathas, Mario and Ciorba, Florina M. and Theodoropoulos, Panagiotis and Kamenopoulos, Dimitris and Papakonstantinou, George},
	Booktitle = {Proceedings of the 21st International Conference on Applied Informatics (AI 2003) of the International Associaltion of Science and Technology Development (IASTED)},
	Date-Modified = {2016-01-06 13:28:00 +0000},
	Keywords = {2003},
	Month = {February},
	Pages = {713--718},
	Publisher = {ACTA Press},
	Title = {{Scheduling nested loops with the least number of processors}},
	Year = {2003},
	Bdsk-Url-1 = {http://www.sciencedirect.com/science/article/pii/S0164121207003093},
	Bdsk-Url-2 = {http://dx.doi.org/10.1016/j.jss.2007.11.715}}

%% This BibTeX bibliography file was created using BibDesk.
%% http://bibdesk.sourceforge.net/

%% Created for Florina M. Ciorba at 2016-01-06 13:57:17 +0100 


%% Saved with string encoding Unicode (UTF-8)

Downloads: 0

{"_id":"7KEMsXRvoz6jrZuj8","bibbaseid":"andronikos-kalathas-ciorba-theodoropoulos-kamenopoulos-papakonstantinou-schedulingnestedloopswiththeleastnumberofprocessors-2003","author_short":["Andronikos, T.","Kalathas, M.","Ciorba, F. M.","Theodoropoulos, P.","Kamenopoulos, D.","Papakonstantinou, G."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","abstract":"This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd–Steinberg dithering algorithm, the Transitive Closure algorithm, and the \\FSBM\\ motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms \\UPC\\ by 5–95% depending on the test case. ","author":[{"propositions":[],"lastnames":["Andronikos"],"firstnames":["Theodore"],"suffixes":[]},{"propositions":[],"lastnames":["Kalathas"],"firstnames":["Mario"],"suffixes":[]},{"propositions":[],"lastnames":["Ciorba"],"firstnames":["Florina","M."],"suffixes":[]},{"propositions":[],"lastnames":["Theodoropoulos"],"firstnames":["Panagiotis"],"suffixes":[]},{"propositions":[],"lastnames":["Kamenopoulos"],"firstnames":["Dimitris"],"suffixes":[]},{"propositions":[],"lastnames":["Papakonstantinou"],"firstnames":["George"],"suffixes":[]}],"booktitle":"Proceedings of the 21st International Conference on Applied Informatics (AI 2003) of the International Associaltion of Science and Technology Development (IASTED)","date-modified":"2016-01-06 13:28:00 +0000","keywords":"2003","month":"February","pages":"713–718","publisher":"ACTA Press","title":"Scheduling nested loops with the least number of processors","year":"2003","bdsk-url-1":"http://www.sciencedirect.com/science/article/pii/S0164121207003093","bdsk-url-2":"http://dx.doi.org/10.1016/j.jss.2007.11.715","bibtex":"@inproceedings{Andronikos:2003,\n\tAbstract = {This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, specifically designed for communication minimization, that performs better than more generic systems, such as Berkeley UPC. Its performance was evaluated through extensive testing. Three representative case studies are examined: the Floyd--Steinberg dithering algorithm, the Transitive Closure algorithm, and the \\{FSBM\\} motion estimation algorithm. The experimental results corroborate the efficiency of the parallel code. The tests show speedup ranging from 1.18 (out of the ideal 4) to 12.29 (out of the ideal 16) on distributed-systems and 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms \\{UPC\\} by 5--95% depending on the test case. },\n\tAuthor = {Andronikos, Theodore and Kalathas, Mario and Ciorba, Florina M. and Theodoropoulos, Panagiotis and Kamenopoulos, Dimitris and Papakonstantinou, George},\n\tBooktitle = {Proceedings of the 21st International Conference on Applied Informatics (AI 2003) of the International Associaltion of Science and Technology Development (IASTED)},\n\tDate-Modified = {2016-01-06 13:28:00 +0000},\n\tKeywords = {2003},\n\tMonth = {February},\n\tPages = {713--718},\n\tPublisher = {ACTA Press},\n\tTitle = {{Scheduling nested loops with the least number of processors}},\n\tYear = {2003},\n\tBdsk-Url-1 = {http://www.sciencedirect.com/science/article/pii/S0164121207003093},\n\tBdsk-Url-2 = {http://dx.doi.org/10.1016/j.jss.2007.11.715}}\n\n%% This BibTeX bibliography file was created using BibDesk.\n%% http://bibdesk.sourceforge.net/\n\n%% Created for Florina M. Ciorba at 2016-01-06 13:57:17 +0100 \n\n\n%% Saved with string encoding Unicode (UTF-8) \n\n\n\n","author_short":["Andronikos, T.","Kalathas, M.","Ciorba, F. M.","Theodoropoulos, P.","Kamenopoulos, D.","Papakonstantinou, G."],"bibbaseid":"andronikos-kalathas-ciorba-theodoropoulos-kamenopoulos-papakonstantinou-schedulingnestedloopswiththeleastnumberofprocessors-2003","role":"author","urls":{},"keyword":["2003"],"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://bibbase.org/f/5soafvAi5dxziNqBd/conferences.bib","dataSources":["yquS8QakKCWDYDAk9","aMnnnpEwMvMZwRFdC","we4PjHBkvY3ihPaQy","yezXfWaY6nTQG2nAv","pWKBSwppQZXZa3qnh"],"keywords":["2003"],"search_terms":["scheduling","nested","loops","number","processors","andronikos","kalathas","ciorba","theodoropoulos","kamenopoulos","papakonstantinou"],"title":"Scheduling nested loops with the least number of processors","year":2003}