Plumb: Efficient Processing of Multi-Users Pipelines (Extended)

Plumb: Efficient Processing of Multi-Users Pipelines (Extended). Qadeer, A. & Heidemann, J. Technical Report ISI-TR-727, USC/Information Sciences Institute, October, 2018.


Services such as DNS and websites often produce streams of data that are consumed by analytics pipelines operated by multiple teams. Often this data is processed in large chunks (megabytes) to allow analysis of a block of time or to amortize costs. Such pipelines pose two problems. First, duplication of computation and storage may occur when parts of the pipeline are operated by different groups. Second, processing can be lumpy, with structural lumpiness occurring when different stages need different amounts of resources, and data lumpiness occurring when a block of input requires increased resources. Duplication and structural lumpiness can both result in inefficient processing. Data lumpiness can cause pipeline failure or deadlock, for example when DDoS traffic requires 6× the CPU of normal traffic. We propose Plumb, a framework to abstract file processing for a multi-stage pipeline. Plumb integrates pipelines contributed by multiple users, detecting and eliminating duplication of computation and intermediate storage. It tracks and adjusts computation of each stage, accommodating both structural and data lumpiness. We exercise Plumb with the processing pipeline for B-Root DNS traffic, where it will replace a hand-tuned system to provide one third the original latency by utilizing 22% less CPU, and will address limitations that occur as multiple users process data and when DDoS traffic causes huge shifts in performance.
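As a rough illustration of the duplicate-elimination idea in the abstract, the sketch below merges stage lists submitted by several users and keeps one copy of each identical stage. It is not Plumb's implementation or interface: the function name merge_pipelines, the (input, program, output) stage tuples, and the user and stage names are all assumptions made for this example, and it assumes users agree on shared names for data and programs.

```python
# Hypothetical sketch, not Plumb's actual API: a stage is modeled as an
# (input_name, program, output_name) tuple, and two stages are treated as
# duplicates when all three fields match.

def merge_pipelines(user_pipelines):
    """Merge per-user stage lists, keeping one copy of each identical stage.

    Returns a dict mapping each unique stage to the list of users who
    submitted it, in first-seen order.
    """
    merged = {}
    for user, stages in user_pipelines.items():
        for stage in stages:
            users = merged.setdefault(stage, [])
            if user not in users:
                users.append(user)
    return merged


# Illustrative input: both users extract DNS messages from the same capture.
pipelines = {
    "alice": [
        ("pcap", "extract_dns", "messages"),
        ("messages", "count_queries", "stats"),
    ],
    "bob": [
        ("pcap", "extract_dns", "messages"),
        ("messages", "find_anomalies", "alerts"),
    ],
}

merged = merge_pipelines(pipelines)
submitted = sum(len(stages) for stages in pipelines.values())

for stage, users in merged.items():
    print(stage, "requested by", users)
print(f"{submitted} stages submitted, {len(merged)} stages to run")
```

In this toy input the shared extraction stage runs once and its output serves both users, so neither the computation nor the intermediate file is duplicated.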

@TechReport{Qadeer18a,
        author =        "Abdul Qadeer and John Heidemann",
        title =         "Plumb: Efficient Processing of Multi-Users Pipelines (Extended)",
	institution = 	"USC/Information Sciences Institute",
        year =          2018,
	sortdate = 		"2018-10-01", 
	project = "ant, lacanic, retrofuturebridge",
	jsubject = "network_big_data",
        number =     "ISI-TR-727",
        month =      oct,
        keywords =   "big data, hadoop, plumb, DNS, streaming data",
	url =		"https://ant.isi.edu/%7ejohnh/PAPERS/Qadeer18a.html",
	pdfurl =	"https://ant.isi.edu/%7ejohnh/PAPERS/Qadeer18a.pdf",
	myorganization =	"USC/Information Sciences Institute",
	copyrightholder = "authors",
	abstract = "Services such as DNS and websites often produce streams of data that
are consumed by analytics pipelines operated by multiple teams.  Often
this data is processed in large chunks (megabytes) to allow analysis
of a block of time or to amortize costs.  Such pipelines pose two
problems:  first, duplication of computation and storage may occur
when parts of the pipeline are operated by different groups.  Second,
processing can be \emph{lumpy}, with \emph{structural lumpiness}
occurring when different stages need different amounts of resources,
and \emph{data lumpiness} occurring when a block of input requires
increased resources.  Duplication and structural lumpiness both can
result in inefficient processing.  Data lumpiness can cause pipeline
failure or deadlock, for example if differences in DDoS traffic
compared to normal can require $6\times$ CPU\@.  We
propose \emph{Plumb}, a framework to abstract file processing for a
multi-stage pipeline.  Plumb integrates pipelines contributed by
multiple users, detecting and eliminating duplication of computation
and intermediate storage.  It tracks and adjusts computation of each
stage, accommodating both structural and data lumpiness.  We exercise
Plumb with the processing pipeline for B-Root DNS traffic, where it
will replace a hand-tuned system to provide one third the original
latency by utilizing $22\%$ fewer CPU and will address limitations
that occur as multiple users process data and when DDoS traffic causes
huge shifts in performance.",
}

