Plumb: Efficient Processing of Multi-Users Pipelines (Extended). Qadeer, A. & Heidemann, J. Technical Report ISI-TR-727, USC/Information Sciences Institute, October 2018.
@TechReport{Qadeer18a,
        author =          "Abdul Qadeer and John Heidemann",
        title =           "Plumb: Efficient Processing of Multi-Users Pipelines (Extended)",
        institution =     "USC/Information Sciences Institute",
        year =            2018,
        sortdate =        "2018-10-01",
        project =         "ant, lacanic, retrofuturebridge",
        jsubject =        "network_big_data",
        number =          "ISI-TR-727",
        month =           oct,
        keywords =        "big data, hadoop, plumb, DNS, streaming data",
        url =             "https://ant.isi.edu/%7ejohnh/PAPERS/Qadeer18a.html",
        pdfurl =          "https://ant.isi.edu/%7ejohnh/PAPERS/Qadeer18a.pdf",
        myorganization =  "USC/Information Sciences Institute",
        copyrightholder = "authors",
	abstract = "Services such as DNS and websites often produce streams of data that
are consumed by analytics pipelines operated by multiple teams.  Often
this data is processed in large chunks (megabytes) to allow analysis
of a block of time or to amortize costs.  Such pipelines pose two
problems:  first, duplication of computation and storage may occur
when parts of the pipeline are operated by different groups.  Second,
processing can be \emph{lumpy}, with \emph{structural lumpiness}
occurring when different stages need different amounts of resources,
and \emph{data lumpiness} occurring when a block of input requires
increased resources.  Duplication and structural lumpiness both can
result in inefficient processing.  Data lumpiness can cause pipeline
failure or deadlock, for example if differences in DDoS traffic
compared to normal can require $6\times$ CPU\@.  We
propose \emph{Plumb}, a framework to abstract file processing for a
multi-stage pipeline.  Plumb integrates pipelines contributed by
multiple users, detecting and eliminating duplication of computation
and intermediate storage.  It tracks and adjusts computation of each
stage, accommodating both structural and data lumpiness.  We exercise
Plumb with the processing pipeline for B-Root DNS traffic, where it
will replace a hand-tuned system to provide one third the original
latency by utilizing $22\%$ fewer CPU and will address limitations
that occur as multiple users process data and when DDoS traffic causes
huge shifts in performance.",
}