Tracelet-based Code Search in Executables. David, Y. and Yahav, E. In Proc. of the 35th ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 349--360, 2014.
We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.
@inproceedings{david_tracelet-based_2014,
	title = {Tracelet-based {Code} {Search} in {Executables}},
	abstract = {We address the problem of code search in executables. Given a function in binary form and a large code base, our goal is to statically find similar functions in the code base. Towards this end, we present a novel technique for computing similarity between functions. Our notion of similarity is based on decomposition of functions into tracelets: continuous, short, partial traces of an execution. To establish tracelet similarity in the face of low-level compiler transformations, we employ a simple rewriting engine. This engine uses constraint solving over alignment constraints and data dependencies to match registers and memory addresses between tracelets, bridging the gap between tracelets that are otherwise similar. We have implemented our approach and applied it to find matches in over a million binary functions. We compare tracelet matching to approaches based on n-grams and graphlets and show that tracelet matching obtains dramatically better precision and recall.},
	booktitle = {Proc. of the 35th {ACM} {SIGPLAN} {Conf}. on {Programming} {Language} {Design} and {Implementation}},
	author = {David, Yaniv and Yahav, Eran},
	year = {2014},
	keywords = {bdiff\_, bdiff\_dynamic, bdiff\_new, static binary analysis, x86, x86-64},
	pages = {349--360}
}