PERFUME: Programmatic Extraction and Refinement for Usability of Mathematical Expression

PERFUME: Programmatic Extraction and Refinement for Usability of Mathematical Expression. Weideman, N., Felkner, V. K., Wu, W., May, J., Hauser, C., & Garcia, L. In Proceedings of the 2021 Research on offensive and defensive techniques in the Context of Man At The End (MATE) Attacks, of Checkmate '21, pages 59–69, New York, NY, USA, November, 2021. Association for Computing Machinery.


Algorithmic identification is the crux for several binary analysis applications, including malware analysis, vulnerability discovery, and embedded firmware reverse engineering. However, data-driven and signature-based approaches often break down when encountering outlier realizations of a particular algorithm. Moreover, reverse engineering of domain-specific binaries often requires collaborative analysis between reverse engineers and domain experts. Communicating the behavior of an unidentified binary program to non-reverse engineers necessitates the recovery of algorithmic semantics in a human-digestible form. This paper presents PERFUME, a framework that extracts symbolic math expressions from low-level binary representations of an algorithm. PERFUME works by translating a symbolic output representation of a binary function to a high-level mathematical expression. In particular, we detail how source and target representations are generated for training a machine translation model. We integrate PERFUME as a plug-in for Ghidra, an open-source reverse engineering framework. We present our preliminary findings for domain-specific use cases and formalize open challenges in mathematical expression extraction from algorithmic implementations.
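
The abstract mentions generating target representations for a machine translation model. As a rough illustration of what such a target sequence could look like, the sketch below serializes an infix math expression into prefix-notation tokens. Everything here is an assumption made for illustration: the function name `to_prefix_tokens`, the token vocabulary (`add`, `mul`, `pow`, ...), and the use of Python's standard `ast` module are hypothetical and are not taken from the paper or from the PERFUME implementation.

```python
import ast
from typing import List

# Hypothetical token vocabulary: maps Python AST operator types to token names.
_BINOPS = {
    ast.Add: "add",
    ast.Sub: "sub",
    ast.Mult: "mul",
    ast.Div: "div",
    ast.Pow: "pow",
}


def to_prefix_tokens(expr: str) -> List[str]:
    """Serialize an infix math expression into a flat list of prefix tokens."""
    tree = ast.parse(expr, mode="eval")

    def walk(node: ast.AST) -> List[str]:
        # Binary operators: emit the operator token, then both operands.
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return [_BINOPS[type(node.op)]] + walk(node.left) + walk(node.right)
        # Unary minus becomes a dedicated "neg" token.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return ["neg"] + walk(node.operand)
        # Function calls such as sqrt(x): function name, then each argument.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens = [node.func.id]
            for arg in node.args:
                tokens.extend(walk(arg))
            return tokens
        # Variables and numeric literals are leaf tokens.
        if isinstance(node, ast.Name):
            return [node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [str(node.value)]
        raise ValueError("unsupported expression node: " + type(node).__name__)

    return walk(tree.body)
```

For example, `to_prefix_tokens("a*x**2 + b*x + c")` yields a flat token list beginning with `add`, the kind of sequence a sequence-to-sequence model could be trained to emit. Prefix notation is chosen here only because it needs no parentheses; whether the paper uses it is not stated in the abstract.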

@inproceedings{weideman_perfume_2021,
	address = {New York, NY, USA},
	series = {Checkmate '21},
	title = {{PERFUME}: {Programmatic} {Extraction} and {Refinement} for {Usability} of {Mathematical} {Expression}},
	isbn = {978-1-4503-8552-7},
	shorttitle = {{PERFUME}},
	url = {https://doi.org/10.1145/3465413.3488575},
	doi = {10.1145/3465413.3488575},
	abstract = {Algorithmic identification is the crux for several binary analysis applications, including malware analysis, vulnerability discovery, and embedded firmware reverse engineering. However, data-driven and signature-based approaches often break down when encountering outlier realizations of a particular algorithm. Moreover, reverse engineering of domain-specific binaries often requires collaborative analysis between reverse engineers and domain experts. Communicating the behavior of an unidentified binary program to non-reverse engineers necessitates the recovery of algorithmic semantics in a human-digestible form. This paper presents PERFUME, a framework that extracts symbolic math expressions from low-level binary representations of an algorithm. PERFUME works by translating a symbolic output representation of a binary function to a high-level mathematical expression. In particular, we detail how source and target representations are generated for training a machine translation model. We integrate PERFUME as a plug-in for Ghidra--an open-source reverse engineering framework. We present our preliminary findings for domain-specific use cases and formalize open challenges in mathematical expression extraction from algorithmic implementations.},
	urldate = {2021-11-21},
	booktitle = {Proceedings of the 2021 {Research} on offensive and defensive techniques in the {Context} of {Man} {At} {The} {End} ({MATE}) {Attacks}},
	publisher = {Association for Computing Machinery},
	author = {Weideman, Nicolaas and Felkner, Virginia K. and Wu, Wei-Cheng and May, Jonathan and Hauser, Christophe and Garcia, Luis},
	month = nov,
	year = {2021},
	keywords = {binary analysis, mentions sympy, reverse engineering},
	pages = {59--69},
}

