PERFUME: Programmatic Extraction and Refinement for Usability of Mathematical Expression

PERFUME: Programmatic Extraction and Refinement for Usability of Mathematical Expression. Weideman, N., Felkner, V. K., Wu, W., May, J., Hauser, C., & Garcia, L. In Proceedings of the 2021 Research on Offensive and Defensive Techniques in the Context of Man At The End (MATE) Attacks, Checkmate '21, pages 59–69, New York, NY, USA, 2021. Association for Computing Machinery.

Algorithmic identification is the crux for several binary analysis applications, including malware analysis, vulnerability discovery, and embedded firmware reverse engineering. However, data-driven and signature-based approaches often break down when encountering outlier realizations of a particular algorithm. Moreover, reverse engineering of domain-specific binaries often requires collaborative analysis between reverse engineers and domain experts. Communicating the behavior of an unidentified binary program to non-reverse engineers necessitates the recovery of algorithmic semantics in a human-digestible form. This paper presents PERFUME, a framework that extracts symbolic math expressions from low-level binary representations of an algorithm. PERFUME works by translating a symbolic output representation of a binary function to a high-level mathematical expression. In particular, we detail how source and target representations are generated for training a machine translation model. We integrate PERFUME as a plug-in for Ghidra, an open-source reverse engineering framework. We present our preliminary findings for domain-specific use cases and formalize open challenges in mathematical expression extraction from algorithmic implementations.

@inproceedings{10.1145/3465413.3488575,
author = {Weideman, Nicolaas and Felkner, Virginia K. and Wu, Wei-Cheng and May, Jonathan and Hauser, Christophe and Garcia, Luis},
title = {PERFUME: Programmatic Extraction and Refinement for Usability of Mathematical Expression},
year = {2021},
isbn = {9781450385527},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3465413.3488575},
doi = {10.1145/3465413.3488575},
abstract = {Algorithmic identification is the crux for several binary analysis applications, including malware analysis, vulnerability discovery, and embedded firmware reverse engineering. However, data-driven and signature-based approaches often break down when encountering outlier realizations of a particular algorithm. Moreover, reverse engineering of domain-specific binaries often requires collaborative analysis between reverse engineers and domain experts. Communicating the behavior of an unidentified binary program to non-reverse engineers necessitates the recovery of algorithmic semantics in a human-digestible form. This paper presents PERFUME, a framework that extracts symbolic math expressions from low-level binary representations of an algorithm. PERFUME works by translating a symbolic output representation of a binary function to a high-level mathematical expression. In particular, we detail how source and target representations are generated for training a machine translation model. We integrate PERFUME as a plug-in for Ghidra--an open-source reverse engineering framework. We present our preliminary findings for domain-specific use cases and formalize open challenges in mathematical expression extraction from algorithmic implementations.},
booktitle = {Proceedings of the 2021 Research on Offensive and Defensive Techniques in the Context of Man At The End (MATE) Attacks},
pages = {59--69},
numpages = {11},
keywords = {reverse engineering, binary analysis},
location = {Virtual Event, Republic of Korea},
series = {Checkmate '21}
}
