TRIS: a Fast and Accurate Identifiers Splitting and Expansion Algorithm. Guerrouj, L., Galinier, P., Guéhéneuc, Y., Antoniol, G., & Di Penta, M. In Oliveto, R. & Poshyvanyk, D., editors, Proceedings of the 19th Working Conference on Reverse Engineering (WCRE), pages 103–112, October 2012. IEEE CS Press. 10 pages.
To support software engineering tasks such as program comprehension, reverse engineering, and program redocumentation, researchers have proposed several identifier splitting and expansion approaches, such as Samurai, TIDIER, and more recently GenTest. The ultimate goal of such approaches is to help disambiguate the conceptual information encoded in compound (or abbreviated) identifiers. This paper presents TRIS (TRee-based Identifier Splitter), a two-phase approach to split and expand program identifiers. TRIS takes as input a dictionary of words, the identifiers to split, and the source code of the application the identifiers come from. In the first phase, TRIS pre-compiles transformed dictionary words into a tree representation, associating a cost with each transformation. In the second phase, it maps the identifier splitting problem onto a minimization problem, i.e., the search for the shortest path (the optimal split/expansion) in a weighted graph. We applied TRIS to a sample of 974 identifiers extracted from JHotDraw (Java), 3,085 Lynx identifiers (C), and a sample of 489 C identifiers extracted from 340 C programs. Finally, we compared TRIS with GenTest on a set of 2,663 mixed Java, C, and C++ identifiers. We report evidence that TRIS splits and expands identifiers more accurately than state-of-the-art approaches and that it is also efficient in terms of computation time.
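The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformations (vowel removal, prefix truncation), their costs, and the names `build_variants` and `split_expand` are assumptions made for illustration, and a plain hash map stands in for the tree representation. The second phase is written as a shortest-path search over character positions, where each edge is a dictionary word (possibly transformed) weighted by its transformation cost.

```python
from typing import Dict, List, Optional, Tuple


def build_variants(words: List[str]) -> Dict[str, Tuple[str, float]]:
    """Phase 1 (sketch): map each transformed form to (original word, cost).

    The transformations and costs are illustrative assumptions,
    not the ones used by TRIS.
    """
    variants: Dict[str, Tuple[str, float]] = {}

    def add(form: str, word: str, cost: float) -> None:
        # Keep only the cheapest way to produce a given form.
        if form and (form not in variants or cost < variants[form][1]):
            variants[form] = (word, cost)

    for word in words:
        add(word, word, 0.0)  # exact match costs nothing
        # Drop every vowel after the first letter, e.g. "user" -> "usr".
        add(word[0] + "".join(c for c in word[1:] if c not in "aeiou"), word, 1.0)
        # Prefix truncation, at least 3 characters, e.g. "counter" -> "coun".
        for k in range(3, len(word)):
            add(word[:k], word, 1.0 + 0.1 * (len(word) - k))
    return variants


def split_expand(
    identifier: str, variants: Dict[str, Tuple[str, float]]
) -> Optional[Tuple[List[str], float]]:
    """Phase 2 (sketch): cheapest path from position 0 to len(identifier).

    Nodes are character positions; an edge i -> j exists when
    identifier[i:j] is a known form, weighted by that form's cost.
    Returns (expanded words, total cost), or None if no split exists.
    """
    s = identifier.lower()
    n = len(s)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: cheapest cost to cover s[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    max_len = max((len(form) for form in variants), default=0)

    # Edges only go forward, so one left-to-right pass is enough.
    for i in range(n):
        if best[i] == inf:
            continue
        for j in range(i + 1, min(n, i + max_len) + 1):
            hit = variants.get(s[i:j])
            if hit is None:
                continue
            word, cost = hit
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, word)

    if best[n] == inf:
        return None

    # Walk the back-pointers to recover the expanded words.
    words: List[str] = []
    j = n
    while j > 0:
        i, word = back[j]
        words.append(word)
        j = i
    return list(reversed(words)), best[n]
```

With the toy dictionary `["counter", "pointer", "user", "name", "get"]`, calling `split_expand("getUsrNm", build_variants(...))` returns `(["get", "user", "name"], 2.0)` under these assumed costs, while `userName` splits at cost `0.0`. Because the position graph is acyclic, a single forward pass finds the shortest path without a general-purpose algorithm such as Dijkstra's.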
