TRIS: a Fast and Accurate Identifiers Splitting and Expansion Algorithm

TRIS: a Fast and Accurate Identifiers Splitting and Expansion Algorithm. Guerrouj, L., Galinier, P., Guéhéneuc, Y., Antoniol, G., & Di Penta, M. In Oliveto, R. & Poshyvanyk, D., editors, Proceedings of the 19th Working Conference on Reverse Engineering (WCRE), pages 103–112, October, 2012. IEEE CS Press. 10 pages.


In the quest to support various software engineering tasks such as program comprehension, reverse engineering, or program redocumentation, researchers have proposed several identifier splitting and expansion approaches, such as Samurai, TIDIER, and, more recently, GenTest. The ultimate goal of such approaches is to help disambiguate the conceptual information encoded in compound (or abbreviated) identifiers. This paper presents TRIS, TRee-based Identifier Splitter, a two-phase approach to split and expand program identifiers. TRIS takes as input a dictionary of words, the identifiers to split, and the source code of the application containing the identifiers. First, TRIS pre-compiles transformed dictionary words into a tree representation, associating a cost with each transformation. In a second phase, it maps the identifier splitting problem onto a minimization problem, i.e., the search for the shortest path (optimal split/expansion) in a weighted graph. We apply TRIS to a sample of 974 identifiers extracted from JHotDraw (Java), 3,085 Lynx identifiers (C), and a sample of 489 C identifiers extracted from 340 C programs. Finally, we compare TRIS with GenTest on a set of 2,663 mixed Java, C, and C++ identifiers. We report evidence that TRIS's splitting (and expansion) is more accurate than that of state-of-the-art approaches and that it is also efficient in terms of computation time.
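To make the two phases described in the abstract concrete, here is a minimal sketch, assuming a toy dictionary and a hypothetical transformation set: prefix truncation (cost 1), vowel dropping after the first letter (cost 2), a fixed cost of 1 per emitted word, and illustrative function names `build_tree` and `split_expand`. None of these choices are taken from the paper; they only illustrate the general idea of pre-compiling transformed words into a character tree and then finding a minimum-cost path over identifier positions (solved here with Dijkstra's algorithm).

```python
import heapq
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """Character-tree node; `entry` holds (cost, word) if a transformed form ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entry: Optional[Tuple[int, str]] = None


def transforms(word: str) -> List[Tuple[str, int]]:
    """Hypothetical transformations and costs (illustrative, not TRIS's actual ones)."""
    forms = [(word, 0)]  # the dictionary word itself
    for k in range(3, len(word)):  # prefix abbreviations, e.g. "number" -> "num"
        forms.append((word[:k], 1))
    dropped = word[0] + "".join(c for c in word[1:] if c not in "aeiou")
    if dropped != word:  # vowel dropping, e.g. "number" -> "nmbr"
        forms.append((dropped, 2))
    return forms


def build_tree(dictionary: List[str]) -> TrieNode:
    """Phase 1: insert every transformed form, keeping the cheapest expansion per form."""
    root = TrieNode()
    for word in dictionary:
        for form, cost in transforms(word):
            node = root
            for ch in form:
                node = node.children.setdefault(ch, TrieNode())
            if node.entry is None or cost < node.entry[0]:
                node.entry = (cost, word)
    return root


def split_expand(identifier: str, root: TrieNode) -> Optional[List[str]]:
    """Phase 2: shortest path from position 0 to len(s); edge i->j exists if s[i:j] is in the tree."""
    s = identifier.lower()
    n = len(s)
    dist = [float("inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    dist[0] = 0
    heap = [(0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        node, j = root, i
        while j < n and s[j] in node.children:  # walk the tree along s[i:]
            node = node.children[s[j]]
            j += 1
            if node.entry is not None:
                cost, word = node.entry
                nd = d + 1 + cost  # +1 per word favours fewer, longer words
                if nd < dist[j]:
                    dist[j] = nd
                    back[j] = (i, word)
                    heapq.heappush(heap, (nd, j))
    if back[n] is None:
        return None  # no split covers the whole identifier
    words, j = [], n
    while j > 0:
        i, word = back[j]
        words.append(word)
        j = i
    return words[::-1]
```

For example, with `tree = build_tree(["get", "number", "of", "row", "rows", "file", "name"])`, calling `split_expand("getNumRows", tree)` returns `["get", "number", "rows"]`. Because every edge moves strictly forward in the identifier, the graph is acyclic, so a left-to-right dynamic program would give the same result; Dijkstra is used only to mirror the "shortest path in a weighted graph" framing.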

@INPROCEEDINGS{Guerrouj12-WCRE-TRIS,
   AUTHOR       = {Latifa Guerrouj and Philippe Galinier and 
      Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc and Giuliano Antoniol and Di Penta, Massimiliano},
   BOOKTITLE    = {Proceedings of the 19th Working Conference on Reverse Engineering (WCRE)},
   TITLE        = {TRIS: a Fast and Accurate Identifiers Splitting and 
      Expansion Algorithm},
   YEAR         = {2012},
   OPTADDRESS   = {},
   OPTCROSSREF  = {},
   EDITOR       = {Rocco Oliveto and Denys Poshyvanyk},
   MONTH        = {October},
   NOTE         = {10 pages.},
   OPTNUMBER    = {},
   OPTORGANIZATION = {},
   PAGES        = {103--112},
   PUBLISHER    = {IEEE CS Press},
   OPTSERIES    = {},
   OPTVOLUME    = {},
   KEYWORDS     = {Topic: Identifier analysis, Venue: WCRE},
   URL          = {http://www.ptidej.net/publications/documents/WCRE12b.doc.pdf},
   PDF          = {http://www.ptidej.net/publications/documents/WCRE12b.ppt.pdf},
   ABSTRACT     = {In the quest to support various software engineering
      tasks such as program comprehension, reverse engineering, or program
      redocumentation, researchers have proposed several identifier
      splitting and expansion approaches, such as Samurai, TIDIER, and,
      more recently, GenTest. The ultimate goal of such approaches is to
      help disambiguate the conceptual information encoded in compound (or
      abbreviated) identifiers. This paper presents TRIS, TRee-based
      Identifier Splitter, a two-phase approach to split and expand
      program identifiers. TRIS takes as input a dictionary of words, the
      identifiers to split, and the source code of the application
      containing the identifiers. First, TRIS pre-compiles transformed
      dictionary words into a tree representation, associating a cost
      with each transformation. In a second phase, it maps the identifier
      splitting problem onto a minimization problem, i.e., the search for
      the shortest path (optimal split/expansion) in a weighted graph. We
      apply TRIS to a sample of 974 identifiers extracted from JHotDraw
      (Java), 3,085 Lynx identifiers (C), and a sample of 489 C
      identifiers extracted from 340 C programs. Finally, we compare TRIS
      with GenTest on a set of 2,663 mixed Java, C, and C++ identifiers.
      We report evidence that TRIS's splitting (and expansion) is more
      accurate than that of state-of-the-art approaches and that it is
      also efficient in terms of computation time.}
}

