Recognizing Words from Source Code Identifiers using Speech Recognition Techniques

Recognizing Words from Source Code Identifiers using Speech Recognition Techniques. Madani, N., Guerrouj, L., Di Penta, M., Guéhéneuc, Y., & Antoniol, G. In Ferenc, R. & Dueñas, J. C., editors, Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR), pages 68–77, March 2010. IEEE CS Press. Best paper. 10 pages.


The existing software engineering literature has empirically shown that a proper choice of identifiers influences software understandability and maintainability. Researchers have noticed that identifiers are one of the most important sources of information about program entities and that the semantics of identifiers guide the cognitive process. Recognizing the words forming identifiers is not an easy task when naming conventions (e.g., Camel Case) are not used or not strictly followed, and/or when these words have been abbreviated or otherwise transformed. This paper proposes a technique inspired by speech recognition, i.e., dynamic time warping, to split identifiers into component words. The proposed technique has been applied to identifiers extracted from two different applications: JHotDraw and Lynx. Results, compared against manually built oracles and against the Camel Case algorithm, are encouraging. They show that the technique successfully recognizes the words composing identifiers (even when abbreviated) in about 90% of cases and that it performs better than Camel Case. Furthermore, it was able to spot mistakes in the manually built oracle.
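To make the two ideas named in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It shows a Camel Case baseline splitter and a textbook dynamic time warping (DTW) distance over characters. The function names `camel_case_split` and `dtw_distance`, the 0/1 character cost, and the regular expression are all assumptions made for illustration; the abstract does not describe the paper's actual cost function, dictionary, or search strategy.

```python
import re


def camel_case_split(identifier: str) -> list[str]:
    """Baseline splitter (illustrative): break on case changes, digits,
    and any non-alphanumeric separator such as an underscore."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def dtw_distance(a: str, b: str) -> float:
    """Textbook DTW between two character sequences.

    Assumed local cost: 0 if the characters match, 1 otherwise.
    One character may be aligned with several characters of the other
    string, which is what lets an abbreviation stretch over a full word.
    Returns infinity if exactly one of the strings is empty.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            # Extend the cheapest of the three admissible predecessor paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


if __name__ == "__main__":
    print(camel_case_split("drawFigure"))   # ['draw', 'figure']
    print(camel_case_split("drawfigure"))   # ['drawfigure']
    print(dtw_distance("pntr", "pointer"))  # 3.0
```

The second call illustrates the situation the abstract describes: without a case change or separator, the baseline returns the identifier unsplit. A distance such as `dtw_distance` could then be used to compare identifier fragments against dictionary words, although a plain 0/1 cost alone is unlikely to be sufficient for ranking candidates.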

@INPROCEEDINGS{Madani10-CSMR-IdentifiersSpeechRecognition,
   AUTHOR       = {Nioosha Madani and Latifa Guerrouj and 
      Di Penta, Massimiliano and Yann-Gaël Guéhéneuc and Giuliano Antoniol},
   BOOKTITLE    = {Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR)},
   TITLE        = {Recognizing Words from Source Code Identifiers using 
      Speech Recognition Techniques},
   YEAR         = {2010},
   OPTADDRESS   = {},
   OPTCROSSREF  = {},
   EDITOR       = {Rudolf Ferenc and Juan Carlos Dueñas},
   MONTH        = {March},
   NOTE         = {Best paper. 10 pages.},
   OPTNUMBER    = {},
   OPTORGANIZATION = {},
   PAGES        = {68--77},
   PUBLISHER    = {IEEE CS Press},
   OPTSERIES    = {},
   OPTVOLUME    = {},
   KEYWORDS     = {Topic: Identifier analysis, Venue: CSMR},
   URL          = {http://www.ptidej.net/publications/documents/CSMR10c.doc.pdf},
   PDF          = {http://www.ptidej.net/publications/documents/CSMR10c.ppt.pdf},
   ABSTRACT     = {The existing software engineering literature has 
      empirically shown that a proper choice of identifiers influences 
      software understandability and maintainability. Researchers have 
      noticed that identifiers are one of the most important sources of 
      information about program entities and that the semantics of 
      identifiers guide the cognitive process. Recognizing the words 
      forming identifiers is not an easy task when naming conventions 
      (\eg{} Camel Case) are not used or not strictly followed, and/or when 
      these words have been abbreviated or otherwise transformed. This 
      paper proposes a technique inspired by speech recognition, \ie{} 
      dynamic time warping, to split identifiers into component words. The 
      proposed technique has been applied to identifiers extracted from two 
      different applications: JHotDraw and Lynx. Results, compared against 
      manually built oracles and against the Camel Case algorithm, are 
      encouraging. They show that the technique successfully recognizes the 
      words composing identifiers (even when abbreviated) in about 90\% of 
      cases and that it performs better than Camel Case. Furthermore, it was 
      able to spot mistakes in the manually built oracle.}
}
