TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques

TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques. Guerrouj, L., Di Penta, M., Antoniol, G., & Guéhéneuc, Y. Journal of Software Maintenance and Evolution: Research and Practice (JSME), 25(6):575–599, Wiley, June, 2011. 24 pages.


The software engineering literature reports empirical evidence on the relation between various characteristics of a software system and software quality. Among many factors, recent studies have shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main source of information and guide their cognitive processes during program understanding when high-level documentation is scarce or outdated and when source code is not sufficiently commented. This paper proposes a novel approach to recognize words composing source code identifiers. The approach is based on an adaptation of Dynamic Time Warping used to recognize words in continuous speech. The approach overcomes the limitations of existing identifier splitting approaches when naming conventions (e.g., Camel Case) are not used or when identifiers contain abbreviations. The proposed approach has been applied on a sample of more than 1,000 identifiers extracted from 340 C programs and compared with a simple Camel Case splitter and with an implementation of an alternative identifier splitting approach, Samurai. Results indicate the capability of the novel approach (i) to outperform the alternative ones when a dictionary augmented with domain knowledge or a contextual dictionary are used and (ii) to expand 48% of a set of selected abbreviations into dictionary words.
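As a rough illustration of the two ideas the abstract names, the sketch below shows a simple Camel Case splitter (the kind of baseline the paper compares against) and a textbook Dynamic Time Warping distance between two character sequences. It is not the TIDIER algorithm itself: the function names, the 0/1 local cost, and the sample identifiers are assumptions made for this example only.

```python
import re


def camel_case_split(identifier: str) -> list[str]:
    """Baseline splitter: break on underscores, case changes, and digits."""
    parts: list[str] = []
    # First cut on underscores and any other non-alphanumeric separators.
    for chunk in re.split(r"[_\W]+", identifier):
        # Then cut each chunk into acronyms, capitalized/lower-case words, digits.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def dtw_distance(a: str, b: str) -> float:
    """Textbook DTW between two character sequences.

    Local cost is 0 for equal characters and 1 otherwise (an assumption of
    this sketch; the paper's actual cost model is not reproduced here).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0 if a[i - 1] == b[j - 1] else 1
            # Each character may be stretched to align with several characters.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


if __name__ == "__main__":
    print(camel_case_split("getUserName"))  # ['get', 'user', 'name']
    print(camel_case_split("getpointer"))   # ['getpointer'] (no case hints to split on)
    print(dtw_distance("ptr", "pointer"))   # 4
```

The second call shows the limitation the abstract mentions: without a naming convention the baseline returns the identifier unsplit. A distance such as `dtw_distance` lets an abbreviation like the hypothetical `ptr` be compared against dictionary words, which is the general direction a dictionary-based approach takes.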

@ARTICLE{Guerrouj11-JSME-TIDIER,
   AUTHOR       = {Latifa Guerrouj and Di Penta, Massimiliano and 
      Giuliano Antoniol and Yann-Gaël Guéhéneuc},
   JOURNAL      = {Journal of Software Maintenance and Evolution: Research and Practice (JSME)},
   TITLE        = {TIDIER: An Identifier Splitting Approach using Speech 
      Recognition Techniques},
   YEAR         = {2011},
   MONTH        = {June},
   NOTE         = {24 pages.},
   NUMBER       = {6},
   PAGES        = {575--599},
   VOLUME       = {25},
   EDITOR       = {Rudolf Ferenc and Juan Carlos Dueñas},
   KEYWORDS     = {Topic: Identifier analyses, Journal: JSME, 
      Journal: JSEP},
   PUBLISHER    = {Wiley},
   URL          = {http://www.ptidej.net/publications/documents/JSME11.doc.pdf},
   ABSTRACT     = {The software engineering literature reports empirical 
      evidence on the relation between various characteristics of a 
      software system and software quality. Among many factors, recent 
      studies have shown that a proper choice of identifiers influences 
      software understandability and maintainability. Indeed, identifiers 
      are developers' main source of information and guide their cognitive 
      processes during program understanding when high-level documentation 
      is scarce or outdated and when source code is not sufficiently 
      commented. This paper proposes a novel approach to recognize words 
      composing source code identifiers. The approach is based on an 
      adaptation of Dynamic Time Warping used to recognize words in 
      continuous speech. The approach overcomes the limitations of existing 
      identifier splitting approaches when naming conventions (e.g., Camel 
      Case) are not used or when identifiers contain abbreviations. The 
      proposed approach has been applied on a sample of more than 1,000 
      identifiers extracted from 340 C programs and compared with a simple 
      Camel Case splitter and with an implementation of an alternative 
      identifier splitting approach, Samurai. Results indicate the 
      capability of the novel approach (i) to outperform the alternative 
      ones when a dictionary augmented with domain knowledge or a 
      contextual dictionary are used and (ii) to expand 48% of a set of 
      selected abbreviations into dictionary words.}
}

