TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques. Guerrouj, L., Di Penta, M., Antoniol, G., & Guéhéneuc, Y. Journal of Software Maintenance and Evolution: Research and Practice (JSME), 25(6):575–599, Wiley, June 2011. 24 pages.
The software engineering literature reports empirical evidence on the relation between various characteristics of a software system and software quality. Among many factors, recent studies have shown that a proper choice of identifiers influences software understandability and maintainability. Indeed, identifiers are developers' main source of information and guide their cognitive processes during program understanding when high-level documentation is scarce or outdated and when source code is not sufficiently commented. This paper proposes a novel approach to recognize the words composing source code identifiers. The approach is based on an adaptation of Dynamic Time Warping, a technique used to recognize words in continuous speech. The approach overcomes the limitations of existing identifier splitting approaches when naming conventions (e.g., Camel Case) are not used or when identifiers contain abbreviations. The proposed approach has been applied to a sample of more than 1,000 identifiers extracted from 340 C programs and compared with a simple Camel Case splitter and with an implementation of an alternative identifier splitting approach, Samurai. Results indicate the capability of the novel approach (i) to outperform the alternative ones when a dictionary augmented with domain knowledge or a contextual dictionary is used and (ii) to expand 48% of a set of selected abbreviations into dictionary words.
@ARTICLE{Guerrouj11-JSME-TIDIER,
AUTHOR = {Latifa Guerrouj and Di Penta, Massimiliano and
Giuliano Antoniol and Yann-Ga{\"e}l Gu{\'e}h{\'e}neuc},
JOURNAL = {Journal of Software Maintenance and Evolution: Research and Practice (JSME)},
TITLE = {TIDIER: An Identifier Splitting Approach using Speech
Recognition Techniques},
YEAR = {2011},
MONTH = {June},
NOTE = {24 pages.},
NUMBER = {6},
PAGES = {575--599},
VOLUME = {25},
EDITOR = {Rudolf Ferenc and Juan Carlos Due{\~n}as},
KEYWORDS = {Topic: Identifier analysis, Venue: JSME,
Venue: JSEP},
PUBLISHER = {Wiley},
URL = {http://www.ptidej.net/publications/documents/JSME11.doc.pdf},
ABSTRACT = {The software engineering literature reports empirical
evidence on the relation between various characteristics of a
software system and software quality. Among many factors, recent
studies have shown that a proper choice of identifiers influences
software understandability and maintainability. Indeed, identifiers
are developers' main source of information and guide their cognitive
processes during program understanding when high-level documentation
is scarce or outdated and when source code is not sufficiently
commented. This paper proposes a novel approach to recognize words
composing source code identifiers. The approach is based on an
adaptation of Dynamic Time Warping used to recognize words in
continuous speech. The approach overcomes the limitations of existing
identifier splitting approaches when naming conventions (e.g., Camel
Case) are not used or when identifiers contain abbreviations. The
proposed approach has been applied on a sample of more than 1,000
identifiers extracted from 340 C programs and compared with a simple
Camel Case splitter and with an implementation of an alternative
identifier splitting approach, Samurai. Results indicate the
capability of the novel approach (i) to outperform the alternative
ones when a dictionary augmented with domain knowledge or a
contextual dictionary is used and (ii) to expand 48\% of a set of
selected abbreviations into dictionary words.}
}