An Experimental Investigation on the Effects of Context on Source Code Identifiers Splitting and Expansion. Guerrouj, L., Di Penta, M., Guéhéneuc, Y., & Antoniol, G. Empirical Software Engineering (EMSE), 19(6):1–48, Springer, July, 2013. 45 pages.
Recent and past studies indicate that source code lexicon plays an important role in program comprehension. Developers often compose source code identifiers with abbreviated words and acronyms, and do not always use consistent mechanisms and explicit separators when creating identifiers. Such choices and inconsistencies impede the work of developers that must understand identifiers by decomposing them into their component terms, and mapping them onto dictionary, application or domain words. When software documentation is scarce, outdated or simply not available, developers must therefore use the available contextual information to understand the source code. This paper aims at investigating how developers split and expand source code identifiers, and, specifically, the extent to which different kinds of contextual information could support such a task. In particular, we consider (i) an internal context consisting of the content of functions and source code files in which the identifiers are located, and (ii) an external context involving external documentation. We conducted a family of two experiments with 63 participants, including bachelor, master, Ph.D. students, and post-docs. We randomly sampled a set of 50 identifiers from a corpus of open source C programs and we asked participants to split and expand them with the availability (or not) of internal and external contexts. We report evidence on the usefulness of contextual information for identifier splitting and acronym/abbreviation expansion. We observe that the source code files are more helpful than just looking at function source code, and that the application-level contextual information does not help any further. The availability of external sources of information only helps in some circumstances. Also, in some cases, we observe that participants better expanded acronyms than abbreviations, although in most cases both exhibit the same level of accuracy.
Finally, results indicated that the knowledge of English plays a significant effect in identifier splitting/expansion. The obtained results confirm the conjecture that contextual information is useful in program comprehension, including when developers split and expand identifiers to understand them. We hypothesize that the integration of identifier splitting and expansion tools with IDE could help to improve developers' productivity.
@ARTICLE{Guerrouj13-EMSE-TIDIER,
AUTHOR = {Latifa Guerrouj and Di Penta, Massimiliano and
Yann-Gaël Guéhéneuc and Giuliano Antoniol},
JOURNAL = {Empirical Software Engineering (EMSE)},
TITLE = {An Experimental Investigation on the Effects of Context
on Source Code Identifiers Splitting and Expansion},
YEAR = {2013},
MONTH = {July},
NOTE = {45 pages.},
NUMBER = {6},
PAGES = {1--48},
VOLUME = {19},
EDITOR = {Victor R. Basili and Lionel C. Briand},
KEYWORDS = {Topic: Identifier analysis, Venue: EMSE},
PUBLISHER = {Springer},
URL = {http://www.ptidej.net/publications/documents/EMSE13.doc.pdf},
ABSTRACT = {Recent and past studies indicate that source code
lexicon plays an important role in program comprehension. Developers
often compose source code identifiers with abbreviated words and
acronyms, and do not always use consistent mechanisms and explicit
separators when creating identifiers. Such choices and
inconsistencies impede the work of developers that must understand
identifiers by decomposing them into their component terms, and
mapping them onto dictionary, application or domain words. When
software documentation is scarce, outdated or simply not available,
developers must therefore use the available contextual information to
understand the source code. This paper aims at investigating how
developers split and expand source code identifiers, and,
specifically, the extent to which different kinds of contextual
information could support such a task. In particular, we consider (i)
an internal context consisting of the content of functions and source
code files in which the identifiers are located, and (ii) an external
context involving external documentation. We conducted a family of
two experiments with 63 participants, including bachelor, master,
Ph.D. students, and post-docs. We randomly sampled a set of 50
identifiers from a corpus of open source C programs and we asked
participants to split and expand them with the availability (or not)
of internal and external contexts. We report evidence on the
usefulness of contextual information for identifier splitting and
acronym/abbreviation expansion. We observe that the source code files
are more helpful than just looking at function source code, and that
the application-level contextual information does not help any
further. The availability of external sources of information only
helps in some circumstances. Also, in some cases, we observe that
participants better expanded acronyms than abbreviations, although in
most cases both exhibit the same level of accuracy. Finally, results
indicated that the knowledge of English plays a significant effect in
identifier splitting/expansion. The obtained results confirm the
conjecture that contextual information is useful in program
comprehension, including when developers split and expand identifiers
to understand them. We hypothesize that the integration of identifier
splitting and expansion tools with IDE could help to improve
developers' productivity.}
}