Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness

Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness. Arnaoudova, V., Eshkevari, L. M., Oliveto, R., Gu�h�neuc, Y., & Antoniol, G. In Ferenc, R. & Poshyvanyk, D., editors, Proceedings of the 26th International Conference on Software Maintenance (ICSM), pages 1–5, September, 2010. IEEE CS Press. Early Research Achievements Track. Best paper. 5 pages.

Paper abstract bibtex

Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. Entropy measures the \emphphysical dispersion of terms in a program: the higher the entropy, the more scattered across the program the terms. Context coverage measures the \emphconceptual dispersion of terms: the higher their context coverage, the more unrelated the methods using them. We compute term entropy and context coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context coverage are more fault-prone than others.

@INPROCEEDINGS{Arnaoudova10-ICSM-ERA-IdentifierEntropy,
   AUTHOR       = {Venera Arnaoudova and Laleh Mousavi Eshkevari and 
      Rocco Oliveto and Yann-Ga�l Gu�h�neuc and Giuliano Antoniol},
   BOOKTITLE    = {Proceedings of the 26<sup>th</sup> International Conference on Software Maintenance (ICSM)},
   TITLE        = {Physical and Conceptual Identifier Dispersion: Measures 
      and Relation to Fault Proneness},
   YEAR         = {2010},
   OPTADDRESS   = {},
   OPTCROSSREF  = {},
   EDITOR       = {Rudolf Ferenc and Denys Poshyvanyk},
   MONTH        = {September},
   NOTE         = {Early Research Achievements Track. Best paper. 5 pages.},
   OPTNUMBER    = {},
   OPTORGANIZATION = {},
   PAGES        = {1--5},
   PUBLISHER    = {IEEE CS Press},
   OPTSERIES    = {},
   OPTVOLUME    = {},
   KEYWORDS     = {Topic: <b>Identifier analysis</b>, Venue: <c>ICSM</c>, 
      Venue: <c>ICSME</c>},
   URL          = {http://www.ptidej.net/publications/documents/ICSM10a.doc.pdf},
   PDF          = {http://www.ptidej.net/publications/documents/ICSM10a.ppt.pdf},
   ABSTRACT     = {Poorly-chosen identifiers have been reported in the 
      literature as misleading and increasing the program comprehension 
      effort. Identifiers are composed of terms, which can be dictionary 
      words, acronyms, contractions, or simple strings. We conjecture that 
      the use of identical terms in different contexts may increase the 
      risk of faults. We investigate our conjecture using a measure 
      combining term entropy and term context coverage to study whether 
      certain terms increase the odds ratios of methods to be fault-prone. 
      Entropy measures the \emph{physical dispersion} of terms in a 
      program: the higher the entropy, the more scattered across the 
      program the terms. Context coverage measures the \emph{conceptual 
      dispersion} of terms: the higher their context coverage, the more 
      unrelated the methods using them. We compute term entropy and context 
      coverage of terms extracted from identifiers in Rhino 1.4R3 and 
      ArgoUML 0.16. We show statistically that methods containing terms 
      with high entropy and context coverage are more fault-prone than 
      others.}
}

Downloads: 0

{"_id":"nALiMxKEbBcGk2wCE","bibbaseid":"arnaoudova-eshkevari-oliveto-guhneuc-antoniol-physicalandconceptualidentifierdispersionmeasuresandrelationtofaultproneness-2010","downloads":0,"creationDate":"2018-01-17T20:29:42.459Z","title":"Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness","author_short":["Arnaoudova, V.","Eshkevari, L. M.","Oliveto, R.","Gu�h�neuc, Y.","Antoniol, G."],"year":2010,"bibtype":"inproceedings","biburl":"http://www.yann-gael.gueheneuc.net/Work/Publications/Biblio/complete-bibliography.bib","bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"firstnames":["Venera"],"propositions":[],"lastnames":["Arnaoudova"],"suffixes":[]},{"firstnames":["Laleh","Mousavi"],"propositions":[],"lastnames":["Eshkevari"],"suffixes":[]},{"firstnames":["Rocco"],"propositions":[],"lastnames":["Oliveto"],"suffixes":[]},{"firstnames":["Yann-Ga�l"],"propositions":[],"lastnames":["Gu�h�neuc"],"suffixes":[]},{"firstnames":["Giuliano"],"propositions":[],"lastnames":["Antoniol"],"suffixes":[]}],"booktitle":"Proceedings of the 26th International Conference on Software Maintenance (ICSM)","title":"Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness","year":"2010","optaddress":"","optcrossref":"","editor":[{"firstnames":["Rudolf"],"propositions":[],"lastnames":["Ferenc"],"suffixes":[]},{"firstnames":["Denys"],"propositions":[],"lastnames":["Poshyvanyk"],"suffixes":[]}],"month":"September","note":"Early Research Achievements Track. Best paper. 5 pages.","optnumber":"","optorganization":"","pages":"1–5","publisher":"IEEE CS Press","optseries":"","optvolume":"","keywords":"Topic: Identifier analysis, Venue: <c>ICSM</c>, Venue: <c>ICSME</c>","url":"http://www.ptidej.net/publications/documents/ICSM10a.doc.pdf","pdf":"http://www.ptidej.net/publications/documents/ICSM10a.ppt.pdf","abstract":"Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. Entropy measures the \\emphphysical dispersion of terms in a program: the higher the entropy, the more scattered across the program the terms. Context coverage measures the \\emphconceptual dispersion of terms: the higher their context coverage, the more unrelated the methods using them. We compute term entropy and context coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context coverage are more fault-prone than others.","bibtex":"@INPROCEEDINGS{Arnaoudova10-ICSM-ERA-IdentifierEntropy,\r\n AUTHOR = {Venera Arnaoudova and Laleh Mousavi Eshkevari and \r\n Rocco Oliveto and Yann-Ga�l Gu�h�neuc and Giuliano Antoniol},\r\n BOOKTITLE = {Proceedings of the 26th International Conference on Software Maintenance (ICSM)},\r\n TITLE = {Physical and Conceptual Identifier Dispersion: Measures \r\n and Relation to Fault Proneness},\r\n YEAR = {2010},\r\n OPTADDRESS = {},\r\n OPTCROSSREF = {},\r\n EDITOR = {Rudolf Ferenc and Denys Poshyvanyk},\r\n MONTH = {September},\r\n NOTE = {Early Research Achievements Track. Best paper. 5 pages.},\r\n OPTNUMBER = {},\r\n OPTORGANIZATION = {},\r\n PAGES = {1--5},\r\n PUBLISHER = {IEEE CS Press},\r\n OPTSERIES = {},\r\n OPTVOLUME = {},\r\n KEYWORDS = {Topic: Identifier analysis, Venue: <c>ICSM</c>, \r\n Venue: <c>ICSME</c>},\r\n URL = {http://www.ptidej.net/publications/documents/ICSM10a.doc.pdf},\r\n PDF = {http://www.ptidej.net/publications/documents/ICSM10a.ppt.pdf},\r\n ABSTRACT = {Poorly-chosen identifiers have been reported in the \r\n literature as misleading and increasing the program comprehension \r\n effort. Identifiers are composed of terms, which can be dictionary \r\n words, acronyms, contractions, or simple strings. We conjecture that \r\n the use of identical terms in different contexts may increase the \r\n risk of faults. We investigate our conjecture using a measure \r\n combining term entropy and term context coverage to study whether \r\n certain terms increase the odds ratios of methods to be fault-prone. \r\n Entropy measures the \\emph{physical dispersion} of terms in a \r\n program: the higher the entropy, the more scattered across the \r\n program the terms. Context coverage measures the \\emph{conceptual \r\n dispersion} of terms: the higher their context coverage, the more \r\n unrelated the methods using them. We compute term entropy and context \r\n coverage of terms extracted from identifiers in Rhino 1.4R3 and \r\n ArgoUML 0.16. We show statistically that methods containing terms \r\n with high entropy and context coverage are more fault-prone than \r\n others.}\r\n}\r\n\r\n","author_short":["Arnaoudova, V.","Eshkevari, L. M.","Oliveto, R.","Gu�h�neuc, Y.","Antoniol, G."],"editor_short":["Ferenc, R.","Poshyvanyk, D."],"key":"Arnaoudova10-ICSM-ERA-IdentifierEntropy","id":"Arnaoudova10-ICSM-ERA-IdentifierEntropy","bibbaseid":"arnaoudova-eshkevari-oliveto-guhneuc-antoniol-physicalandconceptualidentifierdispersionmeasuresandrelationtofaultproneness-2010","role":"author","urls":{"Paper":"http://www.ptidej.net/publications/documents/ICSM10a.doc.pdf"},"keyword":["Topic: Identifier analysis","Venue: <c>ICSM</c>","Venue: <c>ICSME</c>"],"metadata":{"authorlinks":{"gu�h�neuc, y":"https://bibbase.org/show?bib=http%3A%2F%2Fwww.yann-gael.gueheneuc.net%2FWork%2FPublications%2FBiblio%2Fcomplete-bibliography.bib&msg=embed","guéhéneuc, y":"http://www.yann-gael.gueheneuc.net/"}}},"search_terms":["physical","conceptual","identifier","dispersion","measures","relation","fault","proneness","arnaoudova","eshkevari","oliveto","gu�h�neuc","antoniol"],"keywords":["topic: identifier analysis","venue: <c>icsm</c>","venue: <c>icsme</c>"],"authorIDs":["2tFXMaTSHJKEB5ebi","2wY5eBcsYmbPNfmMS","36dm7jaw5EK5Wrr4D","3NxaNKic3nkXi568L","3S5Dkpx7DNefzJrnf","3afmfmoPr4SHa8B5F","3wmHB7JoQbQz2ujun","4YBWWbao6RKgiyGJE","4jZj9tB4SJ8zEEgHk","5CvA2hsaib2bPMaef","5TFJbxqRDGFj2P8Rg","5a5fb236a39f2c3645000032","5a8f17e006df23bc34000020","5cx79LBmaWcihgM4J","5de9a6425b51bcde01000042","5dee1197584fb4df010000fc","5df228a41e4fe9df0100012c","5df617f72b34d0de0100008b","5dfa14782e791dde010000ea","5dfe3d5e68d95dde01000080","5e02525b6ffa15df0100009f","5e0662c07da1d1de0100021a","5e093e8b934cacdf0100008b","5e0a61673eccf6e001000016","5e0b75b7e73cd6de010000f9","5e0d4ca6ae5827df0100007f","5e0ddf08552b25df01000137","5e0e5c41ac7d11df010000a3","5e1268e7a4cabfdf0100002c","5e12c45a70e2c4f201000043","5e157809f1f31adf01000006","5e162ca1df1bb4de01000123","5e185cff809b84f201000091","5e1a6c39b16ec5df0100000f","5e21b27e96aea7de01000084","5e22c57e49e2b4df0100000f","5e23c2aeb93b51de01000030","5e245835079bb2df0100007d","5e24fa3e2e79a1f201000027","5e26252f408641df01000161","5e26bfbd8535cedf0100005c","5e280fd1f860fcde0100006a","5e2a827f881468de01000080","5e2eb321b84405df01000128","5e2ef635e374eede0100001a","5e2fd6a74e91a9df01000010","5e3266bb5633c9de01000068","5e32ab0ee17accde0100012a","5e32bdec466076df010000d9","5e32d603150c84df01000068","5e34fb145978bef2010000a6","5e36bc8e7b975dde0100009a","5e389940030bcadf010001b4","5e39dd9a3687dddf010000a4","5e3ad173f2a00cdf01000206","5e3dcd50d51253de0100003d","5e3e8713666d79df010000a6","5e3ed80986a596de010000b9","5e3fefe1add5fbde01000087","5e409c79d668c6de010000c7","5e41795ed9f47bee01000194","5e41cd5be7c67ade010000eb","5e42ef1ca6f4a6f2010001eb","5e46dcb342fb31df01000113","5e46f12c461d04f201000078","5e478c9e27a0c8de010000ef","5e47fb06385298df010000b2","5e4add1941072bdf01000011","5e4c1c792dc400de0100011a","5e4c6262271596df010001b9","5e4f0360338acfde01000156","5e4f11b0e5389bde0100007e","5e530b976d68b8df010000a5","5e54ad6d929495df0100007c","5e57161b429006de0100005a","5e57839fcef9b7de0100003c","5e580f5a6a456fde0100004f","5e5afa78038583de010000f7","5e5b477174a3e7df010000b7","5e5d370173eb2edf01000038","5e5fca336b32b0f20100011b","5e60e7f0839e59df010000e8","5e6377cfae1c4dde0100011e","5e657007de41b9df0100017a","5e676f0910be53de0100001a","5gPbX6aQJFjpv2Na9","6eE2yRdMDQr2WGXuA","6iHE5tuM7yTfLd2pA","7BPWyvMr5e6bzbk7T","7RFwhpGkpZRsLwnmB","7amRA4ALcR2mksheF","7mkQL8eiftj5bGMzB","8jPjKehCMsj7ncvxN","8peLXfWtCSic5n7oz","95eRgTcabnJwF46f3","9Ba9JxkjQBCeGBZKg","9DjgvzQrx27uxbyJj","9HD56d3k5yrB9H9oq","9RtPuXNyeS3k8LM9J","9diLYpd8cMmjBh54T","9nx6Yv3XREwJDyRms","AfJhKcg96muyPdu7S","BGvchZsjW7Wejj9Cz","BYwdHpGr6xT5vmE5C","Bah6LM7GXdXTy8GGA","BmH2ytt7sXwPHcrse","CqJYxtqe6qBbtd5yz","D4kEZ2JcWCoMvRPy7","DFWW7D6Y7X57n4cbM","DSorPqHDfrFiNM5Ew","DWXisKXaQArvre3QL","DwBm6isMpKSHHkhAd","E88raoktD8ANF92Yu","EAjLox7ycbofcCXce","F8rzFhY9yWA7pBX4j","G3iynDKjz9BHJbrdg","GJw6mQETXADSCZuuk","GWK5669HLqPyYMQ5J","GibAXjj4xXdFT8qWh","HzFZpgGcfabjAp9x6","KJ4eYziy6hanF9kr9","Kcyu7uncEFiYzYP2D","N4zzhqcywSzDDYsdh","NCDg3xE2mPcNAu7LX","NvgbTAz3hZ9SevZvd","QbcDS3wK43sRASvgu","S3b7Bb9wwfpByQgbo","SXJaeFCgBDJ5HAHtj","T5nL8TGrggoLAF8Dj","W9vT8YcCNFEcp9mWQ","WZ5CpBEFNsb2ivfah","XxviSwRxhwgNwsraH","Z2Zs662GpXqKBEAMc","ZKYFgjHGm7PE4Y2kv","a5qpGirN3B5BLKdMh","ahGA65oGDChNYp7Mb","bA7pGCMS9AB2RBo2p","bTQb3TcrbBShtqFPS","cYnqisf4wzBsM7MF5","cjHpaYiWD5eX7btH4","ckrbesqi3pWqfF2nP","dH8EsWHZtCFuQk5bq","dS5kvBMnk3LMQe56w","eXsFRMzE7WfbHbBL4","fmmsBu4m6ayKtuopf","hdXr3PD8cHNWyAdCe","hgZxckC87u2A57teF","juvCjffHJaPQf44im","keQBT2Apb9yaev8AH","myHdF8zARwW5uGmFs","nJLfaznnYgFqWQQrv","onghitNWSvN2FpCaN","osgPwDW2y5KDXRa2i","pAWFMDHu5dNixqPAq","pLvmgrCjMeDYJiJxB","q4azvWakEjp2TQM7S","qBee6Md9YwRKwkeW3","qQky2Csek4mroLn2P","tJz4YBCqAzZAzek5d","tLtjttw8dEqF6YQ4s","uQ6jCrPijzAmZyfXz","vGEaFNt7mm92Z7GXc","vRkMmE65HSFpCk6FW","vsEsf8FR3Fxb6z7fJ","x5ejzvDeXCc89Dukv","xEQyC5shxpYySSJJm","xhwDdvQ7MYxa6keXm","xkviMnkrGBneANvMr","y64rFMcyp7tDsBrJQ","yBYJWSShoKkMG8aPE","yQPghCwQv22kf6dFq","yd5sCxaEiu5vWizTq"],"dataSources":["Sed98LbBeGaXxenrM","8vn5MSGYWB4fAx9Z4"]}