Improving Source Code Quality by Improving Identifier Quality and Avoiding Linguistic Antipatterns. Arnaoudova, V. Ph.D. Thesis, Polytechnique Montréal, August, 2014. 173 pages.
@PHDTHESIS{Arnaoudova14-PhD,
AUTHOR = {Venera Arnaoudova},
SCHOOL = {Polytechnique Montr{\'e}al},
TITLE = {Improving Source Code Quality by Improving Identifier
Quality and Avoiding Linguistic Antipatterns},
YEAR = {2014},
OPTADDRESS = {},
MONTH = {August},
NOTE = {173 pages.},
OPTTYPE = {},
KEYWORDS = {Linguistic smells},
URL = {http://www.ptidej.net/publications/documents/Thesis+of+Venera+Arnaoudova.doc.pdf},
PDF = {http://www.ptidej.net/publications/documents/Thesis+of+Venera+Arnaoudova.ppt.pdf},
ABSTRACT = {Program comprehension is a key activity during software
development and maintenance. Although frequently performed---even
more often than actually writing code---program comprehension is a
challenging activity. The difficulty of understanding a program
increases with its size and complexity; as a result, comprehending a
complex program is, in the best-case scenario, more time consuming
than comprehending a simple one, and it can also lead to
introducing faults in the program. Hence, structural properties such
as size and complexity are often used to identify complex and
fault-prone programs. However, from early theories studying developers'
behavior while understanding a program, we know that the textual
information contained in identifiers and comments---i.e., the source
code lexicon---is among the factors that affect the psychological
complexity of a program, i.e., factors that make a program difficult
for humans to understand and maintain. In this dissertation we provide
evidence that metrics evaluating the quality of source code lexicon
are an asset for software fault explanation and prediction. Moreover,
the quality of identifiers and comments considered in isolation may
not be sufficient to reveal flaws: for example, in his theory of the
program understanding process, Brooks warns that comments and code
may be contradictory. Consequently, we address
the problem of contradictory, and more generally of inconsistent,
lexicon by defining a catalog of Linguistic Antipatterns (LAs), i.e.,
poor practices in the choice of identifiers resulting in
inconsistencies among the name, implementation, and documentation of
a programming entity. Then, we empirically evaluate the relevance of
LAs---i.e., how important they are---to industrial and open-source
developers. Overall, results indicate that the majority of
developers perceive LAs as poor practices that must therefore be
avoided. We also distill a subset of \textit{canonical LAs} that
developers found particularly unacceptable or for which they
undertook an action. In fact, we discovered that 10\% of the examples
containing LAs were removed by developers after we pointed them out.
Developers' explanations and the large proportion of still-unresolved
LAs suggest that other factors may affect the decision
to remove LAs, which is often done through renaming. We conduct a
survey with developers and show that renaming is not a
straightforward activity and that there are several factors
preventing developers from renaming. These results suggest that it
would be more beneficial to highlight LAs and other lexicon bad
smells as developers write source code---e.g., using our LAPD
Checkstyle plugin detecting LAs---so that the improvement can be done
on-the-fly without impacting other program entities.}
}