An Empirical Study on the Importance of Source Code Entities for Requirements Traceability. Ali, N., Sharafi, Z., Guéhéneuc, Y., & Antoniol, G. Empirical Software Engineering (EMSE), 20(2):442–478, Springer, April, 2015. 37 pages.
Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both) and thus, creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques to create RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers' eye movements while they verify RT links. We analyse the obtained data to identify and rank developers' preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers' preferred types of SCEs and not their locations that attract developers' attention and help them in their task to verify RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers' preferred types of SCEs to give more importance to these SCEs in the term weighting scheme. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
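The abstract does not give the DPTF/IDF formula, so the following is only a minimal sketch of the general idea it describes: a TF/IDF variant in which term occurrences coming from preferred source code entity types count for more. The entity-type names, the boost values in `ENTITY_WEIGHTS`, and the function names are all hypothetical and are not taken from the paper. The LSI step (a truncated SVD of the resulting term-by-document matrix) is left out.

```python
import math
from collections import Counter

# Hypothetical boosts per entity type (assumption, not the paper's values).
ENTITY_WEIGHTS = {"class_name": 2.0, "method_name": 1.5, "comment": 1.0}


def weighted_tf(doc):
    """Normalised term frequency where each occurrence is scaled by the
    weight of the entity type it came from.

    doc is a list of (term, entity_type) pairs.
    """
    counts = Counter()
    for term, entity_type in doc:
        counts[term] += ENTITY_WEIGHTS.get(entity_type, 1.0)
    total = sum(counts.values())
    return {term: value / total for term, value in counts.items()}


def inverse_document_frequency(docs):
    """Plain IDF: log(N / number of documents containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        for term in {t for t, _ in doc}:
            doc_freq[term] += 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weighted_tfidf(docs):
    """One sparse vector (dict) per document: weighted TF times IDF."""
    idf = inverse_document_frequency(docs)
    return [
        {term: tf * idf[term] for term, tf in weighted_tf(doc).items()}
        for doc in docs
    ]


if __name__ == "__main__":
    docs = [
        [("patient", "class_name"), ("get", "method_name")],
        [("get", "method_name"), ("index", "comment")],
    ]
    for vector in weighted_tfidf(docs):
        print(vector)
```

In this sketch a term that appears in every document still gets a weight of zero, as with ordinary TF/IDF; the boost only changes the relative weight of terms within a document.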
@ARTICLE{Ali14-EMSE-EyeTrackingTraceability,
AUTHOR = {Nasir Ali and Zohreh Sharafi and Yann-Gaël Guéhéneuc and
Giuliano Antoniol},
JOURNAL = {Empirical Software Engineering (EMSE)},
TITLE = {An Empirical Study on the Importance of Source Code
Entities for Requirements Traceability},
YEAR = {2015},
MONTH = {April},
NOTE = {37 pages.},
NUMBER = {2},
PAGES = {442--478},
VOLUME = {20},
EDITOR = {Victor R. Basili and Lionel C. Briand},
KEYWORDS = {Topic: <b>Requirements and features</b>,
Topic: <b>Program comprehension</b>, Venue: <b>EMSE</b>},
PUBLISHER = {Springer},
URL = {http://www.ptidej.net/publications/documents/EMSE14a.doc.pdf},
ABSTRACT = {Requirements Traceability (RT) links help developers
during program comprehension and maintenance tasks. However, creating
RT links is a laborious and resource-consuming task. Information
Retrieval (IR) techniques are useful to automatically create
traceability links. However, IR-based techniques typically have low
accuracy (precision, recall, or both) and thus, creating RT links
remains a human intensive process. We conjecture that understanding
how developers verify RT links could help improve the accuracy of
IR-based RT techniques to create RT links. Consequently, we perform
an empirical study consisting of four case studies. First, we use an
eye-tracking system to capture developers' eye movements while they
verify RT links. We analyse the obtained data to identify and rank
developers' preferred types of Source Code Entities (SCEs), \eg{}
domain vs.\ implementation-level source code terms and class names
vs.\ method names. Second, we perform another eye-tracking case study
to confirm that it is the semantic content of the developers'
preferred types of SCEs and not their locations that attract
developers' attention and help them in their task to verify RT links.
Third, we propose an improved term weighting scheme, \ie{} Developers
Preferred Term Frequency/Inverse Document Frequency ($DPTF/IDF$),
that uses the knowledge of the developers' preferred types of SCEs to
give more importance to these SCEs into the term weighting scheme. We
integrate this weighting scheme with an IR technique, \ie{} Latent
Semantic Indexing (LSI), to create a new technique to RT link
recovery. Using three systems (iTrust, Lucene, and Pooka), we show
that the proposed technique statistically improves the accuracy of
the recovered RT links over a technique based on LSI and the usual
Term Frequency/Inverse Document Frequency ($TF/IDF$) weighting
scheme. Finally, we compare the newly proposed $DPTF/IDF$ with our
original Domain Or Implementation/Inverse Document Frequency
($DOI/IDF$) weighting scheme.}
}