An Empirical Study on the Importance of Source Code Entities for Requirements Traceability. Ali, N., Sharafi, Z., Guéhéneuc, Y., & Antoniol, G. Empirical Software Engineering (EMSE), 20(2):442–478, Springer, April 2015. 37 pages.
Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both) and thus, creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques to create RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers' eye movements while they verify RT links. We analyse the obtained data to identify and rank developers' preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers' preferred types of SCEs, and not their locations, that attracts developers' attention and helps them in their task of verifying RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers' preferred types of SCEs to give these SCEs more importance in the term weighting. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
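
The abstract does not give the DPTF/IDF formula, so the following is only a minimal sketch of the general idea of boosting terms by the type of source code entity they come from. The weight values, the entity type labels, and all function names (`SCE_WEIGHTS`, `weighted_tf`, `idf`, `weighted_tfidf`, `cosine`) are assumptions made for illustration and are not taken from the paper. The sketch also omits the LSI step and compares documents with plain cosine similarity instead.

```python
import math
from collections import Counter

# Hypothetical boost per source-code-entity type.
# These values are illustrative only; they are NOT the paper's weights.
SCE_WEIGHTS = {
    "class_name": 2.0,
    "method_name": 1.5,
    "variable_name": 1.0,
    "comment": 1.0,
}


def weighted_tf(doc):
    """Term frequency where each occurrence counts by its entity-type weight.

    `doc` is a list of (term, sce_type) pairs. Unknown types count as 1.0.
    The result is normalised so the weights of one document sum to 1.
    """
    tf = Counter()
    for term, sce_type in doc:
        tf[term] += SCE_WEIGHTS.get(sce_type, 1.0)
    total = sum(tf.values())
    return {term: weight / total for term, weight in tf.items()}


def idf(docs):
    """Plain inverse document frequency: log(N / df) for every term."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({term for term, _ in doc})
    return {term: math.log(n_docs / count) for term, count in df.items()}


def weighted_tfidf(docs):
    """One sparse vector (dict) per document: weighted TF times IDF."""
    idf_map = idf(docs)
    return [
        {term: w * idf_map[term] for term, w in weighted_tf(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In this sketch a requirement can be tokenised into pairs with an arbitrary type label such as `"requirement"`, which falls back to a weight of 1.0, so only source code terms receive a boost. Candidate links would then be ranked by `cosine` between the requirement vector and each source code vector.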
