Inherent Characteristics of Traceability Artifacts: Less Is More. Hayes, J. H., Antoniol, G., Adams, B., & Guéhéneuc, Y. In Zowghi, D. & Gervasi, V., editors, Proceedings of the 23rd International Requirements Engineering Conference (RE), pages 196–201, August, 2015. IEEE CS Press. 6 pages. RE Next!
This paper describes ongoing work to characterize the inherent ease or "traceability" with which a textual artifact can be traced using an automated technique. Software traceability approaches use varied measures to build models that automatically recover links between pairs of natural language documents. Thus far, most of the approaches use a single-step model, such as logistic regression, to identify new trace links. However, such approaches require a large enough training set of both true and false trace links. Yet, the former are by far in the minority, which reduces the performance of such models. Therefore, this paper formulates the problem of identifying trace links as the problem of finding, for a given logistic regression model, the subsets of links in the training set giving the best accuracy (in terms of G-metric) on a test set. Using hill climbing with random restart for subset selection, we found that, for the ChangeStyle dataset, we can classify links with a precision of up to 40% and a recall of up to 66% using a training set as small as one true candidate link (out of 33) and 41 false links. To get better performance and learn the best possible logistic regression classifier, we must "discard" links in the trace dataset that increase noise to avoid learning with links that are not representative. This preliminary work is promising because it shows that few correct examples may perform better than several poor ones. It also shows which inherent characteristics of the artifacts make them good candidates to learn efficient traceability models automatically, i.e., it reveals their traceability.
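The subset-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic link features, the helper names (`make_links`, `score`, `hill_climb`), the hand-rolled gradient-descent logistic regression, and the reading of "G-metric" as the geometric mean of precision and recall are all assumptions made for illustration. The search toggles one training link at a time, keeps the change only if the score on the test set improves, and restarts from several random subsets.

```python
import math
import random


def predict_proba(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def g_metric(y_true, y_pred):
    """Assumed definition: geometric mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return math.sqrt(precision * recall)


def score(subset, train, test):
    """Train on the chosen subset of training links, score on the test set."""
    rows = [train[i][0] for i in sorted(subset)]
    labels = [train[i][1] for i in sorted(subset)]
    if len(set(labels)) < 2:  # need at least one true and one false link
        return 0.0
    w, b = train_logreg(rows, labels)
    preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x, _ in test]
    return g_metric([y for _, y in test], preds)


def hill_climb(train, test, rng, restarts=3, steps=40):
    """Hill climbing with random restart over subsets of training links."""
    best_subset, best_score = set(), -1.0
    for _ in range(restarts):
        current = {i for i in range(len(train)) if rng.random() < 0.5}
        current_score = score(current, train, test)
        for _ in range(steps):
            neighbour = current ^ {rng.randrange(len(train))}  # toggle one link
            neighbour_score = score(neighbour, train, test)
            if neighbour_score > current_score:  # accept strict improvements only
                current, current_score = neighbour, neighbour_score
        if current_score > best_score:
            best_subset, best_score = current, current_score
    return best_subset, best_score


def make_links(rng, n_true, n_false):
    """Hypothetical candidate links: two similarity features plus a 0/1 label."""
    true = [([rng.gauss(0.7, 0.15), rng.gauss(0.6, 0.2)], 1) for _ in range(n_true)]
    false = [([rng.gauss(0.3, 0.15), rng.gauss(0.4, 0.2)], 0) for _ in range(n_false)]
    return true + false


rng = random.Random(0)
train = make_links(rng, n_true=5, n_false=35)  # true links are the minority
test = make_links(rng, n_true=5, n_false=35)
best_subset, best_score = hill_climb(train, test, rng)
print(f"kept {len(best_subset)} of {len(train)} training links, G = {best_score:.2f}")
```

The sketch only shows the shape of the search; the data are synthetic and unrelated to the ChangeStyle dataset, so its output says nothing about the precision and recall figures reported above.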
@INPROCEEDINGS{Hayes15-RENext-InherentCharacteristics,
   AUTHOR       = {Jane Huffman Hayes and Giuliano Antoniol and Bram Adams and 
      Yann-Gaël Guéhéneuc},
   BOOKTITLE    = {Proceedings of the 23<sup>rd</sup> International Requirements Engineering Conference (RE)},
   TITLE        = {Inherent Characteristics of Traceability Artifacts: Less 
      Is More},
   YEAR         = {2015},
   OPTADDRESS   = {},
   OPTCROSSREF  = {},
   EDITOR       = {Didar Zowghi and Vincenzo Gervasi},
   MONTH        = {August},
   NOTE         = {6 pages. RE Next!},
   OPTNUMBER    = {},
   OPTORGANIZATION = {},
   PAGES        = {196--201},
   PUBLISHER    = {IEEE CS Press},
   OPTSERIES    = {},
   OPTVOLUME    = {},
   KEYWORDS     = {Topic: <b>Program comprehension</b>, Venue: <c>RE</c>},
   URL          = {http://www.ptidej.net/publications/documents/RENext15.doc.pdf},
   PDF          = {http://www.ptidej.net/publications/documents/RENext15.ppt.pdf},
   ABSTRACT     = {This paper describes ongoing work to characterize the 
      inherent ease or ``traceability'' with which a textual artifact can 
      be traced using an automated technique. Software traceability 
      approaches use varied measures to build models that automatically 
      recover links between pairs of natural language documents. Thus far, 
      most of the approaches use a single-step model, such as logistic 
      regression, to identify new trace links. However, such approaches 
      require a large enough training set of both true and false trace 
      links. Yet, the former are by far in the minority, which reduces the 
      performance of such models. Therefore, this paper formulates the 
      problem of identifying trace links as the problem of finding, for a 
      given logistic regression model, the subsets of links in the training 
      set giving the best accuracy (in terms of G-metric) on a test set. 
      Using hill climbing with random restart for subset selection, we 
      found that, for the ChangeStyle dataset, we can classify links with a 
      precision of up to 40\% and a recall of up to 66\% using a training 
      set as small as one true candidate link (out of 33) and 41 false 
      links. To get better performance and learn the best possible logistic 
      regression classifier, we must ``discard'' links in the trace dataset 
      that increase noise to avoid learning with links that are not 
      representative. This preliminary work is promising because it shows 
      that few correct examples may perform better than several poor ones. 
      It also shows which inherent characteristics of the artifacts make 
      them good candidates to learn efficient traceability models 
      automatically, i.e., it reveals their traceability.}
} 