'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship

'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship. Burrows, J. Literary and Linguistic Computing, 17(3):267–287, September, 2002. 🏷️ /unread、*****、X-CHECK、t_Stylometry、obj_Methods

This paper is a companion to my ‘Questions of authorship: attribution and beyond’, in which I sketched a new way of using the relative frequencies of the very common words for comparing written texts and testing their likely authorship. The main emphasis of that paper was not on the new procedure but on the broader consequences of our increasing sophistication in making such comparisons and the increasing (although never absolute) reliability of our inferences about authorship. My present objects, accordingly, are to give a more complete account of the procedure itself; to report the outcome of an extensive set of trials; and to consider the strengths and limitations of the new procedure. The procedure offers a simple but comparatively accurate addition to our current methods of distinguishing the most likely author of texts exceeding about 1,500 words in length. It is of even greater value as a method of reducing the field of likely candidates for texts of as little as 100 words in length. Not unexpectedly, it works least well with texts of a genre uncharacteristic of their author and, in one case, with texts far separated in time across a long literary career. Its possible use for other classificatory tasks has not yet been investigated.
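
The abstract describes the procedure only in outline. As a minimal sketch, assuming the formulation commonly given for Burrows's Delta in the stylometry literature (the mean absolute difference between z-scores of the most frequent words' relative frequencies), the comparison might look like the following. The function names, the whitespace tokeniser, the population standard deviation, and the default of 30 words are illustrative assumptions, not details taken from this record.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text, vocab):
    """Relative frequency of each vocab word in a non-empty text.

    Tokenisation by lowercasing and splitting on whitespace is an assumption.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def delta(test_text, candidate_text, corpus_texts, n_words=30):
    """Mean absolute z-score difference over the corpus's most common words."""
    # Vocabulary: the n_words most frequent words in the pooled reference corpus.
    pooled = " ".join(corpus_texts).lower().split()
    vocab = [w for w, _ in Counter(pooled).most_common(n_words)]

    # Per-word mean and standard deviation across the reference texts.
    table = [relative_freqs(t, vocab) for t in corpus_texts]
    mus = [mean(col) for col in zip(*table)]
    sigmas = [pstdev(col) for col in zip(*table)]

    f_test = relative_freqs(test_text, vocab)
    f_cand = relative_freqs(candidate_text, vocab)

    diffs = []
    for ft, fc, mu, sd in zip(f_test, f_cand, mus, sigmas):
        if sd == 0:
            continue  # word used at the same rate everywhere; no z-score defined
        diffs.append(abs((ft - mu) / sd - (fc - mu) / sd))
    return mean(diffs) if diffs else 0.0


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat and the dog",
    "a cat and a bird and a fish",
    "the fish of the sea",
]
score = delta(corpus[0], corpus[1], corpus)
```

Under this reading, a lower score means two texts are closer in their use of common words, so the candidate with the lowest score against a test text would be the likeliest of those compared.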

@article{burrows2021a,
	title = {'{Delta}': a {Measure} of {Stylistic} {Difference} and a {Guide} to {Likely} {Authorship}},
	volume = {17},
	shorttitle = {'Delta': a measure of stylistic difference and a guide to likely authorship},
	url = {http://llc.oxfordjournals.org/content/17/3/267.abstract},
	doi = {10.1093/llc/17.3.267},
	abstract = {This paper is a companion to my ‘Questions of authorship: attribution and beyond’, in which I sketched a new way of using the relative frequencies of the very common words for comparing written texts and testing their likely authorship. The main emphasis of that paper was not on the new procedure but on the broader consequences of our increasing sophistication in making such comparisons and the increasing (although never absolute) reliability of our inferences about authorship. My present objects, accordingly, are to give a more complete account of the procedure itself; to report the outcome of an extensive set of trials; and to consider the strengths and limitations of the new procedure. The procedure offers a simple but comparatively accurate addition to our current methods of distinguishing the most likely author of texts exceeding about 1,500 words in length. It is of even greater value as a method of reducing the field of likely candidates for texts of as little as 100 words in length. Not unexpectedly, it works least well with texts of a genre uncharacteristic of their author and, in one case, with texts far separated in time across a long literary career. Its possible use for other classificatory tasks has not yet been investigated.},
	language = {en},
	number = {3},
	urldate = {2011-07-26},
	journal = {Literary and Linguistic Computing},
	author = {Burrows, John},
	month = sep,
	year = {2002},
	note = {🏷️ /unread、*****、X-CHECK、t\_Stylometry、obj\_Methods},
	keywords = {*****, /unread, X-CHECK, obj\_Methods, t\_Stylometry},
	pages = {267--287},
}
