A small set of stylometric features differentiates Latin prose and verse. Chaudhuri, P., Dasgupta, T., Dexter, J. P., & Iyer, K. Digital Scholarship in the Humanities, 34(4):716–729, December 2019.
Identifying the stylistic signatures characteristic of different genres is of central importance to literary theory and criticism. In this article we report a large-scale computational analysis of Latin prose and verse using a combination of quantitative stylistics and supervised machine learning. We train a set of classifiers to differentiate prose and poetry with high accuracy (>97%) based on a set of twenty-six text-based, primarily syntactic features and rank the relative importance of these features to identify a low-dimensional set still sufficient to achieve excellent classifier performance. This analysis demonstrates that Latin prose and verse can be classified effectively using just three top features. From examination of the highly ranked features, we observe that measures of the hypotactic style favored in Latin prose (i.e. subordinating constructions in complex sentences, such as relative clauses) are especially useful for classification.
@article{chaudhuri_small_2019,
	title = {A small set of stylometric features differentiates {Latin} prose and verse},
	volume = {34},
	issn = {2055-7671},
	url = {https://doi.org/10.1093/llc/fqy070},
	doi = {10.1093/llc/fqy070},
	abstract = {Identifying the stylistic signatures characteristic of different genres is of central importance to literary theory and criticism. In this article we report a large-scale computational analysis of Latin prose and verse using a combination of quantitative stylistics and supervised machine learning. We train a set of classifiers to differentiate prose and poetry with high accuracy ($>$97\%) based on a set of twenty-six text-based, primarily syntactic features and rank the relative importance of these features to identify a low-dimensional set still sufficient to achieve excellent classifier performance. This analysis demonstrates that Latin prose and verse can be classified effectively using just three top features. From examination of the highly ranked features, we observe that measures of the hypotactic style favored in Latin prose (i.e. subordinating constructions in complex sentences, such as relative clauses) are especially useful for classification.},
	number = {4},
	urldate = {2023-08-26},
	journal = {Digital Scholarship in the Humanities},
	author = {Chaudhuri, Pramit and Dasgupta, Tathagata and Dexter, Joseph P and Iyer, Krithika},
	month = dec,
	year = {2019},
	pages = {716--729},
}
