Representativeness in Corpus Design. Biber, D. Literary and Linguistic Computing, 8(4):243–257, October 1993.
The present paper addresses a number of issues related to achieving ‘representativeness’ in linguistic corpus design, including: discussion of what it means to ‘represent’ a language, definition of the target population, stratified versus proportional sampling of a language, sampling within texts, and issues relating to the required sample size (number of texts) of a corpus. The paper distinguishes among various ways that linguistic features can be distributed within and across texts; it analyzes the distributions of several particular features, and it discusses the implications of these distributions for corpus design.
@article{biber_representativeness_1993,
	title = {Representativeness in {Corpus} {Design}},
	volume = {8},
	issn = {0268-1145, 1477-4615},
	url = {https://academic.oup.com/dsh/article-lookup/doi/10.1093/llc/8.4.243},
	doi = {10.1093/llc/8.4.243},
	abstract = {The present paper addresses a number of issues related to achieving ‘representativeness’ in linguistic corpus design, including: discussion of what it means to ‘represent’ a language, definition of the target population, stratified versus proportional sampling of a language, sampling within texts, and issues relating to the required sample size (number of texts) of a corpus. The paper distinguishes among various ways that linguistic features can be distributed within and across texts; it analyzes the distributions of several particular features, and it discusses the implications of these distributions for corpus design.},
	language = {en},
	number = {4},
	urldate = {2024-09-04},
	journal = {Literary and Linguistic Computing},
	author = {Biber, D.},
	month = oct,
	year = {1993},
	pages = {243--257},
}
