Case-Sensitive Letter and Bigram Frequency Counts from Large-Scale English Corpora. Jones, M. N. & Mewhort, D. J. K. Behavior Research Methods, Instruments, & Computers, 36(3):388–396, 2004.
We tabulated upper- and lowercase letter frequency using several large-scale English corpora (∼183 million words in total). The results indicate that the relative frequencies for upper- and lowercase letters are not equivalent. We report a letter-naming experiment in which uppercase frequency predicted response time to uppercase letters better than did lowercase frequency. Tables of case-sensitive letter and bigram frequency are provided, including common nonalphabetic characters. Because subjects are sensitive to frequency relationships among letters, we recommend that experimenters use case-sensitive counts when constructing stimuli from letters.
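To make the idea of a case-sensitive count concrete, here is a minimal Python sketch of how letter and bigram frequencies could be tabulated from raw text. It is an illustration only, not the authors' procedure: the function names, the sample sentence, and the choice to count only alphabetic characters (the paper's tables also cover common nonalphabetic characters) are all assumptions made for this example.

```python
from collections import Counter


def count_letters_and_bigrams(text):
    """Return (letter_counts, bigram_counts) with case preserved.

    Hypothetical helper: "T" and "t" are counted as different symbols.
    """
    # Assumption: only alphabetic characters are tallied here.
    letters = Counter(ch for ch in text if ch.isalpha())
    # Count adjacent pairs only when both characters are letters, so
    # bigrams never span spaces or punctuation.
    bigrams = Counter(
        a + b for a, b in zip(text, text[1:]) if a.isalpha() and b.isalpha()
    )
    return letters, bigrams


def relative_frequencies(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Illustrative sample text, not data from the paper.
sample = "The cat sat. The Tall tree."
letters, bigrams = count_letters_and_bigrams(sample)
letter_freq = relative_frequencies(letters)
```

In this sample, uppercase "T" and lowercase "t" receive separate counts, and the bigram "Th" is distinct from "th", which is the distinction the abstract recommends preserving when constructing letter stimuli.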
@article{jonesCasesensitiveLetterBigram2004,
  title = {Case-Sensitive Letter and Bigram Frequency Counts from Large-Scale {{English}} Corpora},
  author = {Jones, Michael N. and Mewhort, D. J. K.},
  date = {2004},
  journaltitle = {Behavior Research Methods, Instruments, \& Computers},
  shortjournal = {Behavior Research Methods, Instruments, \& Computers},
  volume = {36},
  pages = {388--396},
  issn = {1532-5970},
  doi = {10.3758/BF03195586},
  url = {https://doi.org/10.3758/BF03195586},
  urldate = {2020-05-10},
  abstract = {We tabulated upper- and lowercase letter frequency using several large-scale English corpora (∼183 million words in total). The results indicate that the relative frequencies for upper- and lowercase letters are not equivalent. We report a letter-naming experiment in which uppercase frequency predicted response time to uppercase letters better than did lowercase frequency. Tables of case-sensitive letter and bigram frequency are provided, including common nonalphabetic characters. Because subjects are sensitive to frequency relationships among letters, we recommend that experimenters use case-sensitive counts when constructing stimuli from letters.},
  keywords = {~INRMM-MiD:z-XJL39NVM,frequency,languages,statistics},
  langid = {english},
  number = {3}
}