The Enron Corpus: A New Dataset for Email Classification Research. Klimt, B. and Yang, Y. In Boulicaut, J.F.; Esposito, F.; Giannotti, F.; and Pedreschi, D., editors, Machine Learning: ECML 2004, volume 3201 of Lecture Notes in Computer Science, pages 217--226. Springer Berlin Heidelberg, January, 2004.
Automated classification of email messages into user-specific folders and information extraction from chronologically ordered email streams have become interesting areas in text learning research. However, the lack of large benchmark collections has been an obstacle for studying the problems and evaluating the solutions. In this paper, we introduce the Enron corpus as a new test bed. We analyze its suitability with respect to email folder prediction, and provide the baseline results of a state-of-the-art classifier (Support Vector Machines) under various conditions, including the cases of using individual sections (From, To, Subject and body) alone as the input to the classifier, and using all the sections in combination with regression weights.
@incollection{klimt_enron_2004,
  series = {Lecture {Notes} in {Computer} {Science}},
  title = {The {Enron} {Corpus}: {A} {New} {Dataset} for {Email} {Classification} {Research}},
  copyright = {©2004 Springer-Verlag Berlin Heidelberg},
  isbn = {978-3-540-23105-9, 978-3-540-30115-8},
  shorttitle = {The {Enron} {Corpus}},
  url = {http://link.springer.com/chapter/10.1007/978-3-540-30115-8_22},
  abstract = {Automated classification of email messages into user-specific folders and information extraction from chronologically ordered email streams have become interesting areas in text learning research. However, the lack of large benchmark collections has been an obstacle for studying the problems and evaluating the solutions. In this paper, we introduce the Enron corpus as a new test bed. We analyze its suitability with respect to email folder prediction, and provide the baseline results of a state-of-the-art classifier (Support Vector Machines) under various conditions, including the cases of using individual sections (From, To, Subject and body) alone as the input to the classifier, and using all the sections in combination with regression weights.},
  language = {en},
  number = {3201},
  urldate = {2014-09-01},
  booktitle = {Machine {Learning}: {ECML} 2004},
  publisher = {Springer Berlin Heidelberg},
  author = {Klimt, Bryan and Yang, Yiming},
  editor = {Boulicaut, Jean-François and Esposito, Floriana and Giannotti, Fosca and Pedreschi, Dino},
  month = {January},
  year = {2004},
  pages = {217--226}
}