Web-Scale Information Extraction in KnowItAll (Preliminary Results). Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.M., Shaked, T., Soderland, S., Weld, D. S., & Yates, A. Pages 100-110.
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.
@inproceedings{ etz04a,
  crossref = {www2004},
  author = {Oren Etzioni and Michael Cafarella and Doug Downey and Stanley Kok and Ana-Maria Popescu and Tal Shaked and Stephen Soderland and Daniel S. Weld and Alexander Yates},
  title = {Web-Scale Information Extraction in KnowItAll (Preliminary Results)},
  pages = {100--110},
  topic = {knowitall[1]},
  uri = {http://www2004.org/proceedings/docs/1p100.pdf},
  uri = {http://www.cs.washington.edu/homes/etzioni/papers/www04.pdf},
  abstract = {Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.}
}
