Web-Scale Information Extraction in KnowItAll (Preliminary Results). Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A., Shaked, T., Soderland, S., Weld, D. S., & Yates, A. In Proceedings of the 13th International Conference on World Wide Web, WWW '04, pages 100-110, 2004. ACM Press.
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KNOWITALL, running for four days on a single machine, was able to automatically extract 54,753 facts. KNOWITALL associates a probability with each fact enabling it to trade off precision and recall. The paper analyzes KNOWITALL's architecture and reports on lessons learned for the design of large-scale information extraction systems.
