Using Wikipedia to bootstrap open information extraction. Weld, D. S., Hoffmann, R., & Wu, F. ACM SIGMOD Record, 37(4):62, ACM, 2009.
INTRODUCTION: We often use ‘Data Management’ to refer to the manipulation of relational or semi-structured information, but much of the world’s data is unstructured, for example the vast amount of natural-language text on the Web. The ability to manage the information underlying this unstructured text is therefore increasingly important. While information retrieval techniques, as embodied in today’s sophisticated search engines, offer important capabilities, they lack the most important faculties found in relational databases: 1) queries comprising aggregation, sorting and joins, and 2) structured visualization such as faceted browsing [29]. Information extraction (IE), the process of generating structured data from unstructured text, has the potential to convert much of the Web to relational form — enabling these powerful querying and visualization methods. Implemented systems have used manually-generated extractors (e.g., regular expressions) to “screen scrape” for decades, but in recent years machine learning methods have transformed IE, speeding the development of relation-specific extractors and greatly improving precision and recall. While the technology has led to many commercial applications, it requires identifying target relations ahead of time and the laborious construction of a labeled training set. As a result, supervised learning techniques can’t scale to the vast number of relations discussed on the Web.
