Evaluation, ACM Press, 2005. Paper Website abstract bibtex
Finding information about people on the Web using a search engine is difficult because there is a many-to-many mapping between person names and specific persons (i.e. referents). This paper describes a person resolution system, called WebHawk. Given a list of pages obtained by submitting a person query to a search engine, WebHawk facilitates person search in three steps: First of all, a filter removes those pages that contain no information about any person. Secondly, a cluster groups the remaining pages into different clusters, each for one specific person. To make the resulting clusters more meaningful, an extractor is used to induce query-oriented personal information from each page. Finally, a namer generates an informative description for each cluster so that users can find any specific person easily. The architecture of WebHawk is presented, and the four components are discussed in detail, with a separate evaluation of each component presented where appropriate. A user study shows that WebHawk complements most existing search engines and successfully improves users' experience of person search on the Web.