Evaluation of Hap, AngleSharp and HtmlDocument in web content extraction. Uzun, E., Buluş, H. N., Doruk, A., & Özhan, E. In International Scientific Conference’2017 (UNITECH’17), volume 2, pages 275-278, 2017.
With the DOM, programming languages can access and modify all the HTML elements of a web page. Several libraries are available for building the DOM. In this study, we compare three well-known .NET libraries, HAP (Html Agility Pack), AngleSharp and MS_HtmlDocument, for extracting content from web pages. The experimental results indicate that AngleSharp performs best, averaging 5.54 ms to preprocess the DOM and 0.46 ms to extract content from it.
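
The two timings reported in the abstract correspond to two separate steps: building the DOM once, then querying it for content. The sketch below illustrates that split only. It is not the study's benchmark: it uses Python's standard-library `xml.etree.ElementTree` as a stand-in for the .NET libraries, and the sample page, the function names `build_dom` and `extract_content`, and the `div`/`content` selector are all hypothetical. It also assumes well-formed markup, which real web pages often are not.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (assumption: real HTML usually
# needs a more tolerant parser than ElementTree).
PAGE = """<html><body>
<div class="menu">Home | About</div>
<div class="content"><p>Main article text.</p></div>
</body></html>"""


def build_dom(markup):
    """Preprocessing step: parse markup into a tree; return (root, elapsed ms)."""
    start = time.perf_counter()
    root = ET.fromstring(markup)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return root, elapsed_ms


def extract_content(root, tag, class_name):
    """Extraction step: return (text of first matching element or None, elapsed ms)."""
    start = time.perf_counter()
    node = root.find(f".//{tag}[@class='{class_name}']")
    text = "".join(node.itertext()).strip() if node is not None else None
    elapsed_ms = (time.perf_counter() - start) * 1000
    return text, elapsed_ms


root, parse_ms = build_dom(PAGE)
text, extract_ms = extract_content(root, "div", "content")
print(text)
print(f"preprocessing: {parse_ms:.3f} ms, extraction: {extract_ms:.3f} ms")
```

Timing the two steps separately matters because the tree is typically built once per page and then queried many times, so a library can be slow at one step and fast at the other.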
