Analyzing of the Evolution of Web Pages by Using a Domain Based Web Crawler. Uzun, E., Yerlikaya, T., & Kurt, M. In Techsys, 26-28 May, Plovdiv, Bulgaria, pages 151-156, 2011.
To improve the algorithms used in search engines, crawlers and indexers, the evolution of web pages should be examined. For this purpose, we developed a domain-based crawler, SET Crawler, which collects the 1998 to 2008 web archives of three popular Turkish daily newspapers (Hurriyet, Milliyet and Sabah). The completed crawl yielded a set of 3,430,997 HTML pages. The average file size of a web page was approximately 5.19 KB in 1998 and 53.94 KB in 2008. Assuming that the main content of web pages has remained similar in size, this observation shows how much the use of unnecessary content and tags has increased. Analyses indicate that the use of link, image and layout tags has increased significantly in the last decades. Moreover, the tag has been used instead of the
