Reducing Computational Complexity by Restricting the Size of Compared Web Contents. Uzun, E., Yerlikaya, T., & Kurt, M. In Techsys, 26-28 May, Plovdiv, Bulgaria, pages 157-160, 2011.
Extracting the relevant content from web pages is an important problem in research on information retrieval, data mining, and natural language processing. Tag contents that recur across pages of the same domain can be used to identify irrelevant content. However, small variations in the tag contents of different pages can cause problems during extraction; we therefore adapt the Levenshtein distance algorithm to tolerate these variations. Nevertheless, tag contents containing many characters have a negative impact on computational complexity. Hence, we propose a solution that reduces this complexity by comparing only a few characters. In experiments, this solution yields a significant improvement (84.37%) in the performance of the Levenshtein-based approach for detecting irrelevant content.
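The idea of capping the cost of Levenshtein comparisons by truncating the compared strings can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the prefix length of 20 characters, and the similarity threshold of 0.2 are all illustrative assumptions, not values taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, O(len(a) * len(b)).
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similar_tag_contents(a: str, b: str,
                         prefix_len: int = 20,
                         threshold: float = 0.2) -> bool:
    # Compare only the first `prefix_len` characters of each tag content,
    # capping the quadratic cost of the edit-distance computation.
    # `prefix_len` and `threshold` are assumed values for illustration.
    a, b = a[:prefix_len], b[:prefix_len]
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= threshold
```

With this restriction, two long navigation blocks that differ only slightly (e.g. a changed year in a footer) are still recognized as the same boilerplate, while the comparison cost no longer grows with the full length of the tag contents.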
