Reducing Computational Complexity by Restricting the Size of Compared Web Contents. Uzun, E., Yerlikaya, T., & Kurt, M. In Techsys, 26-28 May, Plovdiv, Bulgaria, pages 157-160, 2011.
Extracting the relevant content from web pages is an important problem in research on information retrieval, data mining, and natural language processing. Tag contents that recur across pages of the same domain can be used to identify irrelevant content. However, small variations in the tag contents of different pages can cause problems during extraction; we therefore adapt the Levenshtein distance algorithm to tolerate these variations. Nevertheless, tag contents containing many characters have a negative impact on computational complexity. Hence, we propose a solution that reduces this complexity by comparing only a few characters. In experiments, this solution yields a significant improvement (84.37%) in the performance of the Levenshtein-based approach for detecting irrelevant content.
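The idea of capping the cost of Levenshtein comparisons by truncating the compared strings can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the prefix length of 20 characters, and the similarity threshold of 0.2 are all illustrative assumptions, not values taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, O(len(a) * len(b)).
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similar_tag_contents(a: str, b: str,
                         prefix_len: int = 20,
                         threshold: float = 0.2) -> bool:
    # Compare only the first `prefix_len` characters of each tag content,
    # capping the quadratic cost of the edit-distance computation.
    # `prefix_len` and `threshold` are assumed values for illustration.
    a, b = a[:prefix_len], b[:prefix_len]
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= threshold
```

With this restriction, two long navigation blocks that differ only slightly (e.g. a changed year in a footer) are still recognized as the same boilerplate, while the comparison cost no longer grows with the full length of the tag contents.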
