A Large Opinion Corpus in Portuguese - Tackling Out-Of-Vocabulary Words. Hartmann, N., Avanço, L., Balage, P., Duran, M., das Graças Volpe Nunes, M., Pardo, T., & Aluísio, S. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference, pages 3865-3871, Reykjavik, Iceland, 2014.
Web 2.0 has allowed a never imagined communication boom. With the widespread use of computational and mobile devices, anyone, in practically any language, may post comments in the web. As such, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and the presence of internet slang, with missing diacritics, repetitions of vowels, and the use of chat-speak style abbreviations, emoticons and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which, so far, have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which support the development of a lexical normalization tool for, in future work, subsidizing the use of standard NLP products for web opinion mining and summarization purposes.
@inproceedings{ mendeley_6479626184,
isauthor = {1},
abstract = {Web 2.0 has allowed a never imagined communication boom. With the widespread use of computational and mobile devices, anyone, in practically any language, may post comments in the web. As such, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and the presence of internet slang, with missing diacritics, repetitions of vowels, and the use of chat-speak style abbreviations, emoticons and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which, so far, have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which support the development of a lexical normalization tool for, in future work, subsidizing the use of standard NLP products for web opinion mining and summarization purposes.},
canonical_id = {41a6656e-4bb3-3b17-8d10-9efd7efa8dee},
added = {1391370271},
year = {2014},
isstarred = {0},
id = {6479626184},
discipline = {Computer and Information Science},
address = {Reykjavik, Iceland},
title = {A Large Opinion Corpus in Portuguese - Tackling Out-Of-Vocabulary Words},
deletionpending = {0},
version = {1411229447},
type = {Conference Proceedings},
url_mendeley = {http://www.mendeley.com/research/large-opinion-corpus-portuguese-tackling-outofvocabulary-words/},
isread = {0},
author = {Nathan {Hartmann} and Lucas {Avanço} and Pedro {Balage} and Magali {Duran} and Maria das Graças Volpe {Nunes} and Thiago {Pardo} and Sandra {Aluísio}},
series = {Proceedings of the 9th edition of the Language Resources and Evaluation Conference},
pages = {3865-3871},
url_0 = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/413_Paper.pdf},
modified = {1411229447},
citation_key = {Hartmann2014},
booktitle = {Proceedings of the 9th edition of the Language Resources and Evaluation Conference},
subdiscipline = {Artificial Intelligence}
}