Distributed Framework for Automating Opinion Discretization from Text Corpora on Facebook

Distributed Framework for Automating Opinion Discretization from Text Corpora on Facebook. Xuan Huynh, H., Nguyen, V. T., Duong-Trung, N., Pham, V. H., & Phan, C. T. IEEE Access, 7:78675–78684, 2019. Publisher: Institute of Electrical and Electronics Engineers Inc.


The steady growth in the volume of text corpora and the many research directions in general classification have created both an opportunity and a demand for a comprehensive evaluation of current achievements in natural language processing. Few studies, however, have combined convolutional neural networks (CNNs) with Apache Spark to automate opinion discretization. In this paper, the authors propose a new distributed architecture for opinion classification in text mining that applies CNN models and big data technologies to Vietnamese text sources. The proposed framework provides the implementation concepts a researcher needs to run experiments on text discretization problems. It covers the steps and components of a complete, practical text mining pipeline: acquiring input data, preprocessing it, tokenizing it into a vector representation, applying machine learning algorithms, applying the trained models to unseen data, and evaluating their accuracy. Development of the framework began with a specific focus on binary text discretization but soon expanded to many other text-categorization problems, distributed language modeling, and quantification. Several intensive assessments were carried out to examine the robustness and efficiency of the proposed framework. The experiments yielded an accuracy of 72.99% ± 3.64, from which the authors conclude that the proposed distributed framework is feasible for opinion discretization on Facebook.
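The pipeline stages listed in the abstract (tokenizing, vectorizing, training, applying the model to unseen data, and evaluating accuracy) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: a bag-of-words perceptron stands in for the CNN, plain Python stands in for Apache Spark and TensorFlow, and the toy English corpus, the function names, and the 4-fold split are all assumptions made for illustration. The mean ± standard deviation reported at the end only mirrors the form of the figure quoted in the abstract, not its value.

```python
import statistics
from collections import Counter

# Hypothetical toy corpus (1 = positive opinion, 0 = negative); not data from the paper.
CORPUS = [
    ("great phone love it", 1),
    ("love this great service", 1),
    ("good product great price", 1),
    ("really good love it", 1),
    ("bad phone hate it", 0),
    ("hate this terrible service", 0),
    ("poor product bad price", 0),
    ("really bad hate it", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return text.lower().split()

def build_vocab(docs):
    """Map each distinct training token to a column index."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def predict(model, x):
    """Binary decision from a linear model (weights, bias)."""
    weights, bias = model
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def train_perceptron(X, y, epochs=20):
    """Classic perceptron updates; swapped in here for the paper's CNN."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict((weights, bias), xi)
            if err:
                weights = [w + err * v for w, v in zip(weights, xi)]
                bias += err
    return weights, bias

def cross_validate(corpus, k=4):
    """k-fold evaluation; returns (mean accuracy, sample standard deviation)."""
    scores = []
    for fold in range(k):
        test = corpus[fold::k]
        train = [ex for i, ex in enumerate(corpus) if i % k != fold]
        vocab = build_vocab([text for text, _ in train])
        model = train_perceptron(
            [vectorize(text, vocab) for text, _ in train],
            [label for _, label in train],
        )
        hits = sum(predict(model, vectorize(t, vocab)) == lab for t, lab in test)
        scores.append(hits / len(test))
    return statistics.mean(scores), statistics.stdev(scores)

mean_acc, std_acc = cross_validate(CORPUS)
print(f"accuracy: {100 * mean_acc:.2f}% ± {100 * std_acc:.2f}")
```

In a distributed setting, the tokenizing and vectorizing steps would typically be mapped over partitions of the corpus, while the evaluation loop stays essentially the same.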

@article{XuanHuynh2019,
	title = {Distributed {Framework} for {Automating} {Opinion} {Discretization} from {Text} {Corpora} on {Facebook}},
	volume = {7},
	issn = {21693536},
	url = {http://www.scopus.com/inward/record.url?eid=2-s2.0-85068208355&partnerID=MN8TOARS},
	doi = {10.1109/ACCESS.2019.2922427},
	journal = {IEEE Access},
	author = {Xuan Huynh, Hiep and Nguyen, Vu Tuan and Duong-Trung, Nghia and Pham, Van Huy and Phan, Cang Thuong},
	year = {2019},
	note = {Publisher: Institute of Electrical and Electronics Engineers Inc.},
	keywords = {Apache Spark, TensorFlow, classification, convolutional neural networks, deep learning, opinion mining},
	pages = {78675--78684},
}

