{"_id":"mSYgz83euhMy7kcGZ","bibbaseid":"gudivada-apon-ding-dataqualityconsiderationsforbigdataandmachinelearninggoingbeyonddatacleaningandtransformations-2017","downloads":0,"creationDate":"2017-06-12T17:47:57.115Z","title":"Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations","author_short":["Gudivada, V.","Apon, A.","Ding, J."],"year":2017,"bibtype":"article","biburl":"http://www.cs.ecu.edu/gudivada/bibbase-bibliography.bib","bibdata":{"bibtype":"article","type":"article","author":[{"firstnames":["V."],"propositions":[],"lastnames":["Gudivada"],"suffixes":[]},{"firstnames":["A."],"propositions":[],"lastnames":["Apon"],"suffixes":[]},{"firstnames":["J."],"propositions":[],"lastnames":["Ding"],"suffixes":[]}],"title":"Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations","journal":"The International Journal on Advances in Software","year":"2017","vol":"10","number":"1","abstract":"Data quality issues trace back their origin to the early days of computing. A wide range of domain-specific techniques to assess and improve the quality of data exist in the literature. These solutions primarily target data which resides in relational databases and data warehouses. The recent emergence of big data analytics and renaissance in machine learning necessitates evaluating the suitability of relational database-centric approaches to data quality. In this paper, we describe the nature of the data quality issues in the context of big data and machine learning. We discuss facets of data quality, present a data governance-driven framework for data quality lifecycle for this new scenario, and describe an approach to its implementation. 
A sampling of the tools available for data quality management are indicated and future trends are discussed.","bibtex":"@article{Gudivada-2017-data-quality-considerations-for-big-data-and-machine-learning-going-beyond-data-cleaning-and-transformations,\n author = {V. Gudivada and A. Apon and J. Ding},\n title = {Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations},\n journal = {The International Journal on Advances in Software},\n year = {2017},\n vol = {10},\n number = {1},\n abstract = {Data quality issues trace back their origin to the early days of computing. A wide range of domain-specific techniques to assess and improve the quality of data exist in the literature. These solutions primarily target data which resides in relational databases and data warehouses. The recent emergence of big data analytics and renaissance in machine learning necessitates evaluating the suitability of relational database-centric approaches to data quality. In this paper, we describe the nature of the data quality issues in the context of big data and machine learning. We discuss facets of data quality, present a data governance-driven framework for data quality lifecycle for this new scenario, and describe an approach to its implementation. 
A sampling of the tools available for data quality management are indicated and future trends are discussed.},\n}\n\n","author_short":["Gudivada, V.","Apon, A.","Ding, J."],"key":"Gudivada-2017-data-quality-considerations-for-big-data-and-machine-learning-going-beyond-data-cleaning-and-transformations","id":"Gudivada-2017-data-quality-considerations-for-big-data-and-machine-learning-going-beyond-data-cleaning-and-transformations","bibbaseid":"gudivada-apon-ding-dataqualityconsiderationsforbigdataandmachinelearninggoingbeyonddatacleaningandtransformations-2017","role":"author","urls":{},"downloads":0,"html":""},"search_terms":["data","quality","considerations","big","data","machine","learning","going","beyond","data","cleaning","transformations","gudivada","apon","ding"],"keywords":[],"authorIDs":["593ed3cd524d013b4a000034","5dfa24181c2129de01000078","5dfee5185dd8e7df0100004a","5e25d084f299d4de0100022d","5e25e435a6f19fde01000108","5e2d8013556d50df010000a7","5e2e0f40524f94de010000c9"],"dataSources":["PmmMaYA5oCGySfyg2"]}