Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development

Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development. Correia, J. L., Pereira, J. A., de Mello, R., Garcia, A., Fonseca, B., Ribeiro, M., Gheyi, R., Tiengo, W., Kalinowski, M., & Cerqueira, R. In Proceedings of the XIX Brazilian Symposium on Software Quality, SBQS'20, Brazil, December 1st - December 4th, pages 1-10, 2020.

Data scientists often develop machine learning models to solve a variety of problems in industry and academia. To build these models, they usually perform activities that also occur in the traditional software development lifecycle, such as eliciting and implementing requirements. One might argue that data scientists could rely on the engineering practices of traditional software development to build machine learning models. However, machine learning development has particular characteristics that may raise challenges and call for new practices. The literature lacks a characterization of this knowledge from the perspective of data scientists. In this paper, we characterize challenges and practices in the engineering of machine learning models that deserve attention from the research community. To this end, we performed a qualitative study with eight data scientists from five different companies, with different levels of experience in developing machine learning models. Our findings suggest that: (i) data processing and feature engineering are the most challenging stages in the development of machine learning models; (ii) synergy between data scientists and domain experts is essential in most stages; and (iii) the development of machine learning models lacks the support of a well-engineered process.

@inproceedings{CorreiaEtAl20,
  author    = {Jo{\~a}o Lucas Correia and Juliana Alves Pereira and Rafael de Mello and Alessandro Garcia and Baldoino Fonseca and Marcio Ribeiro and Rohit Gheyi and Willy Tiengo and Marcos Kalinowski and Renato Cerqueira},
  title     = {Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development},
  abstract  = {Data scientists often develop machine learning models to solve a variety of problems in industry and academia. To build these models, they usually perform activities that also occur in the traditional software development lifecycle, such as eliciting and implementing requirements. One might argue that data scientists could rely on the engineering practices of traditional software development to build machine learning models. However, machine learning development has particular characteristics that may raise challenges and call for new practices. The literature lacks a characterization of this knowledge from the perspective of data scientists. In this paper, we characterize challenges and practices in the engineering of machine learning models that deserve attention from the research community. To this end, we performed a qualitative study with eight data scientists from five different companies, with different levels of experience in developing machine learning models. Our findings suggest that: (i) data processing and feature engineering are the most challenging stages in the development of machine learning models; (ii) synergy between data scientists and domain experts is essential in most stages; and (iii) the development of machine learning models lacks the support of a well-engineered process.},
  booktitle = {Proceedings of the XIX Brazilian Symposium on Software Quality, {SBQS'20}, Brazil, December 1st - December 4th},
  pages     = {1-10},
  note      = {},
  year      = {2020},
  urlAuthor_version       = {http://www.inf.puc-rio.br/~kalinowski/publications/CorreiaEtAl20.pdf},
  doi       = {},
}
