Semantization of Machine Learning and Data Science (a Project Idea). Alexiev, V. & Boytcheva, S. presentation, September, 2021.
Problem: Data Science, AI & ML are expensive, and that's one of the reasons why relatively few enterprises use them.
Goal: rationalize and industrialize DS efforts, and make them more reproducible and reusable.
Approach: capture a lot of semantic info about all DS processes in an enterprise, and thus enable automation, discovery, reusability.

The kinds of data we'd like to represent and integrate semantically (part of it is similar to what you can see on the Kaggle and OpenML sites):
- Business context: goals, motivations, data value, value chain, cost vs benefit analysis, SWOT analysis...
- DS challenges, where they come from, and datasets that can be leveraged to solve them
- DS staff, expertise, projects, tasks, risks
- DS/ML algorithms, implementations, modules, dependencies, software projects, versions, issue trackers
- Cloud and IT resources: compute, storage; their deployment, management, automation...
- ML model deployment, performance, model drift, retraining...

Established software genres that cover parts of this landscape:
- ModelOps (DevOps for ML), Feature Spaces
- Enterprise data catalogs (data hubs) vs data marketplaces vs open data catalogs vs EU Data Spaces and their metadata
- FAIR data, reproducible research, Research Objects, research workflows

We've researched over 100 relevant ontologies that can be leveraged, covering:
- Organizations/enterprises, business plans
- Ontologies, semantic data
- DS challenges, datasets, statistical data, quality assessment
- DS/ML approaches, software, projects, issues
- Data on research/science
- Project management

Focusing on DS/ML approaches only, a couple of the relevant ontologies or standards are:
- PMML (predictive modeling markup language)
- e-LICO, DMEX ontologies for describing DS
- OntoDM, KDO ontologies for describing DS
@Misc{AlexievBoytcheva2021-SemantizationML,
  author       = {Vladimir Alexiev and Svetla Boytcheva},
  title        = {{Semantization of Machine Learning and Data Science (a Project Idea)}},
  howpublished = {presentation},
  month        = sep,
  year         = 2021,
  url          = {https://docs.google.com/presentation/d/1_8LSXa9vVzNwPE6Hjj4cKIJNRRBNz2wP/edit},
  keywords     = {Ontotext, research projects, knowledge graph, KG technologies, Semantization, Machine Learning, Data Science},
  address      = {Presentation at Big Data Value Association Activity Group 45 (BDVA AG 45)},
  abstract     = {Problem: Data Science, AI & ML are expensive, and that's one of the reasons why relatively few enterprises use them.
Goal: rationalize and industrialize DS efforts, and make them more reproducible and reusable.
Approach: capture a lot of semantic info about all DS processes in an enterprise, and thus enable automation, discovery, reusability.
    
The kinds of data we'd like to represent and integrate semantically (part of it is similar to what you can see on the Kaggle and OpenML sites): 
- Business context: goals, motivations, data value, value chain, cost vs benefit analysis, SWOT analysis...
- DS challenges, where they come from, and datasets that can be leveraged to solve them
- DS staff, expertise, projects, tasks, risks 
- DS/ML algorithms, implementations, modules, dependencies, software projects, versions, issue trackers 
- Cloud and IT resources: compute, storage; their deployment, management, automation...
- ML model deployment, performance, model drift, retraining… 

Established software genres that cover parts of this landscape: 
- ModelOps (DevOps for ML), Feature Spaces
- Enterprise data catalogs (data hubs) vs data marketplaces vs open data catalogs vs EU Data Spaces and their metadata 
- FAIR data, reproducible research, Research Objects, research workflows

We've researched over 100 relevant ontologies that can be leveraged, covering 
- Organizations/enterprises, business plans
- Ontologies, semantic data
- DS challenges, datasets, statistical data, quality assessment
- DS/ML approaches, software, projects, issues
- Data on research/science 
- Project management 

Focusing on DS/ML approaches only, a couple of the relevant ontologies or standards are: 
- PMML (predictive modeling markup language) 
- e-LICO, DMEX ontologies for describing DS 
- OntoDM, KDO ontologies for describing DS},
}
