Case-based Data Quality Management for IoT Logs: A Case Study Focusing on Detection of Data Quality Issues. Schultheis, A., Bertrand, Y., Grüger, J., Malburg, L., Bergmann, R., & Serral Asensio, E. IoT, 2025.
Smart manufacturing applications increasingly rely on time-series data from Industrial IoT sensors, yet these data streams often contain data quality issues (DQIs) that affect analysis and disrupt production. While traditional Machine Learning methods are difficult to apply due to the small amount of data available, the knowledge-based approach of Case-Based Reasoning (CBR) offers a way to reuse previously gained experience. We introduce the first end-to-end CBR framework that both detects and remedies DQIs in near real time, even when only a handful of annotated fault instances are available. Our solution encodes expert experience in the four CBR knowledge containers: (i) a vocabulary that represents sensor streams and their context in the DataStream format; (ii) a case base populated with fault-annotated event logs; (iii) tailored similarity measures, including a weighted Dynamic Time Warping variant and structure-aware list mapping, that isolate the signatures of missing-value, missing-sensor, and time-shift errors; and (iv) lightweight adaptation rules that recommend concrete repair actions or, where appropriate, invoke automated imputation and alignment routines. A case study examines the suitability of the approach for a specific application domain. Although the case study demonstrates only limited capabilities in identifying DQIs, we aim to support transparent evaluation and future research by publishing (1) a prototype of the CBR system and (2) a publicly accessible, meticulously annotated sensor-log benchmark. Together, these resources provide a reproducible baseline and a modular foundation for advancing similarity metrics, expanding the DQI taxonomy, and enabling knowledge-intensive reasoning in IoT data quality management.
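To make the retrieval idea in the abstract more concrete, the following is a minimal sketch of how a weighted Dynamic Time Warping distance could drive a nearest-case lookup. It is not the authors' implementation: the weighting scheme (scaling each local cost by `1 + penalty * |i - j|`), the function names `weighted_dtw`, `similarity`, and `retrieve`, and the toy case base are all assumptions made for illustration, and the variant used in the paper may differ.

```python
import math


def weighted_dtw(a, b, penalty=0.1):
    """DTW distance with a hypothetical weight of 1 + penalty * |i - j|
    applied to each local cost, so alignments far from the diagonal
    cost more than alignments close to it."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            weight = 1.0 + penalty * abs(i - j)
            cost = weight * abs(a[i - 1] - b[j - 1])
            # Standard DTW recurrence: extend the cheapest of the three
            # admissible predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def similarity(a, b, penalty=0.1):
    """Map the distance into (0, 1]; identical series yield 1.0."""
    return 1.0 / (1.0 + weighted_dtw(a, b, penalty))


def retrieve(query, case_base, penalty=0.1):
    """Return (label, similarity) of the most similar stored case."""
    scored = ((label, similarity(query, series, penalty))
              for label, series in case_base)
    return max(scored, key=lambda pair: pair[1])


# Toy case base (invented values): one clean trace, one time-shifted trace.
case_base = [
    ("ok", [1.0, 2.0, 3.0, 2.0, 1.0]),
    ("time_shift", [1.0, 1.0, 2.0, 3.0, 2.0]),
]
query = [1.0, 1.0, 2.0, 3.0, 2.0]
label, score = retrieve(query, case_base)
print(label, score)
```

In a CBR setting, the label of the retrieved case would then select which repair recommendation to propose; that adaptation step is left out of this sketch.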
@article{SchultheisBGMBSA2025,
author = {Schultheis, Alexander and Bertrand, Yannis and Grüger, Joscha and Malburg, Lukas and Bergmann, Ralph and {Serral Asensio}, Estefanía},
title = {{Case-based Data Quality Management for IoT Logs: A Case Study Focusing on Detection of Data Quality Issues}},
journal = {IoT},
year = {2025},
volume = {6},
number = {4},
article-number = {63},
doi = {10.3390/iot6040063},
abstract = {Smart manufacturing applications increasingly rely on time-series data from Industrial IoT sensors, yet these data streams often contain data quality issues (DQIs) that affect analysis and disrupt production. While traditional Machine Learning methods are difficult to apply due to the small amount of data available, the knowledge-based approach of Case-Based Reasoning (CBR) offers a way to reuse previously gained experience. We introduce the first end-to-end Case-Based Reasoning (CBR) framework that both detects and remedies DQIs in near real time, even when only a handful of annotated fault instances are available. Our solution encodes expert experience in the four CBR knowledge containers: (i) a vocabulary that represents sensor streams and their context in the DataStream format; (ii) a case base populated with fault-annotated event logs; (iii) tailored similarity measures—including a weighted Dynamic Time Warping variant and structure-aware list mapping—that isolate the signatures of missing-value, missing-sensor, and time-shift errors; and (iv) lightweight adaptation rules that recommend concrete repair actions or, where appropriate, invoke automated imputation and alignment routines. A case study is used to examine and present the suitability of the approach for a specific application domain. Although the case study demonstrates only limited capabilities in identifying Data Quality Issues (DQIs), we aim to support transparent evaluation and future research by publishing (1) a prototype of the Case-Based Reasoning (CBR) system and (2) a publicly accessible, meticulously annotated sensor-log benchmark. Together, these resources provide a reproducible baseline and a modular foundation for advancing similarity metrics, expanding the DQI taxonomy, and enabling knowledge-intensive reasoning in IoT data quality management.},
keywords = {Time Series Data, Industrial Internet of Things, Data Quality Issues, Temporal Case-Based Reasoning}
}