A forest-based algorithm for selecting informative variables using Variable Depth Distribution. Voronov, S., Jung, D., & Frisk, E. Engineering Applications of Artificial Intelligence, 97:104073, January, 2021.
Predictive maintenance of systems and their components in technical systems is a promising approach to optimize system usage and reduce system downtime. Various sensor data are logged during system operation for different purposes, but sometimes not directly related to the degradation of a specific component. Variable selection algorithms are necessary to reduce model complexity and improve interpretability of diagnostic and prognostic algorithms. This paper presents a forest-based variable selection algorithm that analyzes the distribution of a variable in the decision tree structure, called Variable Depth Distribution, to measure its importance. The proposed variable selection algorithm is developed for datasets with correlated variables that pose problems for existing forest-based variable selection methods. The proposed variable selection method is evaluated and analyzed using three case studies: survival analysis of lead–acid batteries in heavy-duty vehicles, engine misfire detection, and a simulated prognostics dataset. The results show the usefulness of the proposed algorithm, with respect to existing forest-based methods, and its ability to identify important variables in different applications. As an example, the battery prognostics case study shows that similar predictive performance is achieved when only 17% of the variables are used compared to all measured signals.
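The abstract describes measuring a variable's importance from how it is distributed over depths in the trees of a forest. As a rough, hypothetical illustration only (this is not the authors' Variable Depth Distribution algorithm, whose exact definition and scoring are given in the paper), the Python sketch below pools, for each variable, the depths at which it is used as a split variable across a toy forest and normalizes them into a per-variable distribution. The nested-tuple tree encoding and the function names `split_depths` and `depth_distribution` are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical tree encoding (an assumption for this sketch): a leaf is None,
# an internal node is (split_variable, left_subtree, right_subtree).
Tree = Optional[tuple]


def split_depths(tree: Tree, depth: int = 0, out=None):
    """Collect (variable, depth) pairs for every split node; the root has depth 0."""
    if out is None:
        out = []
    if tree is None:  # leaf: no split variable here
        return out
    var, left, right = tree
    out.append((var, depth))
    split_depths(left, depth + 1, out)
    split_depths(right, depth + 1, out)
    return out


def depth_distribution(forest):
    """For each variable, the fraction of its split occurrences at each depth,
    pooled over all trees in the forest (keys sorted by depth)."""
    counts = defaultdict(Counter)
    for tree in forest:
        for var, depth in split_depths(tree):
            counts[var][depth] += 1
    dist = {}
    for var, c in counts.items():
        total = sum(c.values())
        dist[var] = {d: n / total for d, n in sorted(c.items())}
    return dist


# Toy forest of three small trees over hypothetical variables x1, x2, x3.
forest = [
    ("x1", ("x2", None, None), ("x3", None, None)),
    ("x1", ("x2", None, None), None),
    ("x2", ("x1", None, None), None),
]

if __name__ == "__main__":
    for var, d in sorted(depth_distribution(forest).items()):
        print(var, d)
```

Variables that split near the root in many trees concentrate their mass at small depths. With a fitted forest, the toy tuples would be replaced by each tree's internal node structure; how such distributions are compared and thresholded for selection is described in the paper itself.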
@article{voronov_forest-based_2021,
	title = {A forest-based algorithm for selecting informative variables using {Variable} {Depth} {Distribution}},
	volume = {97},
	issn = {0952-1976},
	url = {http://www.sciencedirect.com/science/article/pii/S0952197620303341},
	doi = {10.1016/j.engappai.2020.104073},
	abstract = {Predictive maintenance of systems and their components in technical systems is a promising approach to optimize system usage and reduce system downtime. Various sensor data are logged during system operation for different purposes, but sometimes not directly related to the degradation of a specific component. Variable selection algorithms are necessary to reduce model complexity and improve interpretability of diagnostic and prognostic algorithms. This paper presents a forest-based variable selection algorithm that analyzes the distribution of a variable in the decision tree structure, called Variable Depth Distribution, to measure its importance. The proposed variable selection algorithm is developed for datasets with correlated variables that pose problems for existing forest-based variable selection methods. The proposed variable selection method is evaluated and analyzed using three case studies: survival analysis of lead–acid batteries in heavy-duty vehicles, engine misfire detection, and a simulated prognostics dataset. The results show the usefulness of the proposed algorithm, with respect to existing forest-based methods, and its ability to identify important variables in different applications. As an example, the battery prognostics case study shows that similar predictive performance is achieved when only 17\% of the variables are used compared to all measured signals.},
	language = {en},
	urldate = {2020-11-30},
	journal = {Engineering Applications of Artificial Intelligence},
	author = {Voronov, Sergii and Jung, Daniel and Frisk, Erik},
	month = jan,
	year = {2021},
	keywords = {Automotive, Random Forest, Random Survival Forest, Variable selection},
	pages = {104073},
}