Optimizing landslide susceptibility mapping using machine learning and geospatial techniques. Agboola, G., Beni, L. H., Elbayoumi, T., & Thompson, G. Ecological Informatics, 81:102583, July, 2024.
Landslides present a substantial risk to human lives, the environment, and infrastructure. Consequently, it is crucial to highlight the regions prone to future landslides by examining the correlation between past landslides and various geo-environmental factors. This study aims to investigate the optimal data selection and machine learning model, or ensemble technique, for evaluating the vulnerability of areas to landslides and determining the most accurate approach. To attain our objectives, we considered two different scenarios for selecting landslide-free random points (a slope threshold and a buffer-based approach) and performed a comparative analysis of five machine learning models for landslide susceptibility mapping, namely: Support Vector Machine (SVM), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The study area for this research is an area in Polk County in Western North Carolina that has experienced fatal landslides, leading to casualties and significant damage to infrastructure, properties, and road networks. The model construction process involves the utilization of a dataset comprising 1215 historical landslide occurrences and 1215 non-landslide points. We integrated a total of fourteen geospatial data layers, consisting of topographic variables, soil data, geological data, and land cover attributes. We use various metrics to assess the models' performance, including accuracy, F1-score, Kappa score, and AUC-ROC. In addition, we used the seeded-cell area index (SCAI) to evaluate map consistency. The ensemble of the five models using Weighted Average produces outstanding results, with an AUC-ROC of 99.4% for the slope threshold scenario and 91.8% for the buffer-based scenario. Our findings emphasize the significant impact of non-landslide random sampling on model performance in landslide susceptibility mapping. Furthermore, by optimally identifying landslide-prone regions and hotspots that need urgent risk management and land use planning, our study demonstrates the effectiveness of machine learning models in analyzing landslide susceptibility and providing valuable insights for informed decision-making and disaster risk reduction initiatives.
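The abstract names a weighted-average ensemble scored by AUC-ROC but does not say how the weights were chosen. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes each model's weight is its own validation AUC-ROC. The labels, the two score lists, and the function names are hypothetical and made up for illustration.

```python
from typing import List, Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_average(model_scores: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-model scores point by point, normalizing by the weight sum."""
    total = sum(weights)
    n_points = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_points)
    ]


# Hypothetical validation points: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 0, 0, 0]

# Made-up susceptibility scores from two stand-in models (not real results).
model_scores = [
    [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    [0.6, 0.8, 0.3, 0.4, 0.3, 0.2],
]

# Assumption: weight each model by its own validation AUC-ROC.
weights = [auc_roc(labels, scores) for scores in model_scores]
ensemble = weighted_average(model_scores, weights)
ensemble_auc = auc_roc(labels, ensemble)
```

In this toy data each stand-in model misranks at least one landslide point, while the combined score separates the two classes. Other weighting schemes (equal weights, weights tuned on a held-out set) fit the same function unchanged.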
@article{agboola_optimizing_2024,
	title = {Optimizing landslide susceptibility mapping using machine learning and geospatial techniques},
	volume = {81},
	issn = {1574-9541},
	url = {https://www.sciencedirect.com/science/article/pii/S1574954124001250},
	doi = {10.1016/j.ecoinf.2024.102583},
	abstract = {Landslides present a substantial risk to human lives, the environment, and infrastructure. Consequently, it is crucial to highlight the regions prone to future landslides by examining the correlation between past landslides and various geo-environmental factors. This study aims to investigate the optimal data selection and machine learning model, or ensemble technique, for evaluating the vulnerability of areas to landslides and determining the most accurate approach. To attain our objectives, we considered two different scenarios for selecting landslide-free random points (a slope threshold and a buffer-based approach) and performed a comparative analysis of five machine learning models for landslide susceptibility mapping, namely: Support Vector Machine (SVM), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The study area for this research is an area in Polk County in Western North Carolina that has experienced fatal landslides, leading to casualties and significant damage to infrastructure, properties, and road networks. The model construction process involves the utilization of a dataset comprising 1215 historical landslide occurrences and 1215 non-landslide points. We integrated a total of fourteen geospatial data layers, consisting of topographic variables, soil data, geological data, and land cover attributes. We use various metrics to assess the models' performance, including accuracy, F1-score, Kappa score, and AUC-ROC. In addition, we used the seeded-cell area index (SCAI) to evaluate map consistency. The ensemble of the five models using Weighted Average produces outstanding results, with an AUC-ROC of 99.4\% for the slope threshold scenario and 91.8\% for the buffer-based scenario. Our findings emphasize the significant impact of non-landslide random sampling on model performance in landslide susceptibility mapping. Furthermore, by optimally identifying landslide-prone regions and hotspots that need urgent risk management and land use planning, our study demonstrates the effectiveness of machine learning models in analyzing landslide susceptibility and providing valuable insights for informed decision-making and disaster risk reduction initiatives.},
	urldate = {2024-04-24},
	journal = {Ecological Informatics},
	author = {Agboola, Gazali and Beni, Leila Hashemi and Elbayoumi, Tamer and Thompson, Gary},
	month = jul,
	year = {2024},
	keywords = {NALCMS},
	pages = {102583},
}
