Generation of global 1 km daily soil moisture product from 2000 to 2020 using ensemble learning

Generation of global 1 km daily soil moisture product from 2000 to 2020 using ensemble learning. Zhang, Y., Liang, S., Ma, H., He, T., Wang, Q., Li, B., Xu, J., Zhang, G., Liu, X., & Xiong, C. Earth System Science Data, 15(5):2055–2079, May 2023.


Abstract. Motivated by the lack of long-term global soil moisture products with both high spatial and temporal resolutions, a global 1 km daily spatiotemporally continuous soil moisture product (GLASS SM) was generated from 2000 to 2020 using an ensemble learning model (eXtreme Gradient Boosting – XGBoost). The model was developed by integrating multiple datasets, including albedo, land surface temperature, and leaf area index products from the Global Land Surface Satellite (GLASS) product suite, as well as the European reanalysis (ERA5-Land) soil moisture product, in situ soil moisture dataset from the International Soil Moisture Network (ISMN), and auxiliary datasets (Multi-Error-Removed Improved-Terrain (MERIT) DEM and Global gridded soil information (SoilGrids)). Given the relatively large-scale differences between point-scale in situ measurements and other datasets, the triple collocation (TC) method was adopted to select the representative soil moisture stations and their measurements for creating the training samples. To fully evaluate the model performance, three validation strategies were explored: random, site independent, and year independent. Results showed that although the XGBoost model achieved the highest accuracy on the random test samples, it was clearly a result of model overfitting. Meanwhile, training the model with representative stations selected by the TC method could considerably improve its performance for site- or year-independent test samples. The overall validation accuracy of the model trained using representative stations on the site-independent test samples, which was least likely to be overfitted, was a correlation coefficient (R) of 0.715 and root mean square error (RMSE) of 0.079 m³ m⁻³. Moreover, compared to the model developed without station filtering, the validation accuracies of the model trained with representative stations improved significantly for most stations, with the median R and unbiased RMSE (ubRMSE) of the model for each station increasing from 0.64 to 0.74 and decreasing from 0.055 to 0.052 m³ m⁻³, respectively. Further validation of the GLASS SM product across four independent soil moisture networks revealed its ability to capture the temporal dynamics of measured soil moisture (R = 0.69–0.89; ubRMSE = 0.033–0.048 m³ m⁻³). Lastly, the intercomparison between the GLASS SM product and two global microwave soil moisture datasets – the 1 km Soil Moisture Active Passive/Sentinel-1 L2 Radiometer/Radar soil moisture product and the European Space Agency Climate Change Initiative combined soil moisture product at 0.25° – indicated that the derived product maintained a more complete spatial coverage and exhibited high spatiotemporal consistency with those two soil moisture products. The annual average GLASS SM dataset from 2000 to 2020 can be freely downloaded from https://doi.org/10.5281/zenodo.7172664 (Zhang et al., 2022a), and the complete product at daily scale is available at http://glass.umd.edu/soil_moisture/ (last access: 12 May 2023).

@article{zhang_generation_2023,
	title = {Generation of global 1 km daily soil moisture product from 2000 to 2020 using ensemble learning},
	volume = {15},
	copyright = {https://creativecommons.org/licenses/by/4.0/},
	issn = {1866-3516},
	url = {https://essd.copernicus.org/articles/15/2055/2023/},
	doi = {10.5194/essd-15-2055-2023},
	abstract = {Motivated by the lack of long-term global soil moisture products
with both high spatial and temporal resolutions, a global 1 km daily
spatiotemporally continuous soil moisture product (GLASS SM) was generated
from 2000 to 2020 using an ensemble learning model (eXtreme Gradient
Boosting – XGBoost). The model was developed by integrating multiple
datasets, including albedo, land surface temperature, and leaf area index
products from the Global Land Surface Satellite (GLASS) product suite, as
well as the European reanalysis (ERA5-Land) soil moisture product, in situ
soil moisture dataset from the International Soil Moisture Network (ISMN),
and auxiliary datasets (Multi-Error-Removed Improved-Terrain (MERIT) DEM and
Global gridded soil information (SoilGrids)). Given the relatively large-scale differences between point-scale
in situ measurements and other datasets, the triple collocation (TC) method
was adopted to select the representative soil moisture stations and their
measurements for creating the training samples. To fully evaluate the model
performance, three validation strategies were explored: random,
site independent, and year independent. Results showed that although the
XGBoost model achieved the highest accuracy on the random test samples, it
was clearly a result of model overfitting. Meanwhile, training the model
with representative stations selected by the TC method could considerably
improve its performance for site- or year-independent test samples. The
overall validation accuracy of the model trained using representative
stations on the site-independent test samples, which was least likely to be
overfitted, was a correlation coefficient (R) of 0.715 and root mean square
error (RMSE) of 0.079 m$^3$ m$^{-3}$. Moreover, compared to the model
developed without station filtering, the validation accuracies of the model
trained with representative stations improved significantly for most stations,
with the median R and unbiased RMSE (ubRMSE) of the model for each station
increasing from 0.64 to 0.74 and decreasing from 0.055 to 0.052 m$^3$ m$^{-3}$, respectively. Further validation of the GLASS SM product across
four independent soil moisture networks revealed its ability to capture the
temporal dynamics of measured soil moisture (R = 0.69–0.89; ubRMSE =
0.033–0.048 m$^3$ m$^{-3}$). Lastly, the intercomparison between the
GLASS SM product and two global microwave soil moisture datasets – the 1 km
Soil Moisture Active Passive/Sentinel-1 L2 Radiometer/Radar soil moisture
product and the European Space Agency Climate Change Initiative combined
soil moisture product at 0.25$^{\circ}$ – indicated that the derived
product maintained a more complete spatial coverage and exhibited high
spatiotemporal consistency with those two soil moisture products. The annual
average GLASS SM dataset from 2000 to 2020 can be freely downloaded from
https://doi.org/10.5281/zenodo.7172664 (Zhang et al., 2022a),
and the complete product at daily scale is available at
http://glass.umd.edu/soil\_moisture/ (last access: 12 May 2023).},
	language = {en},
	number = {5},
	urldate = {2024-11-15},
	journal = {Earth System Science Data},
	author = {Zhang, Yufang and Liang, Shunlin and Ma, Han and He, Tao and Wang, Qian and Li, Bing and Xu, Jianglei and Zhang, Guodong and Liu, Xiaobang and Xiong, Changhao},
	month = may,
	year = {2023},
	pages = {2055--2079},
}

