Clarifying Influences of Sampling Bias (Concentration) and Locational Errors (Uncertainties) on Precision or Generality of Species Distribution Models. Hanberry, B. B. Land, 14(8):1620, August, 2025. Publisher: Multidisciplinary Digital Publishing Institute
Locational errors and sampling bias may produce unrepresentative species distribution models. To decompose the influence of errors, I modeled species distributions of 31 mammal species from georeferenced records and random samples from range maps, with potential sources of errors added or removed, using the random forests algorithm. Errors included the addition of (1) cities, (2) administrative centers, (3) records flagged as potential errors (e.g., outliers), and (4) urban records to range map samples; the removal of (5) flagged records and (6) urban records from georeferenced records; and the addition of (7) random points and (8) clustered points to georeferenced records. I also examined separation between thinned and unthinned (i.e., locally concentrated) records and ocean and land areas. Errors generally did not perturb species distributions, particularly if errors were located within species ranges. The greatest departure relative to unaltered models (mean niche overlap values of 0.96 out of 1) was due to the addition of administrative centers at a 13% error rate. Because locational errors overall do not occur in modern georeferenced records, outliers may provide important samples from undersampled areas. Delineating land from ocean coordinates may require a land layer at the highest available resolution and buffered to match the distance of locational uncertainty for georeferenced records. Predicted areas for species distributions increased along the spectrum of models from concentrated georeferenced records, thinned records, and random samples from range maps. Species distributions modeled with all georeferenced records will have the greatest sampling concentration (to differentiate from bias, because predictive modeling is not hypothesis testing), resulting in model locational precision, whereas species distribution models from random samples of range maps will have locational generality (rather than errors). The risk of removing samples of suitable conditions is the generation of unrepresentative models whereas the benefit of sample removal is slightly more generalized models, but which also may represent overpredictions.
@article{hanberry_clarifying_2025,
	title = {Clarifying {Influences} of {Sampling} {Bias} ({Concentration}) and {Locational} {Errors} ({Uncertainties}) on {Precision} or {Generality} of {Species} {Distribution} {Models}},
	volume = {14},
	copyright = {http://creativecommons.org/licenses/by/3.0/},
	issn = {2073-445X},
	url = {https://www.mdpi.com/2073-445X/14/8/1620},
	doi = {10.3390/land14081620},
	abstract = {Locational errors and sampling bias may produce unrepresentative species distribution models. To decompose the influence of errors, I modeled species distributions of 31 mammal species from georeferenced records and random samples from range maps, with potential sources of errors added or removed, using the random forests algorithm. Errors included the addition of (1) cities, (2) administrative centers, (3) records flagged as potential errors (e.g., outliers), and (4) urban records to range map samples; the removal of (5) flagged records and (6) urban records from georeferenced records; and the addition of (7) random points and (8) clustered points to georeferenced records. I also examined separation between thinned and unthinned (i.e., locally concentrated) records and ocean and land areas. Errors generally did not perturb species distributions, particularly if errors were located within species ranges. The greatest departure relative to unaltered models (mean niche overlap values of 0.96 out of 1) was due to the addition of administrative centers at a 13\% error rate. Because locational errors overall do not occur in modern georeferenced records, outliers may provide important samples from undersampled areas. Delineating land from ocean coordinates may require a land layer at the highest available resolution and buffered to match the distance of locational uncertainty for georeferenced records. Predicted areas for species distributions increased along the spectrum of models from concentrated georeferenced records, thinned records, and random samples from range maps. Species distributions modeled with all georeferenced records will have the greatest sampling concentration (to differentiate from bias, because predictive modeling is not hypothesis testing), resulting in model locational precision, whereas species distribution models from random samples of range maps will have locational generality (rather than errors). 
The risk of removing samples of suitable conditions is the generation of unrepresentative models whereas the benefit of sample removal is slightly more generalized models, but which also may represent overpredictions.},
	language = {en},
	number = {8},
	urldate = {2026-01-21},
	journal = {Land},
	author = {Hanberry, Brice B.},
	month = aug,
	year = {2025},
	note = {Publisher: Multidisciplinary Digital Publishing Institute},
	keywords = {NALCMS},
	pages = {1620},
}
