@article{namgung2025less,
  title = {Less is More: Multimodal Region Representation via Pairwise Inter-view Learning},
  author = {Namgung, Min and Lin, Yijun and Lee, JangHyeon and Chiang, Yao-Yi},
  journal = {arXiv preprint arXiv:2505.18178},
  year = {2025}
}
@article{li2025mapqa,
  title = {MapQA: Open-domain Geospatial Question Answering on Map Data},
  author = {Li, Zekun and Grossman, Malcolm and Qasemi, Eric and Kulkarni, Mihir and Chen, Muhao and Chiang, Yao-Yi},
  journal = {arXiv preprint arXiv:2503.07871},
  year = {2025},
  url = {https://arxiv.org/abs/2503.07871},
  abstract = {Geospatial question answering (QA) is a fundamental task in navigation and point of interest (POI) searches. While existing geospatial QA datasets exist, they are limited in both scale and diversity, often relying solely on textual descriptions of geo-entities without considering their geometries. A major challenge in scaling geospatial QA datasets for reasoning lies in the complexity of geospatial relationships, which require integrating spatial structures, topological dependencies, and multi-hop reasoning capabilities that most text-based QA datasets lack. To address these limitations, we introduce MapQA, a novel dataset that not only provides question-answer pairs but also includes the geometries of geo-entities referenced in the questions. MapQA is constructed using SQL query templates to extract question-answer pairs from OpenStreetMap (OSM) for two study regions: Southern California and Illinois. It consists of 3,154 QA pairs spanning nine question types that require geospatial reasoning, such as neighborhood inference and geo-entity type identification. Compared to existing datasets, MapQA expands both the number and diversity of geospatial question types. We explore two approaches to tackle this challenge: (1) a retrieval-based language model that ranks candidate geo-entities by embedding similarity, and (2) a large language model (LLM) that generates SQL queries from natural language questions and geo-entity attributes, which are then executed against an OSM database. Our findings indicate that retrieval-based methods effectively capture concepts like closeness and direction but struggle with questions that require explicit computations (e.g., distance calculations). LLMs (e.g., GPT and Gemini) excel at generating SQL queries for one-hop reasoning but face challenges with multi-hop reasoning, highlighting a key bottleneck in advancing geospatial QA systems.}
}
@article{MAI2025104368,
  title = {Towards the next generation of Geospatial Artificial Intelligence},
  author = {Gengchen Mai and Yiqun Xie and Xiaowei Jia and Ni Lao and Jinmeng Rao and Qing Zhu and Zeping Liu and Yao-Yi Chiang and Junfeng Jiao},
  journal = {International Journal of Applied Earth Observation and Geoinformation},
  volume = {136},
  pages = {104368},
  year = {2025},
  issn = {1569-8432},
  doi = {10.1016/j.jag.2025.104368},
  url = {https://www.sciencedirect.com/science/article/pii/S1569843225000159},
  keywords = {Geospatial Artificial Intelligence, Heterogeneity-aware GeoAI, Knowledge-Guided GeoAI, Spatial representation learning, Geo-Foundation Models, Fairness-aware GeoAI, Privacy-aware GeoAI, Interpretable and explainable GeoAI},
  abstract = {Geospatial Artificial Intelligence (GeoAI), as the integration of geospatial studies and AI, has become one of the fastest-developing research directions in spatial data science and geography. This rapid change in the field calls for a deeper understanding of the recent developments and envision where the field is going in the near future. In this work, we provide a quantitative analysis of the GeoAI literature from the spatial, temporal, and semantic aspects. We briefly discuss the history of AI and GeoAI by highlighting some pioneering work. Then we discuss the current landscape of GeoAI by selecting five representative subdomains including remote sensing, urban computing, Earth system science, cartography, and geospatial semantics. Finally, we highlight several unique future research directions of GeoAI which are classified into two groups: GeoAI method development challenges and GeoAI Ethics challenges. Topics include heterogeneity-aware GeoAI, knowledge-guided GeoAI, spatial representation learning, geo-foundation models, fairness-aware GeoAI, privacy-aware GeoAI, as well as interpretable and explainable GeoAI. We hope our review of GeoAI's past, present, and future is comprehensive and can enlighten the next generation of GeoAI research.}
}
@inproceedings{10.1145/3764915.3770723,
  title = {HiCoTraj: Zero-Shot Demographic Reasoning via Hierarchical Chain-of-Thought Prompting from Trajectory},
  author = {Xie, Junyi and Jiao, Yuankun and Kim, Jina and Chiang, Yao-Yi and Zhao, Lingyi and Shafique, Khurram},
  year = {2025},
  isbn = {9798400722615},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3764915.3770723},
  doi = {10.1145/3764915.3770723},
  booktitle = {Proceedings of the 1st ACM SIGSPATIAL International Workshop on Generative and Agentic AI for Multi-Modality Space-Time Intelligence},
  pages = {49--53},
  numpages = {5},
  keywords = {demographic inference, large language model reasoning, trajectory analysis, chain-of-thought prompting, zero-shot learning},
  location = {The Graduate Hotel Minneapolis, Minneapolis, MN, USA},
  series = {GeoGenAgent '25},
  abstract = {Inferring demographic attributes such as age, sex, or income level from human mobility patterns enables critical applications such as targeted public health interventions, equitable urban planning, and personalized transportation services. Existing mobility-based demographic inference studies heavily rely on large-scale trajectory data with demographic labels, leading to limited interpretability and poor generalizability across different datasets and user groups. We propose HiCoTraj (Zero-Shot Demographic Reasoning via Hierarchical Chain-of-Thought Prompting from Trajectory), a framework that leverages LLMs' zero-shot learning and semantic understanding capabilities to perform demographic inference without labeled training data. HiCoTraj transforms trajectories into semantically rich, natural language representations by creating detailed activity chronicles and multi-scale visiting summaries. Then HiCoTraj uses a novel hierarchical chain of thought reasoning to systematically guide LLMs through three cognitive stages: factual feature extraction, behavioral pattern analysis, and demographic inference with structured output. This approach addresses the scarcity challenge of labeled demographic data while providing transparent reasoning chains. Experimental evaluation on real-world trajectory data demonstrates that HiCoTraj achieves competitive performance across multiple demographic attributes in zero-shot scenarios.}
}
@inproceedings{jelinski2025fine,
  title = {Fine-Scale Soil Mapping in Alaska with Multimodal Machine Learning},
  author = {Jelinski, Nicolas A and Chiang, Yao-Yi and Nawrocki, Timm and Macander, Matt and Ives, Sue and Grunwald, Sabine and Brungard, Colby and Chen, Theresa and Lin, Yijun},
  year = {2025},
  booktitle = {Proceedings of ACM SIGSPATIAL 2025},
  url = {https://sigspatial2025.sigspatial.org/application-accepted/}
}
@article{kim2025streetlens,
  title = {StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery},
  author = {Kim, Jina and Jang, Leeje and Chiang, Yao-Yi and Wang, Guanyu and Pasco, Michelle},
  journal = {arXiv preprint arXiv:2506.14670},
  year = {2025},
  doi = {10.48550/arXiv.2506.14670},
  url = {https://arxiv.org/abs/2506.14670}
}
@article{namgung2025transit,
  title = {Transit for All: Mapping Equitable Bike2Subway Connection using Region Representation Learning},
  author = {Namgung, Min and Lee, JangHyeon and Ding, Fangyi and Chiang, Yao-Yi},
  journal = {arXiv preprint arXiv:2506.15113},
  year = {2025},
  doi = {10.48550/arXiv.2506.15113},
  url = {https://arxiv.org/abs/2506.15113}
}
@article{duan2025digmapper,
  title = {DIGMAPPER: A Modular System for Automated Geologic Map Digitization},
  author = {Duan, Weiwei and Gerlek, Michael and Minton, Steven and Knoblock, Craig and Lin, Fandel and Chen, Theresa and Jang, Leeje and Kirsanova, Sofia and Li, Zekun and Lin, Yijun and Chiang, Yao-Yi},
  journal = {arXiv preprint arXiv:2506.16006},
  year = {2025},
  doi = {10.48550/arXiv.2506.16006},
  url = {https://arxiv.org/abs/2506.16006}
}
@article{10.1145/3757932.3757935,
  title = {GeoAnomaly Detection: Towards finding Needles of Anomalous Behavior in a Haystack of Geospatial Data},
  author = {Chiang, Yao-Yi and Kim, Joon-Seok and Krause, Cory and Mattei, Enrico and Shafique, Khurram and Wenk, Carola and Z\"{u}fle, Andreas},
  journal = {SIGSPATIAL Special},
  year = {2025},
  volume = {15},
  number = {1},
  pages = {9--15},
  numpages = {7},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3757932.3757935},
  doi = {10.1145/3757932.3757935},
  abstract = {The task of anomaly detection in geospatial data is to find data points in space and time that deviate so much from other observations as to arouse suspicion that it was generated by a different mechanism. Most of the existing work in geospatial data analysis relies on supervised machine learning, which involves training models on labeled datasets, allowing them to learn and make accurate predictions. Anomalies, by definition, are rare events that occur infrequently and unpredictably. As a result, labeled datasets for these events are either non-existent or extremely limited. Consequently, anomaly detection research calls for new strategies that can identify rare and unexpected events based on patterns and distributions within the data itself in an unsupervised fashion. These anomalies may manifest as unexpected changes in environmental conditions, natural disasters, unusual human activities, or irregular patterns in spatiotemporal distributions. To answer this call, we organized the 1st ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection (GeoAnomalies'24) at SIGSPATIAL'24. In this Newsletter Article, we report our first findings and map future research directions.}
}
@article{pyo2025augmenting,
  title = {Augmenting Human-Centered Racial Covenant Detection and Georeferencing with Plug-and-Play NLP Pipelines},
  author = {Pyo, Jiyoon and Jiao, Yuankun and Chiang, Yao-Yi and Corey, Michael},
  journal = {arXiv preprint arXiv:2509.05829},
  year = {2025},
  doi = {10.48550/arXiv.2509.05829},
  url = {https://arxiv.org/abs/2509.05829}
}
@inproceedings{10.1007/978-3-032-04617-8_4,
  title = {LIGHT: Multi-modal Text Linking on Historical Maps},
  author = {Lin, Yijun and Olson, Rhett and Wu, Junhan and Chiang, Yao-Yi and Weinman, Jerod},
  editor = {Yin, Xu-Cheng and Karatzas, Dimosthenis and Lopresti, Daniel},
  booktitle = {Document Analysis and Recognition -- ICDAR 2025},
  year = {2025},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {60--77},
  isbn = {978-3-032-04617-8},
  abstract = {Text on historical maps provides valuable information for studies in history, economics, geography, and other related fields. Unlike structured or semi-structured documents, text on maps varies significantly in orientation, reading order, shape, and placement. Many modern methods can detect and transcribe text regions, but they struggle to effectively ``link'' the recognized text fragments, e.g., determining a multi-word place name. Existing layout analysis methods model word relationships to improve text understanding in structured documents, but they primarily rely on linguistic features and neglect geometric information, which is essential for handling map text. To address these challenges, we propose LIGHT, a novel multi-modal approach that integrates linguistic, image, and geometric features for linking text on historical maps. In particular, LIGHT includes a geometry-aware embedding module that encodes the polygonal coordinates of text regions to capture polygon shapes and their relative spatial positions on an image. LIGHT unifies this geometric information with the visual and linguistic token embeddings from LayoutLMv3, a pretrained layout analysis model. LIGHT uses the cross-modal information to predict the reading-order successor of each text instance directly with a bi-directional learning strategy that enhances sequence robustness. Experimental results show that LIGHT outperforms existing methods on the ICDAR 2024/2025 MapText Competition data, demonstrating the effectiveness of multi-modal learning for historical map text linking.}
}
@inproceedings{10.1007/978-3-032-09530-5_26,
  title = {Exploiting LLMs and Semantic Technologies to Build a Knowledge Graph of Historical Mining Data},
  author = {Knoblock, Craig A. and Vu, Binh and Shbita, Basel and Chiang, Yao-Yi and Krishna, Pothula Punith and Lin, Xiao and Muric, Goran and Pyo, Jiyoon and Trejo-Sheu, Adriana and Ye, Meng},
  editor = {Garijo, Daniel and Kirrane, Sabrina and Salatino, Angelo and Shimizu, Cogan and Acosta, Maribel and Nuzzolese, Andrea Giovanni and Ferrada, Sebasti{\'a}n and Soulard, Thibaut and Kozaki, Kouji and Takeda, Hideaki and Gentile, Anna Lisa},
  booktitle = {The Semantic Web -- ISWC 2025},
  year = {2025},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {451--471},
  isbn = {978-3-032-09530-5},
  abstract = {Locating new sources of critical minerals begins with understanding where these minerals have been found in the past. However, historical data about mineral occurrences is often locked in disparate, unstructured, and inconsistent formats, ranging from government databases to mining reports and journal articles. To address this challenge, we have developed a set of scalable technologies that extract, normalize, and semantically integrate information from these sources into a unified knowledge graph. Our approach combines ontology-driven modeling, large-language models for information extraction and classification, and tools for linking and validating data across sources. The result is a semantically enriched, queryable knowledge graph that supports reproducible analysis, expert validation, and geoscientific applications such as deposit classification and prospectivity modeling. Through this work, we have successfully integrated information from hundreds of thousands of records across multiple historical sources to build one of the world's largest repositories of structured data on critical minerals.}
}
@inproceedings{10.1145/3764920.3770590,
  title = {Detecting Legend Items on Historical Maps Using GPT-4o with In-Context Learning},
  author = {Kirsanova, Sofia and Duan, Weiwei and Chiang, Yao-Yi},
  year = {2025},
  isbn = {9798400721830},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3764920.3770590},
  doi = {10.1145/3764920.3770590},
  booktitle = {Proceedings of the 4th ACM SIGSPATIAL International Workshop on Searching and Mining Large Collections of Geospatial Data},
  pages = {35--38},
  numpages = {4},
  keywords = {Map Digitization, Geospatial information retrieval, Map Layout Analysis, In-Context Learning},
  series = {GeoSearch '25},
  abstract = {Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding-box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving 88\% F-1 and 85\% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.}
}
\n\n\n
\n Historical map legends are critical for interpreting cartographic symbols. However, their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), with few methods effectively matching legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding-box predictions. Our experiments show that GPT-4 with structured JSON prompts outperforms the baseline, achieving 88% F-1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.\n
\n\n\n
\n\n\n
\n
\n\n \n \n \n \n \n \n BeSTAD: Behavior-Aware Spatio-Temporal Anomaly Detection for Human Mobility Data.\n \n \n \n \n\n\n \n Xie, J.; Kim, J.; Chiang, Y.; Zhao, L.; and Shafique, K.\n\n\n \n\n\n\n In
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection, of
GeoAnomalies '25, pages 56–59, New York, NY, USA, 2025. Association for Computing Machinery\n
\n\n
\n\n
\n\n
\n\n \n \n
\n
@inproceedings{10.1145/3764914.3770888,\n title = {BeSTAD: Behavior-Aware Spatio-Temporal Anomaly Detection for Human Mobility Data},\n author = {Xie, Junyi and Kim, Jina and Chiang, Yao-Yi and Zhao, Lingyi and Shafique, Khurram},\n year = {2025},\n isbn = {9798400722608},\n publisher = {Association for Computing Machinery},\n address = {New York, NY, USA},\n url = {https://doi.org/10.1145/3764914.3770888},\n doi = {10.1145/3764914.3770888},\n booktitle = {Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Geospatial Anomaly Detection},\n pages = {56–59},\n numpages = {4},\n keywords = {trajectory anomaly detection, mobility behavioral understanding, unsupervised learning},\n series = {GeoAnomalies '25},\n abstract = {Traditional anomaly detection in human mobility has primarily focused on trajectory-level analysis, identifying statistical outliers or spatiotemporal inconsistencies across aggregated movement traces. However, detecting individual-level anomalies, i.e., unusual deviations in a person's mobility behavior relative to their own historical patterns, within datasets encompassing large populations remains a significant challenge. In this paper, we present BeSTAD (Behavior-aware Spatio-Temporal Anomaly Detection for Human Mobility Data), an unsupervised framework that captures individualized behavioral signatures across large populations and uncovers fine-grained anomalies by jointly modeling spatial context and temporal dynamics. BeSTAD learns semantically enriched mobility representations that integrate location meaning and temporal patterns, enabling the detection of subtle deviations in individual movement behavior. BeSTAD further employs a behavior-cluster-aware modeling mechanism that builds personalized behavioral profiles from normal activity and identifies anomalies through cross-period behavioral comparison with consistent semantic alignment. Building on prior work in mobility behavior clustering, this approach enables not only the detection of behavioral shifts and deviations from established routines but also the identification of individuals exhibiting such changes within large-scale mobility datasets. By learning individual behaviors directly from unlabeled data, BeSTAD advances anomaly detection toward personalized and interpretable mobility analysis.}\n}\n\n\n
\n\n\n
\n Traditional anomaly detection in human mobility has primarily focused on trajectory-level analysis, identifying statistical outliers or spatiotemporal inconsistencies across aggregated movement traces. However, detecting individual-level anomalies, i.e., unusual deviations in a person's mobility behavior relative to their own historical patterns, within datasets encompassing large populations remains a significant challenge. In this paper, we present BeSTAD (Behavior-aware Spatio-Temporal Anomaly Detection for Human Mobility Data), an unsupervised framework that captures individualized behavioral signatures across large populations and uncovers fine-grained anomalies by jointly modeling spatial context and temporal dynamics. BeSTAD learns semantically enriched mobility representations that integrate location meaning and temporal patterns, enabling the detection of subtle deviations in individual movement behavior. BeSTAD further employs a behavior-cluster-aware modeling mechanism that builds personalized behavioral profiles from normal activity and identifies anomalies through cross-period behavioral comparison with consistent semantic alignment. Building on prior work in mobility behavior clustering, this approach enables not only the detection of behavioral shifts and deviations from established routines but also the identification of individuals exhibiting such changes within large-scale mobility datasets. By learning individual behaviors directly from unlabeled data, BeSTAD advances anomaly detection toward personalized and interpretable mobility analysis.\n
\n\n\n
\n\n\n
\n
\n\n \n \n \n \n \n \n CareWELL: Multimodal Region Representation Learning with Spatial Contexts for Urban Health.\n \n \n \n \n\n\n \n Namgung, M.; Chiang, Y.; and Omitaomu, O. A.\n\n\n \n\n\n\n In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Advances in Urban-AI, of
UrbanAI '25, pages 27–36, New York, NY, USA, 2025. Association for Computing Machinery\n
\n\n
\n\n
\n\n
\n\n \n \n
\n
@inproceedings{10.1145/3764926.3771947,\n title = {CareWELL: Multimodal Region Representation Learning with Spatial Contexts for Urban Health},\n author = {Namgung, Min and Chiang, Yao-Yi and Omitaomu, Olufemi A.},\n year = {2025},\n isbn = {9798400721892},\n publisher = {Association for Computing Machinery},\n address = {New York, NY, USA},\n url = {https://doi.org/10.1145/3764926.3771947},\n doi = {10.1145/3764926.3771947},\n booktitle = {Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Advances in Urban-AI},\n pages = {27–36},\n numpages = {10},\n keywords = {Multimodal Learning, Region Representation Learning, Large Language Models},\n series = {UrbanAI '25},\n abstract = {Rapid urbanization affects living environments by intensifying exposure to air pollution, heat, noise, and urban dynamics, which together contribute to uneven health outcomes across neighborhoods. For instance, cardiovascular, respiratory, and mental health conditions are each influenced by distinct exposures such as air pollution, extreme temperatures, or limited access to green space. These heterogeneous patterns require understanding the characteristics of geographic regions in order to explain why urban health risks vary across urban areas. Recent work in self-supervised region representation learning provides a promising way to model such characteristics from multimodal geospatial data. However, existing methods face two major limitations: (i) they often depend on non-public datasets, limiting reproducibility and applicability, and (ii) their generic pretraining objectives overlook health-relevant determinants, including temporal variability in environmental exposures and inequalities in social conditions. To address these gaps, we propose Context-Aware Region rEpresentation with Weather, Environment, and Location Learning (CareWELL). CareWELL leverages large language models to encode seasonal variability in weather, employs contrastive learning to align geo-coordinate and weather representations, and introduces a context-aware objective that integrates socio-demographic factors while preserving spatial correlations. We evaluate CareWELL by predicting six urban health outcomes in Manhattan, New York City, and demonstrate that CareWELL consistently outperforms state-of-the-art baselines as well as a traditional spatial computing method. These results suggest the importance of context-aware pretraining objectives for learning health-relevant region representations.}\n}\n\n\n
\n\n\n
\n Rapid urbanization affects living environments by intensifying exposure to air pollution, heat, noise, and urban dynamics, which together contribute to uneven health outcomes across neighborhoods. For instance, cardiovascular, respiratory, and mental health conditions are each influenced by distinct exposures such as air pollution, extreme temperatures, or limited access to green space. These heterogeneous patterns require understanding the characteristics of geographic regions in order to explain why urban health risks vary across urban areas. Recent work in self-supervised region representation learning provides a promising way to model such characteristics from multimodal geospatial data. However, existing methods face two major limitations: (i) they often depend on non-public datasets, limiting reproducibility and applicability, and (ii) their generic pretraining objectives overlook health-relevant determinants, including temporal variability in environmental exposures and inequalities in social conditions. To address these gaps, we propose Context-Aware Region rEpresentation with Weather, Environment, and Location Learning (CareWELL). CareWELL leverages large language models to encode seasonal variability in weather, employs contrastive learning to align geo-coordinate and weather representations, and introduces a context-aware objective that integrates socio-demographic factors while preserving spatial correlations. We evaluate CareWELL by predicting six urban health outcomes in Manhattan, New York City, and demonstrate that CareWELL consistently outperforms state-of-the-art baselines as well as a traditional spatial computing method. These results suggest the importance of context-aware pretraining objectives for learning health-relevant region representations.\n
\n\n\n
\n\n\n
\n
\n\n \n \n \n \n \n \n Region Context from Unifying Points, Lines, and Polygons.\n \n \n \n \n\n\n \n Kim, J.; and Chiang, Y.\n\n\n \n\n\n\n In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Advances in Urban-AI, of
UrbanAI '25, pages 94–95, New York, NY, USA, 2025. Association for Computing Machinery\n
\n\n
\n\n
\n\n
\n\n \n \n
\n
@inproceedings{10.1145/3764926.3771941,\n title = {Region Context from Unifying Points, Lines, and Polygons},\n author = {Kim, Jina and Chiang, Yao-Yi},\n year = {2025},\n isbn = {9798400721892},\n publisher = {Association for Computing Machinery},\n address = {New York, NY, USA},\n url = {https://doi.org/10.1145/3764926.3771941},\n doi = {10.1145/3764926.3771941},\n booktitle = {Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Advances in Urban-AI},\n pages = {94–95},\n numpages = {2},\n keywords = {urban foundation models, region contextualization, spatial semantics},\n series = {UrbanAI '25},\n abstract = {We envision a new generation of urban foundation models that unify semantic, spatial, and topological relationships within and across point, line, and polygon features. Existing approaches typically encode different feature types separately and fuse them only at later stages, missing opportunities for truly integrated contextualization. To illustrate our vision, we present RegionContext, a framework that generates contextual region embeddings capturing both fine-grained spatial structures and higher-order semantics. Finally, we highlight key research directions for region contextualization in urban foundation models.}\n}\n\n\n
\n\n\n
\n We envision a new generation of urban foundation models that unify semantic, spatial, and topological relationships within and across point, line, and polygon features. Existing approaches typically encode different feature types separately and fuse them only at later stages, missing opportunities for truly integrated contextualization. To illustrate our vision, we present RegionContext, a framework that generates contextual region embeddings capturing both fine-grained spatial structures and higher-order semantics. Finally, we highlight key research directions for region contextualization in urban foundation models.\n
\n\n\n
\n\n\n
\n\n\n
\n
\n\n \n \n \n \n \n \n WalkCLIP: Multimodal Learning for Urban Walkability Prediction.\n \n \n \n \n\n\n \n Xiang, S.; Lee, J.; Namgung, M.; and Chiang, Y.\n\n\n \n\n\n\n
arXiv preprint arXiv:2511.21947. 2025.\n
\n\n
\n\n
\n\n
\n\n \n \n
\n
@article{xiang2025walkclip,\n title = {WalkCLIP: Multimodal Learning for Urban Walkability Prediction},\n author = {Xiang, Shilong and Lee, JangHyeon and Namgung, Min and Chiang, Yao-Yi},\n journal = {arXiv preprint arXiv:2511.21947},\n year = {2025},\n doi = {10.48550/arXiv.2511.21947},\n url = {https://arxiv.org/abs/2511.21947}\n}\n\n\n\n\n
\n\n\n\n
\n\n\n\n\n\n