1st Large Language Models for Spatial-rich Data Management (LLM+Spatial)
In conjunction with the 51st International Conference on Very Large Data Bases (VLDB)
London, United Kingdom - September 5, 2025 (afternoon)
The importance of spatio-temporal data has increased significantly in various scientific fields, such as climate research, biodiversity, and the social sciences, primarily due to improvements in data collection and accessibility. Despite the opportunities for new scientific insight, researchers often face the challenge of inadequate tools and interfaces for managing, integrating, and analyzing spatio-temporal data. The recent emergent abilities of LLMs mark a pivotal point that is set to significantly affect the academic and industrial communities. However, the vast amount of knowledge in spatial-rich data is not used to train and tune LLMs, and spatio-temporal databases are not able to access and operate on the facts contained in LLMs. This workshop aims to provide new insight into techniques that combine spatial-rich data and large language models to advance spatial-rich data management and predictive models.
Nanjing University of Aeronautics and Astronautics, P.R.China
jianqiu@nuaa.edu.cn
Nanyang Technological University, Singapore
c.long@ntu.edu.sg
University of Marburg, Germany
seeger@mathematik.uni-marburg.de
Beihang University, P.R.China
yxtong@buaa.edu.cn
The goal is to advance the understanding of how LLMs and spatial-rich data management can cooperatively contribute to novel data science solutions. Topics of interest include, but are not limited to:
September 5, 13:30-17:10, St James Room, 4th Floor
Time | Topic | Speakers |
---|---|---|
13:30-13:40 | Brief Introduction and Photo | Jianqiu Xu |
13:40-14:40 | Keynote 1: Spatial Data Systems in the LLM Era: 1+1=3? System Requirements and Research Opportunities | Walid G. Aref |
14:40-15:00 | Paper 1: NALMOBench: Towards Benchmarking Natural Language Interfaces for Moving Objects Databases | Xieyang Wang |
15:00-15:30 | Coffee Break | |
15:30-16:20 | Keynote 2: Natural Language Maps: Generative AI for Spatial Data Generation, Querying, and Visualization | Ahmed Eldawy |
16:20-17:10 | Keynote 3: Geospatial Entity Representation: A Step Towards City Foundation Models | Gao Cong |
Purdue University
Abstract: Large Language Models are replacing many components in data systems. Spatial data systems are no exception. Important questions arise: Do LLMs replace spatial data systems in their entireties or do we still need spatial data systems? How do we adapt spatial data systems to leverage LLMs and gain the best of both worlds? What new systems requirements do LLMs and ML techniques impose on spatial data systems? This talk addresses these questions, and highlights potential directions for the interplay of spatial data systems and LLMs to achieve a win-win scenario (1+1=3).
Bio: Walid G. Aref is a professor of computer science at Purdue. His research interests are in extending the functionality of database systems in support of emerging applications, e.g., spatial, spatio-temporal, graph, and sensor databases. He is also interested in query processing, indexing, data streaming, and geographic information systems (GIS). Walid's research has been supported by the National Science Foundation, the National Institutes of Health, Purdue Research Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he received the CAREER Award from the National Science Foundation and in 2004, he received a Purdue University Faculty Scholar award. Walid is a member of Purdue's CERIAS. He has served as Editor-in-Chief of the ACM Transactions on Spatial Algorithms and Systems (ACM TSAS), an editorial board member of the Journal of Spatial Information Science (JOSIS), and an editor of the VLDB Journal and the ACM Transactions on Database Systems (ACM TODS). Walid has won several best paper awards including the 2016 VLDB ten-year best paper award. He is a Fellow of the IEEE, and a member of the ACM. Between 2011 and 2014, Walid served as the chair of the ACM Special Interest Group on Spatial Information (SIGSPATIAL).
University of California, Riverside
Abstract: Maps are powerful, but making sense of them has traditionally required specialized expertise in GIS software, complex query languages, and significant manual effort. Advances in large language models (LLMs) and generative AI are beginning to change this dynamic, opening new ways of working with spatial data that are far more intuitive. Instead of relying on specialized tools, users can now describe the data they need or write complex geographic questions, and intelligent systems can translate those intentions into concrete results. This talk will highlight recent progress in three key directions: generating realistic spatial datasets from textual descriptions, answering complex questions that combine spatial reasoning with external knowledge, and automatically creating styles that make map visualizations more effective and interpretable. Taken together, these advances illustrate a new paradigm where geospatial data can be explored and understood through a natural and accessible interface.
Bio: Ahmed Eldawy is an Associate Professor in Computer Science at the University of California, Riverside. His research interests lie in the broad area of databases with a focus on big data management and spatial data processing. Ahmed led the research and development of many open-source projects for big spatial data exploration and visualization, including UCR-Star, an interactive repository for geospatial data with nearly four terabytes of publicly available data. He is a recipient of the highly prestigious NSF CAREER award, the 10-year Influential Paper Award at ICDE 2025, and the Best Demo award at SIGSPATIAL 2020. His work is supported by the National Science Foundation (NSF) and the US Department of Agriculture (USDA).
Nanyang Technological University
Abstract: The talk will cover the following research problems on geospatial entity representation: 1) Geospatial Entity Representation for point objects, trajectories, and regions and their applications, e.g., spatial keyword search, POI recommendation, speed inference, and region population estimation. 2) Foundation Models for Geospatial Applications and Efforts toward City Foundation Models. The first part primarily concentrates on learning representations to facilitate geospatial entity querying and analysis. The second part focuses on self-supervised learning approaches applied to geospatial entities, and several research attempts towards city foundation models.
Bio: Gao Cong is currently a Professor in the College of Computing and Data Science (CCDS) at Nanyang Technological University (NTU). He serves as the head of the Division of Data Science, CCDS, NTU. He previously worked at Aalborg University, Denmark, Microsoft Research Asia, and the University of Edinburgh. His current research interests include AI4DB, spatial data management, spatial-temporal data mining, and recommendation systems. His work has received over 20,000 citations on Google Scholar, with an h-index of 75. He received a SIGIR'25 Test of Time Award honourable mention, and best paper runner-up awards at the WSDM'20 and WSDM'22 conferences for two of his research papers. He served as a PC co-chair for ICDE'2022, the associate general chair of KDD'21, a PC co-chair for the E&A track of VLDB 2014, and a PC vice-chair for ICDE'18. He serves as an associate editor for ACM Transactions on Database Systems (TODS) and IEEE TKDE.
Prospective authors are invited to submit original research papers that address the topics of interest for the workshop. Papers must be submitted in PDF format and formatted using the style file. We call for two types of papers:
Submission site: https://cmt3.research.microsoft.com/LLMSpatial2025
Accepted papers will be published in the VLDB Workshop Proceedings. At least one author of each accepted paper is expected to register for VLDB 2025 and present the paper in person.
We will enforce a rigorous, single-anonymous peer-review process. All manuscripts submitted to our workshop will be reviewed by at least two PC members. Plagiarism detection tools will be used to check the content of the submitted manuscripts against previous publications. Papers will be evaluated according to the following aspects:
We will follow the conflict of interest policy for ACM publications.
Acknowledgement: The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.