1st Large Language Models for Spatial-rich Data Management (LLM+Spatial)

In conjunction with the 51st International Conference on Very Large Data Bases (VLDB)

London, United Kingdom - September 5, 2025 (afternoon)

Workshop Overview

The importance of spatio-temporal data has increased significantly in various scientific fields, such as climate research, biodiversity, and the social sciences, primarily due to improvements in data collection and accessibility. Despite the opportunities for new scientific insight, researchers often face the challenge of inadequate tools and interfaces for managing, integrating, and analyzing spatio-temporal data. The recent emergence of LLM capabilities marks a pivotal point that is set to significantly affect the academic and industrial communities. However, the vast amount of knowledge in spatial-rich data is not yet used to train and tune LLMs, and spatio-temporal databases are not yet able to access and operate on the facts contained in LLMs. This workshop aims to provide new insight into techniques that combine spatial-rich data and large language models to advance spatial-rich data management and predictive models.

Organization

Workshop Co-Chairs

Jianqiu Xu

Nanjing University of Aeronautics and Astronautics, P.R.China

jianqiu@nuaa.edu.cn

Cheng Long

Nanyang Technological University, Singapore

c.long@ntu.edu.sg

Bernhard Seeger

University of Marburg, Germany

seeger@mathematik.uni-marburg.de

Yongxin Tong

Beihang University, P.R.China

yxtong@buaa.edu.cn

Call for Papers

The goal is to advance the understanding of how LLMs and spatial-rich data management can cooperatively contribute to novel data science solutions. Topics of interest include, but are not limited to:

  • Spatial-rich Data Foundation Models
  • Enhancing LLMs with Spatial-rich Data
  • Spatial-rich Data Quality, Anomaly Detection, and Imputation with LLMs
  • Retrieval-augmented Models for Geospatial Applications
  • NL2SQL for Spatio-temporal Data
  • Spatial and Spatio-temporal Contextual Reasoning with LLMs
  • Embedding Learning for Geospatial Data with LLMs
  • Fine-tuning LLMs on Domain-specific Geospatial Data
  • Benchmarking of LLMs + Spatio-temporal Databases
  • Optimizing Spatio-temporal Databases with LLMs
  • Case Studies and Applications of LLMs + Spatial-rich Data
  • Visions for LLMs + Spatio-temporal Databases

Program

September 5, 13:30-17:10, St James Room, 4th Floor

Time | Topic | Speaker
13:30-13:40 | Brief Introduction and Photo | Jianqiu Xu
13:40-14:40 | Keynote 1: Spatial Data Systems in the LLM Era: 1+1=3? System Requirements and Research Opportunities | Walid G. Aref
14:40-15:00 | Paper 1: NALMOBench: Towards Benchmarking Natural Language Interfaces for Moving Objects Databases | Xieyang Wang
15:00-15:30 | Coffee break |
15:30-16:20 | Keynote 2: Natural Language Maps: Generative AI for Spatial Data Generation, Querying, and Visualization | Ahmed Eldawy
16:20-17:10 | Keynote 3: Geospatial Entity Representation: A Step Towards City Foundation Models | Gao Cong

Keynotes

Keynote 1

Spatial Data Systems in the LLM Era: 1+1=3? System Requirements and Research Opportunities


Walid G. Aref

Purdue University

Abstract: Large Language Models are replacing many components in data systems. Spatial data systems are no exception. Important questions arise: Do LLMs replace spatial data systems in their entirety, or do we still need spatial data systems? How do we adapt spatial data systems to leverage LLMs and gain the best of both worlds? What new system requirements do LLMs and ML techniques impose on spatial data systems? This talk addresses these questions and highlights potential directions for the interplay of spatial data systems and LLMs to achieve a win-win scenario (1+1=3).

Bio: Walid G. Aref is a professor of computer science at Purdue. His research interests are in extending the functionality of database systems in support of emerging applications, e.g., spatial, spatio-temporal, graph, and sensor databases. He is also interested in query processing, indexing, data streaming, and geographic information systems (GIS). Walid's research has been supported by the National Science Foundation, the National Institutes of Health, Purdue Research Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he received the CAREER Award from the National Science Foundation, and in 2004, he received a Purdue University Faculty Scholar award. Walid is a member of Purdue's CERIAS. He has served as Editor-in-Chief of the ACM Transactions on Spatial Algorithms and Systems (ACM TSAS), an editorial board member of the Journal of Spatial Information Science (JOSIS), and an editor of the VLDB Journal and the ACM Transactions on Database Systems (ACM TODS). Walid has won several best paper awards, including the 2016 VLDB ten-year best paper award. He is a Fellow of the IEEE and a member of the ACM. Between 2011 and 2014, Walid served as the chair of the ACM Special Interest Group on Spatial Information (SIGSPATIAL).

Keynote 2

Natural Language Maps: Generative AI for Spatial Data Generation, Querying, and Visualization


Ahmed Eldawy

University of California, Riverside

Abstract: Maps are powerful, but making sense of them has traditionally required specialized expertise in GIS software, complex query languages, and significant manual effort. Advances in large language models (LLMs) and generative AI are beginning to change this dynamic, opening new ways of working with spatial data that are far more intuitive. Instead of relying on specialized tools, users can now describe the data they need or write complex geographic questions, and intelligent systems can translate those intentions into concrete results. This talk will highlight recent progress in three key directions: generating realistic spatial datasets from textual descriptions, answering complex questions that combine spatial reasoning with external knowledge, and automatically creating styles that make map visualizations more effective and interpretable. Taken together, these advances illustrate a new paradigm where geospatial data can be explored and understood through a natural and accessible interface.

Bio: Ahmed Eldawy is an Associate Professor in Computer Science at the University of California, Riverside. His research interests lie in the broad area of databases, with a focus on big data management and spatial data processing. Ahmed has led the research and development of many open source projects for big spatial data exploration and visualization, including UCR-Star, an interactive repository for geospatial data with nearly four terabytes of publicly available data. He is a recipient of the highly prestigious NSF CAREER award, the 10-year Influential Paper Award at ICDE 2025, and the Best Demo award at SIGSPATIAL 2020. His work is supported by the National Science Foundation (NSF) and the US Department of Agriculture (USDA).

Keynote 3

Geospatial Entity Representation: A Step Towards City Foundation Models


Gao Cong

Nanyang Technological University

Abstract: The talk will cover the following research problems on geospatial entity representation: 1) Geospatial Entity Representation for point objects, trajectories, and regions, and their applications, e.g., spatial keyword search, POI recommendation, speed inference, and region population estimation. 2) Foundation Models for Geospatial Applications and Efforts toward City Foundation Models. The first part primarily concentrates on learning representations to facilitate geospatial entity querying and analysis. The second part focuses on self-supervised learning approaches applied to geospatial entities, and several research attempts towards city foundation models.

Bio: Gao Cong is currently a Professor in the College of Computing and Data Science (CCDS) at Nanyang Technological University (NTU). He serves as the head of the Division of Data Science, CCDS, NTU. He previously worked at Aalborg University, Denmark, Microsoft Research Asia, and the University of Edinburgh. His current research interests include AI4DB, spatial data management, spatial-temporal data mining, and recommendation systems. His work has received over 20,000 citations on Google Scholar, with an h-index of 75. He received a SIGIR'25 Test of Time Award honourable mention, and best paper runner-up awards at the WSDM'20 and WSDM'22 conferences for two of his research papers. He served as a PC co-chair for ICDE'2022, the associate general chair of KDD'21, a PC co-chair for the E&A track of VLDB 2014, and a PC vice-chair for ICDE'18. He serves as an associate editor for ACM Transactions on Database Systems (TODS) and IEEE TKDE.

Program Committee

  • Sheng Wang, Wuhan University, P.R.China
  • Mahmoud Sakr, Free University of Brussels, Belgium
  • Ziqiang Yu, Yantai University, P.R.China
  • Man Lung Yiu, Hong Kong Polytechnic University, Hong Kong, P.R.China
  • Zheng Wang, Huawei Singapore Research Center, Singapore
  • Andreas Züfle, Emory University, U.S.A
  • Chenxi Liu, Nanyang Technological University, Singapore
  • Qianxiong Xu, Nanyang Technological University, Singapore
  • Liang Zhang, HEC Paris, Singapore
  • Ziquan Fang, Zhejiang University, P.R.China
  • Yuren Mao, Zhejiang University, P.R.China
  • Matthias Renz, University of Kiel, Germany
  • Amr Magdy, University of California Riverside, U.S.A
  • Tianyi Li, Aalborg University, Denmark
  • Ahmed Eldawy, University of California Riverside, U.S.A

Submission Instructions

Prospective authors are invited to submit original research papers that address the topics of interest for the workshop. Papers must be submitted in PDF format and formatted using the style file. We call for two types of papers:

  1. Vision Papers (up to 4 pages, plus additional pages for references)
  2. Regular Research Papers (from 6 to 8 pages, plus additional pages for references)

Submission site: https://cmt3.research.microsoft.com/LLMSpatial2025

Accepted papers will be published in the VLDB Workshop Proceedings. At least one author of each accepted paper is expected to register for VLDB 2025 and present the paper in person.

Review Process

We will enforce a rigorous single-anonymous peer review process. All manuscripts submitted to the workshop will be reviewed by at least two PC members. Plagiarism detection tools will be used to check the content of submitted manuscripts against previous publications. Papers will be evaluated according to the following aspects:

  1. relevance to the workshop topic
  2. scientific novelty
  3. technical soundness
  4. appropriateness and adequacy in terms of literature review and analysis
  5. presentation

Handling of Conflicts of Interest

We will follow the conflict of interest policy for ACM publications.

Important Dates (AoE)

  • Submission deadline: May 25, 2025 (extended from May 15, 2025)
  • Notification of acceptance: June 15, 2025
  • Camera-ready Version of Accepted Papers: July 1, 2025

Acknowledgement: The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.