Waleed Alzuhair, flickr

Big Geodata Newsletter, May 2025

Get support on choosing a study

Greetings from the Big Geodata Newsletter!

In this issue you will find information on Geospatial Reasoning by Google Research for integrating generative AI with geospatial foundation models, the Sedona STAC Reader for streamlining large-scale geospatial data analysis, the advantages of tensor-based models over tables for scientific big data, NVIDIA Earth-2’s approach to AI-driven flood risk assessment, and the LUIcube dataset tracking global land-use intensity from 1992 to 2020. Our Geospatial Computing Platform user story on localized deformation detection and prediction in urban environments using SAR and deep learning is also here!

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

Geospatial Reasoning: Combining Generative AI and Foundation Models for Spatial Insights


Image credits: Google Research Blog, 2025

Geospatial Reasoning, introduced by Google Research, brings together generative AI and multiple geospatial foundation models to advance the way spatial data is analyzed and interpreted. Instead of relying on a single model, the framework integrates Google's Gemini language models with domain-specific models like the Population Dynamics Foundation Model (PDFM) and a new Remote Sensing Foundation Model, enabling deeper exploration of complex geospatial relationships. This system uses a agent-based orchestration method, allowing users to pose natural language queries about geographic phenomena. The models then coordinate to retrieve, synthesize, and present meaningful answers and visualizations based on satellite imagery, aerial photos, and environmental datasets. By leveraging multimodal embeddings, the framework supports tasks across diverse fields such as disaster response, urban planning, and environmental monitoring. Geospatial Reasoning highlights how combining language and geospatial models can improve accessibility to large, distributed datasets, helping researchers and planners interpret spatial patterns with greater clarity and efficiency. 

Learn more about Geospatial Reasoning and its applications here. Explore Google's broader research on AI and geospatial technologies here

Sedona STAC Reader: Simplifying Geospatial Analytics with Apache Sedona


Image credits: Feng Zhang, LinkedIn, 2025

The latest release of Apache Sedona 1.7.1 introduces the STAC Reader, aimed at simplifying the ingestion and analysis of SpatioTemporal Asset Catalog (STAC) datasets within large-scale geospatial workflows. Developed to enhance cloud-native geospatial analytics, the STAC Reader provides an efficient way to access and query imagery and raster data from STAC-compliant endpoints without complex preprocessing steps. STAC datasets, such as Sentinel-2, Landsat, and other Earth observation products, contain rich metadata and references to large collections of assets. Traditionally, working with these datasets involved time-consuming tasks like manual downloading, parsing, and structuring of imagery. With the new STAC Reader, Sedona users can now directly load and analyze STAC datasets at scale using Apache Spark, leveraging Sedona’s distributed spatial computing capabilities. This new feature simplifies workflows and improves performance, scalability, and cost-efficiency by supporting direct, on-demand querying of extensive geospatial collections stored in cloud object storage.  

Learn more about the Sedona STAC Reader and its capabilities here. Explore Apache Sedona and its geospatial processing tools here

Tensors vs Tables: Rethinking Data Models for Scientific and Cloud-Native Big Data


Image Credits: Earthmover, 2025

Traditional tabular data models, structured into rows and columns, have served data analytics needs quite well for long time. However, their limitations become increasingly clear for complex, multidimensional scientific data. In the Earthmover blog "Tensors vs Tables," the case is made for shifting to tensor-based or multidimensional array-native models to meet the demands of modern data processing. Storing these datasets in flat tables often results in redundant information, cumbersome queries, and inefficient computation. By contrast, tensor structures preserve the natural organization of data, allowing more intuitive slicing, querying, and analysis. This shift is particularly important for big data and cloud-native architectures, where efficient data access and transfer are critical. Managing scientific big data in a tensor-native format reduces I/O costs, accelerates data retrieval, and enables parallel computing frameworks like Dask and Apache Spark to operate more effectively. Embedding tensors within traditional tables, even using advanced formats like Apache Arrow or Parquet, creates technical debt: retrieval becomes inefficient, metadata becomes harder to manage, and scaling across distributed systems becomes more complex. 

Read the full article with detailed performance benchmarks on different approaches here

Powering Flood Risk Assessment with NVIDIA Earth-2


Image credits: NVIDIA, 2025

NVIDIA Earth-2 is reshaping flood risk assessment by combining AI-driven weather modeling with scalable geospatial analysis. Traditional flood models depend on limited historical data, making predicting rare, catastrophic events like 1-in-200-year floods difficult. To improve preparedness, JBA Risk Management collaborated with NVIDIA to create the Huge Ensemble (HENS) pipeline, producing thousands of physically realistic weather scenarios. Built on Earth-2’s AI-accelerated software stack, the HENS pipeline generates synthetic weather datasets faster and at larger scales than conventional methods. Using just 110 GPU hours on NVIDIA L40s GPUs in the cloud, the team simulated 300 years of atmospheric data, which was then fed into hydrological models to estimate flood risk. This approach enables a detailed understanding of flood risk distributions, helping industries like insurance and urban planning better manage extreme weather impacts. By leveraging AI and cloud-native infrastructure, NVIDIA Earth-2 enables faster, more resilient flood risk modelling, supporting better decision-making and long-term climate adaptation strategies. 

Learn more about NVIDIA Earth-2 and its flood risk modeling capabilities here. Explore NVIDIA’s broader Earth-2 initiatives here.

Upcoming EVENTS

The "Big" Picture


Image credits: Matej et al., 2025

The newly released LUIcube dataset provides an extensive global land-use data cube, documenting annual human land-use intensity from 1992 to 2020 at a high resolution of 30 arcseconds (~1 km²). Developed using the Human Appropriation of Net Primary Production (HANPP) framework, this dataset quantifies global human impacts on terrestrial ecosystems. LUIcube classifies land into 32 detailed categories, aggregated into five major land-use types: cropland, grazing land, forestry, built-up land, and wilderness areas. It combines satellite-based land cover data from ESA's Climate Change Initiative with robust statistical data from the FAO and other authoritative sources, ensuring high accuracy and consistency. By capturing annual dynamics in biomass harvest, land conversion, and ecosystem productivity, LUIcube enables studying human-induced environmental change. The dataset supports researchers, policymakers, and conservationists in understanding long-term trends, drivers, and ecological impacts of global land-use changes, essential for shaping sustainable land management strategies. 

Explore the full LUIcube study and access the dataset here

Matej, S., Weidinger, F., Kaufmann, L., Roux, N., Gingrich, S., Haberl, H., Krausmann, F., and Erb, K. (2025) A global land-use data cube 1992–2020 based on the Human Appropriation of Net Primary Production. Scientific Data, 12(1). doi:10.1038/s41597-025-04788-1.

CRIB News
User Story: Localized deformation detection and prediction in urban environments

The urban environment is complex and, as an interdependent system, vulnerable to natural hazards, like floods, earthquakes, hurricanes, and deformations. Land deformation has become a serious urban problem in recent decades because of groundwater extraction and mining exploration. It causes extensive damage to infrastructure, such as to roads, bridges, and buildings and is a potential threat to human life.

Zhang et al. present a method for obtaining a wide range of InSAR results that allow one to detect and predict deformations, specifically in urban environments. The method uses the latest SAR image acquisitions and includes a Persistent Scatterer (PS) based concatenation component for wide-range deformation mapping, a deep learning component for deformation detection and prediction, and metrics for assessment of the results. Geospatial Computing Platform’s powerful computational capabilities, extensive data storage, and high-performance hardware enabled efficient processing of large SAR datasets.

Read more about their story here!