Waleed Alzuhair, flickr

Big Geodata Newsletter, July 2025


Greetings from the Big Geodata Newsletter!

In this issue you will find information on NASA's TiTiler-CMR, enabling on-demand visualization of EO data; TerraMesh, a multimodal EO dataset for training large-scale foundation models; comparisons between Zarr and Cloud-Optimized GeoTIFFs for cloud-native geospatial workflows; OSMlanduse, a high-resolution land-use dataset of the EU combining OpenStreetMap and Sentinel-2 data; and how citizen science combined with MODIS satellite data is revealing large-scale patterns in bird population trends. This edition also features a user story from our Geospatial Computing Platform on using SAR deep learning to track oil palm plantation expansion along rivers.

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

Visualizing Earth Data on Demand: NASA's TiTiler‑CMR


Image credits: NASA Earthdata Blog

TiTiler‑CMR is an open‑source, web‑based tool developed by NASA IMPACT and Development Seed to deliver on‑demand EO imagery without relying on pre‑rendered tiles. By connecting directly to NASA’s Common Metadata Repository (CMR) and supporting NetCDF, HDF5, and COG formats via xarray and earthaccess, TiTiler‑CMR generates exactly the tiles users need, when they need them, eliminating unnecessary storage and maintenance efforts. The tool is integrated into the VEDA platform, where it currently serves data such as IMERG precipitation and HLS imagery, with plans to support time‑series plotting and user‑defined band combinations. This "made-to-order" method improves flexibility and speeds up access, making it easier for developers, researchers, and the public to explore NASA’s growing Earth science data archive. While still evolving, especially around scalability, performance, and documentation, TiTiler‑CMR represents a significant step in modernizing satellite data visualization and aligns with cloud‑native principles by bringing fresh data directly to users, without delay.

Learn more about TiTiler‑CMR and its live demo here. Explore the TiTiler‑CMR GitHub repository and source code here.
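
For readers curious about what "on-demand" means in practice: a dynamic tiler receives a request for a tile address (zoom, column, row) and must first work out which part of the globe that tile covers before it reads any data. The snippet below is a minimal sketch of that first step using the standard Web Mercator tiling formulas. It is not taken from TiTiler-CMR's source code, and the function name `tile_bounds` is our own, chosen for illustration.

```python
import math


def tile_bounds(x: int, y: int, z: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for an XYZ web-map tile."""
    n = 2 ** z  # number of tiles along each axis at zoom level z

    def lon(col: int) -> float:
        # Longitude of the western edge of tile column `col`
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        # Inverse Web Mercator; row 0 is the northern edge of the map
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


# Example: the tile at zoom 2, column 1, row 1
west, south, east, north = tile_bounds(x=1, y=1, z=2)
print(west, south, east, north)
```

A dynamic tiler would then read only the source data that intersects this bounding box and render it for the response, which is why no pre-rendered tile pyramid needs to be stored.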

TerraMesh: A Planetary Mosaic of Multimodal Earth Observation Data


Image credits: TerraMesh

TerraMesh is a newly introduced global dataset designed for foundational pre-training in EO machine learning. It merges eight spatiotemporally aligned modalities, including optical imagery, SAR, elevation models, land-cover, and NDVI, into more than 9 million co-registered samples. This multimodal design enables effective learning of cross-modal relationships in large-scale foundation models. Constructed in an Analysis-Ready Data (ARD) format, TerraMesh carefully processes, subsamples, and stores data in compressed Zarr archives to ensure quality and efficiency. The dataset spans multiple seasons and covers diverse global regions, enhancing its suitability for pre-training tasks. Empirical experiments show that models pre-trained on TerraMesh, such as TerraMind, demonstrate improved performance in downstream tasks, including segmentation benchmarks, compared to alternatives trained on single-modality or smaller multimodal datasets. 

Explore TerraMesh and access the dataset here. Learn about TerraMind, the generative multimodal model trained on TerraMesh here.

Understanding the Roles of Zarr and COG in Cloud-Native Geospatial Workflows


Image credits: Element 84

In a recent discussion, Element 84 explores the question "Is Zarr the new COG?" and concludes that the two formats serve complementary, not competing, purposes. Cloud-Optimized GeoTIFFs (COGs) are tailored for 2D raster data, offering efficient streaming via HTTP range requests and built-in overview levels, which makes them ideal for maps and single-time imagery. In contrast, Zarr handles N-dimensional data cubes, such as multitemporal satellite imagery or climate model outputs, by chunking across time and space for scalable parallel access. Zarr v3 introduces sharding, which bundles multiple chunks into a single file; this reduces file count and improves performance, so Zarr behaves more like a COG while preserving its multidimensional strengths. Additionally, tools like Kerchunk provide virtual Zarr access, exposing legacy NetCDF/HDF files as Zarr stores without duplicating data. In practice, catalogs often use STAC + COG for raster discovery and STAC + Zarr for multidimensional datasets, enabling both discovery and efficient data access.

Learn more about how Zarr and COG complement each other in cloud-native geospatial systems here. Explore the evolving geospatial data formats guide here.
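
To see why chunk layout and sharding matter, it helps to count how many stored objects a query has to touch. The snippet below is a back-of-the-envelope sketch in plain Python, not code from Zarr or from the Element 84 article. The cube size, chunk shapes, and shard shape are all hypothetical values picked for illustration.

```python
import math


def chunks_touched(request_shape, chunk_shape):
    """Count chunks read by a request that starts on a chunk boundary."""
    return math.prod(math.ceil(r / c) for r, c in zip(request_shape, chunk_shape))


# Hypothetical cube: 365 daily time steps of a 4096 x 4096 grid, axes (time, y, x)
cube = (365, 4096, 4096)

# Two hypothetical chunk layouts holding the same number of values per chunk
map_friendly = (1, 1024, 1024)    # one time step, large spatial tile
series_friendly = (256, 64, 64)   # long run of time steps, small spatial tile

one_map = (1, 4096, 4096)         # request: the full grid for a single day
pixel_series = (365, 1, 1)        # request: the full time series at one pixel

for name, layout in [("map_friendly", map_friendly), ("series_friendly", series_friendly)]:
    print(name,
          "| one map:", chunks_touched(one_map, layout),
          "| pixel series:", chunks_touched(pixel_series, layout))

# Sharding: bundle the 4 x 4 spatial chunks of each time step into one stored object
total_chunks = chunks_touched(cube, map_friendly)
shard_extent = (1, 4 * 1024, 4 * 1024)  # hypothetical shard size, in array elements
total_shards = chunks_touched(cube, shard_extent)
print("objects without sharding:", total_chunks, "| with sharding:", total_shards)
```

Under these assumptions, the map-friendly layout reads 16 chunks for a single-day map but 365 for a pixel time series, while the series-friendly layout reads 4,096 and 2 respectively. Sharding the map-friendly layout cuts the stored object count from 5,840 to 365, one per time step, which is the sense in which a sharded Zarr store starts to resemble a stack of COGs.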

OSMlanduse: High-Resolution EU Land Use Map from OpenStreetMap and Sentinel‑2


Image credits: Schultz et al., 2025

OSMlanduse delivers the first continent‑scale 10 m land use map of the EU, blending volunteer‑contributed OpenStreetMap (OSM) labels with Sentinel‑2 imagery via country‑specific deep learning models. The approach maps OSM tags to 13 CORINE classes, which covered 61.8% of EU territory as of March 2020, and then trains one residual CNN (ResNet) classifier per country on cloud‑filtered Sentinel‑2 RGB+NIR composites paired with the OSM labels, predicting the remaining areas while handling regional variability. The result is a globally applicable land-use model, validated using 4,616 reference points, achieving 89% overall accuracy, with class accuracies between 77% and 99%. The final product includes 28 GeoTIFFs, is openly licensed (ODbL), and can be viewed at osmlanduse.org, supporting use cases ranging from environmental monitoring to urban planning.

Explore the OSMlanduse map and data download here. Access the full paper and datasets here.
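
For readers less familiar with map validation, overall and per-class accuracies like those quoted above are typically derived from a confusion matrix that compares reference points against the mapped classes. The sketch below uses a made-up 3-class matrix, not the OSMlanduse validation data, and the class names are placeholders. The summary above does not say which per-class measure the paper reports, so both common ones are shown.

```python
# Hypothetical confusion matrix: rows = reference class, columns = predicted class
classes = ["forest", "urban", "water"]
confusion = [
    [45, 4, 1],
    [5, 38, 2],
    [0, 1, 24],
]


def overall_accuracy(matrix):
    """Share of all reference points whose predicted class matches the reference."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def producers_accuracy(matrix):
    """Per reference class: share of its reference points that were mapped correctly."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


def users_accuracy(matrix):
    """Per predicted class: share of its predictions that match the reference."""
    return [matrix[j][j] / sum(row[j] for row in matrix) for j in range(len(matrix))]


print(f"overall accuracy: {overall_accuracy(confusion):.3f}")
for name, pa, ua in zip(classes, producers_accuracy(confusion), users_accuracy(confusion)):
    print(f"{name}: producer's {pa:.3f}, user's {ua:.3f}")
```

With these illustrative numbers, 107 of 120 points are correct, so the overall accuracy is about 0.892, while the per-class values vary depending on which classes get confused with each other.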


The "Big" Picture


Image credits: NASA Earthdata

A joint analysis using citizen science observations from eBird and satellite-derived MODIS environmental data shows how large‑scale data integration can uncover complex patterns in bird populations. Researchers combined millions of user-submitted sightings with daily measurements of vegetation index, land surface temperature, and phenological data to model population trends for multiple bird species across North America. By leveraging remote sensing indices such as NDVI and land surface temperature, the study reveals how species' migration and breeding patterns closely relate to shifting vegetation cycles and climate variability. The spatiotemporal models, built on vast datasets, highlight that bird distribution changes accelerate in regions experiencing rapid seasonal shifts. Importantly, this work demonstrates the power of aggregating distributed, high-volume data for ecological monitoring at continental scales. The approach exemplifies cloud-native analytics by efficiently processing terabytes of raster and observation data, integrating tools from EDA platforms, and delivering actionable insights on wildlife conservation and habitat management. This fusion of citizen science and satellite data makes pattern detection more robust and more timely across vast geographic areas.

Explore the full study and its methods here. Explore the dataset on the Zenodo repository here.

Johnston, A., Rodewald, A. D., Strimas-Mackey, M., Auer, T., Hochachka, W. M., Stillman, A. N., Davis, C. L., Ruiz-Gutierrez, V., Dokter, A. M., Miller, E. T., Robinson, O., Ligocki, S., Jaromczyk, L. O., Crowley, C., Wood, C. L. and Fink, D. (2025) North American bird declines are greatest where species are most abundant. Science, 388(6746), 532-537. doi:10.1126/science.adn4381

CRIB News
Big Geodata User Story: SAR image deep learning enables tracking oil palm plantation expansion along rivers

Monitoring oil palm plantations, especially in sensitive riverine environments, is essential for sustainable land management and ecological protection. However, traditional optical remote sensing faces challenges due to persistent cloud cover in tropical regions. This limitation prompted Mohammad Afif Fauzan, then an MSc student at ITC, to leverage SAR imagery and deep learning techniques to reliably track plantation expansion.

Along with Dr. Iris van Duren and Dr. Raian Vargas Maretto, Afif developed an advanced deep-learning model capable of accurately classifying land-use types. Utilizing Sentinel-1 SAR data, the model achieved enhanced accuracy by effectively distinguishing oil palm plantations from surrounding land covers. This innovative approach ensures continuous and reliable monitoring. The workflow was executed on the Geospatial Computing Platform, enabling scalable processing and robust handling of large SAR datasets.