Greetings from the Big Geodata Newsletter!
In this issue you will find information on the EOPF 101 Toolkit for working with Sentinel data in the Cloud, NASA's AppEEARS QGIS Plugin for streamlined geospatial data integration, Element 84's insights on chunking strategies in cloud-native array storage, Development Seed's STAC Semantic Search tool for natural language geospatial queries, and DeepMind's new AI-powered global land cover mapping model AlphaEarth Foundations. This edition also features a user story from our Geospatial Computing Platform on mapping individual oak trees across Dutch urban landscapes.
Happy reading!
You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.
EOPF 101 Toolkit: Working with Sentinel Data in the Cloud

Image credits: ESA EOPF Zarr
EOPF 101 is a hands-on, open-source guide designed to help users work with Copernicus Sentinel data using cloud-native tools and Zarr-based products. Developed as part of the ESA-funded EOPF Toolkit and built collaboratively by Development Seed, thriveGEO, and SparkGeo, this modular resource makes working with Sentinel-1, -2, and -3 data more accessible through scalable, cloud-optimized workflows. Structured around five interactive chapters, EOPF 101 covers essential topics: Sentinel orientation, Zarr format fundamentals, STAC-based data discovery via the EOPF API, and applied use-case workflows such as wildfire mapping and coastal change analysis. Each chapter includes ready-to-run notebooks and real-world examples. Supporting technologies within the EOPF ecosystem include StackSTAC, Titiler-multidimensional, the GDAL Zarr driver, and language bindings like Rarr and Julia Zarr IO. Because Zarr stores N‑dimensional arrays as independently addressable chunks, EOPF 101 workflows can read only the parts of a product they need and stream them directly from cloud object storage. This makes Sentinel data more accessible for scalable analysis workflows, even to users new to cloud-native geospatial methods.
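To make the idea of partial access concrete, here is a minimal sketch of how a reader could work out which chunks of a chunked array overlap a requested window. It is not taken from the EOPF toolkit: the function name chunks_for_window and the 1024 x 1024 chunk size are assumptions chosen for illustration, and real Zarr libraries perform this bookkeeping internally.

```python
from itertools import product


def chunks_for_window(window, chunk_shape):
    """Return the grid indices of every chunk that overlaps a window.

    window: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(window, chunk_shape):
        first = start // size       # chunk holding the first requested element
        last = (stop - 1) // size   # chunk holding the last requested element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# A 10980 x 10980 pixel band stored as 1024 x 1024 chunks
# (the chunk size is a hypothetical choice, not an EOPF default).
array_shape = (10980, 10980)
chunk_shape = (1024, 1024)

# A 500 x 500 pixel area of interest.
window = ((2000, 2500), (3000, 3500))

needed = chunks_for_window(window, chunk_shape)

total = 1
for n, c in zip(array_shape, chunk_shape):
    total *= -(-n // c)  # ceiling division: number of chunks along this dimension

print(f"{len(needed)} of {total} chunks fetched: {needed}")
```

In this example only 4 of the 121 chunks would need to be requested from object storage, which is the effect that makes chunked formats attractive for small areas of interest.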
Explore EOPF 101 and begin working with Sentinel data in the cloud here.
AppEEARS QGIS Plugin: Streamlining Subset Data Loading into QGIS

Image credits: NASA Earthdata Blog
NASA's AppEEARS QGIS Plugin, launched in June 2025, integrates the AppEEARS data extraction system into QGIS, simplifying how users access Earth science data, including MODIS, SMAP, HLS, and ECOSTRESS, from multiple archives. The plugin allows browsing AppEEARS sample requests and loading resulting cloud-optimized GeoTIFFs (COGs) directly into QGIS without manual downloads, saving time and reducing local storage needs. AppEEARS itself enables spatial, temporal, and variable-based subsetting of large datasets, producing compact COG outputs through efficient point or area sample requests. This workflow dramatically reduces data transfer sizes from terabytes to manageable CSVs or mosaicked rasters, while preserving quality and format flexibility. Plugin users can rapidly explore, visualize, and further analyze Earth observation data with minimal effort.
Explore the AppEEARS QGIS Plugin and add it to your workflow here.
Chunks and Chunkability: Understanding the Structure Behind Array Storage

Image credits: Element 84
A two-part series from Element 84, Chunks and Chunkability: Tyranny of the Chunk and An Origin Story, offers an in-depth exploration of the evolution of array chunking in cloud-native geospatial contexts. The posts show how data storage formats, from Landsat CCT to modern Zarr, evolved around chunked layouts to balance data access, compression, and cloud efficiency. The first post, Tyranny of the Chunk, challenges the assumption that chunking is a purely technical necessity, arguing that it has become a governing force in how users and producers structure data. Chunk size, shape, and ordering directly affect performance, compression, and storage, forcing both creators and consumers to negotiate the constraints of chunked layouts. The second post, An Origin Story, traces how chunking practices developed historically, from early raster formats like Landsat MSS CCT and FITS to modern standards like TIFF, HDF5, and Zarr. It highlights key transitions where file abstractions dissolved, leaving users to grapple with chunking details that were once hidden. Together, these articles reveal how and why chunking became central to data design and suggest paths forward: redefining cloud-native storage semantics or applying smarter compression and data representations to mitigate chunk constraints.
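The trade-off behind chunk shape can be illustrated with a little arithmetic. The sketch below is not from the Element 84 posts: the cube dimensions, the two chunk layouts, and the helper names are all hypothetical. It counts how many chunks two common queries touch and how much data must be decoded relative to what was actually wanted, assuming a whole chunk is always read.

```python
from math import prod


def chunks_touched(window, chunk_shape):
    """Count chunks overlapping a window of (start, stop) pairs, stop exclusive."""
    return prod((stop - 1) // size - start // size + 1
                for (start, stop), size in zip(window, chunk_shape))


def read_cost(window, chunk_shape):
    """Return (chunks fetched, elements decoded, elements actually wanted)."""
    n_chunks = chunks_touched(window, chunk_shape)
    decoded = n_chunks * prod(chunk_shape)  # whole chunks are always decoded
    wanted = prod(stop - start for start, stop in window)
    return n_chunks, decoded, wanted


# Hypothetical daily cube with dimensions (time, y, x) = (365, 4096, 4096).
layouts = {
    "map-friendly (1, 1024, 1024)": (1, 1024, 1024),
    "series-friendly (365, 64, 64)": (365, 64, 64),
}
queries = {
    "one pixel, full year": ((0, 365), (100, 101), (200, 201)),
    "one day, full map": ((10, 11), (0, 4096), (0, 4096)),
}

for q_name, window in queries.items():
    for l_name, chunk_shape in layouts.items():
        n, decoded, wanted = read_cost(window, chunk_shape)
        print(f"{q_name:22s} {l_name:30s} chunks={n:5d} overhead={decoded / wanted:,.0f}x")
```

Neither layout wins both queries: the map-friendly chunks serve a full-scene read with no wasted data but need 365 requests for a single-pixel time series, while the series-friendly chunks reverse that picture. This is one simple way to see why chunking constrains both producers and consumers.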
Explore both parts of Element 84's deep dive into chunking history and implications here.
STAC Semantic Search: Natural Language Queries for Geospatial Data Discovery

Image credits: Development Seed
STAC Semantic Search, developed by Development Seed, enables users to query STAC catalogs using natural language prompts instead of writing complex API filters. The tool supports queries like “Find cloudless Sentinel‑2 imagery over Paris from 2023,” and intelligently interprets spatial, temporal, and collection-specific elements. Features include vector-based collection ranking, AI-powered agents for parsing query intent, and a Streamlit-based visual interface for interactive exploration. Built to work with any STAC‑compliant catalog, such as Microsoft Planetary Computer, the system embeds collection metadata using sentence transformers, stores embeddings in ChromaDB, and uses LLMs to refine results. The query pipeline includes separate agents for temporal interpretation, spatial bounding box resolution, collection identification, and STAC API filter formulation, supporting a fully AI-driven search experience. While still early in development, STAC Semantic Search represents a shift toward more accessible, intuitive search tools for geospatial data.
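The two ends of such a pipeline, ranking collections by similarity to the prompt and assembling a STAC API search body, can be sketched in a few lines. This is not Development Seed's implementation: a toy word-count vector stands in for sentence-transformer embeddings, a plain dictionary stands in for ChromaDB, the collection descriptions are invented, and the Paris bounding box and 10% cloud threshold are rough assumptions that the tool's agents would normally resolve.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (stand-in for a sentence transformer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_collections(query, index):
    """Return (collection id, score) pairs, best match first."""
    q = embed(query)
    scored = [(cid, cosine(q, vec)) for cid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def build_search_body(collection_id, bbox, start, end, max_cloud=None):
    """Assemble a STAC API item-search body from already-resolved query parts."""
    body = {
        "collections": [collection_id],
        "bbox": bbox,
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
    }
    if max_cloud is not None:
        # Relies on the STAC API Query extension; not every catalog supports it.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud}}
    return body


# Illustrative collection descriptions, not copied from any real catalog.
collections = {
    "sentinel-2-l2a": "Sentinel-2 optical multispectral imagery surface reflectance",
    "sentinel-1-grd": "Sentinel-1 radar backscatter ground range detected",
    "cop-dem-glo-30": "Copernicus global digital elevation model 30 m",
}
index = {cid: embed(desc) for cid, desc in collections.items()}

ranking = rank_collections("cloudless Sentinel-2 imagery over Paris", index)
best_collection = ranking[0][0]

# Approximate Paris bounding box and an assumed reading of "cloudless" as < 10 %.
body = build_search_body(best_collection, [2.22, 48.81, 2.47, 48.91],
                         "2023-01-01", "2023-12-31", max_cloud=10)
print(ranking)
print(body)
```

In the real tool the spatial, temporal, and collection parts are produced by separate LLM-backed agents; here they are hard-coded so the final request structure is easy to inspect.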
Try the STAC Semantic Search demo or explore the full toolchain on GitHub here. Learn more about STAC and its role in geospatial metadata discovery here.
Upcoming Events
- CRIB Training: Introduction to Numpy
  ITC, Enschede, 13 August 2025
- Big Data from Space 2025
  Riga, Latvia, 29 September - 3 October 2025
- SURF Network and Cloud Event
  Hilversum, 30 September 2025
- National Open Science Festival 2025
  Groningen, 24 October 2025
- Intermediate Research Software Development with Python
  eScience Center, Amsterdam, 28 October - 2 December 2025
- Research Software Support Training
  eScience Center, Amsterdam, 5 - 19 November 2025
The "Big" Picture

Image credits: Google DeepMind
AlphaEarth Foundations, developed by Google DeepMind, is an AI model that acts like a virtual satellite, converting massive volumes of Earth observation data into compact, continuous global map representations. The model integrates satellite imagery, radar, LiDAR, climate simulations, and field measurements across time and space into 64-dimensional embedding fields, with each embedding summarizing a 10 × 10 m area. Instead of relying on dense labeled data, AlphaEarth is trained using sparse labels, yet it consistently outperforms the previous featurization approaches it was tested against across multiple mapping benchmarks without retraining. The embedding outputs capture spatial, temporal, and measurement contexts, enabling annual global maps from 2017 to 2024. These datasets are available through Google Earth Engine as analysis-ready satellite embedding layers. Compared to earlier approaches, AlphaEarth reports error rates roughly 24% lower on average while requiring 16 times less storage, which improves scalability for environmental monitoring and geospatial analytics. The embedding vectors are globally consistent, enabling downstream uses such as clustering, classification, and change detection with minimal preprocessing. This technology offers new opportunities to track climate impacts, deforestation, water resources, and land-use changes more quickly and cost-effectively.
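As an illustration of how such embeddings might support change detection, the sketch below compares each pixel's embedding between two years and flags pixels whose vectors have diverged. Everything in it is hypothetical: the vectors are made-up 4-dimensional stand-ins for the real 64-dimensional embeddings, the 0.9 threshold is arbitrary, and the code assumes embeddings are scaled to unit length so that a dot product equals cosine similarity.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec]


def similarity(a, b):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def changed_pixels(year_a, year_b, threshold=0.9):
    """Return ids of pixels whose embeddings diverge between two years."""
    return [pid for pid in year_a
            if similarity(year_a[pid], year_b[pid]) < threshold]


# Made-up 4-dimensional embeddings for three pixels (real ones have 64 dimensions).
emb_2017 = {
    "forest_px": normalize([0.9, 0.1, 0.0, 0.1]),
    "water_px": normalize([0.0, 0.1, 0.9, 0.2]),
    "cleared_px": normalize([0.8, 0.2, 0.1, 0.1]),
}
emb_2024 = {
    "forest_px": normalize([0.88, 0.12, 0.02, 0.1]),
    "water_px": normalize([0.02, 0.1, 0.9, 0.18]),
    "cleared_px": normalize([0.1, 0.9, 0.1, 0.3]),
}

print(changed_pixels(emb_2017, emb_2024))
```

Only the pixel whose embedding moved substantially between the two years is flagged. With real data the same comparison would run over whole embedding rasters, and the threshold would need to be calibrated against known change.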
Explore AlphaEarth Foundations and access its satellite embedding dataset here. Read the technical preprint on arXiv here.
Brown, C. F., Kazmierski, M. R., Pasquarella, V. J., Rucklidge, W. J., Samsikova, M., Zhang, C., Shelhamer, E., Lahera, E., Wiles, O., Ilyushchenko, S., Gorelick, N., Zhang, L. L., Alj, S., Schechter, E., Askay, S., Guinan, O., Moore, R., Boukouvalas, A., & Kohli, P. (2025). AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data (arXiv:2507.22291). arXiv. https://arxiv.org/abs/2507.22291
