Waleed Alzuhair, flickr

Big Geodata Newsletter, April 2022

Greetings from the Big Geodata Newsletter!

In this issue you will find information on Pyjion (a JIT compiler for Python), MAAP (the NASA/ESA Multi-Mission Algorithm and Analysis Platform), Radiant MLHub (an open library for EO machine learning), TorchGeo (deep learning datasets, transforms, samplers, and pre-trained models for geospatial data), and GISD30 (a global 30 m impervious-surface dynamic dataset). Our regular upcoming events and recent releases are here as well.

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

Pyjion: JIT Compiler for Python

Image credits: Pyjion, 2022

Python plays an important role in big data analysis. However, pure Python code executed by the standard interpreter is notoriously slow. Pyjion is a Just-In-Time (JIT) compiler for Python that compiles Python code to the Common Intermediate Language (CIL) and executes it using the .NET Common Language Runtime. Its main advantage over other runtimes is that it works with existing Python code without any code changes. Pyjion can be installed with the Python package manager and, once installed, can be imported into a Python 3.10 environment. According to the published benchmarks, Pyjion is about 2 to 3 times faster than regular Python in real-world usage. A detailed list of available optimizations can be found in the Pyjion documentation. You can also try out Pyjion in a live environment and check out the official website here.
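As a minimal sketch of what "no code changes" looks like in practice, the snippet below enables Pyjion when it is available and otherwise falls back to the regular interpreter. It assumes Pyjion was installed with pip and that a compatible .NET runtime is present; the function `sum_of_squares` and the workload size are purely illustrative, and the timing you observe will depend on your machine and workload.

```python
import timeit

try:
    # Optional dependency: requires `pip install pyjion` and a .NET runtime.
    import pyjion

    pyjion.enable()  # from here on, functions are JIT-compiled when they run
    JIT_ENABLED = True
except Exception:
    # Pyjion (or .NET) is not available, so run on the regular interpreter.
    JIT_ENABLED = False


def sum_of_squares(n: int) -> int:
    """A tight numeric loop, the kind of code a JIT compiler may speed up."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(n: int = 100_000, repeats: int = 20) -> float:
    """Return the total time in seconds for `repeats` calls of sum_of_squares(n)."""
    return timeit.timeit(lambda: sum_of_squares(n), number=repeats)


if __name__ == "__main__":
    label = "with Pyjion" if JIT_ENABLED else "without Pyjion"
    print(f"{label}: {benchmark():.3f} s")
```

Running the script once in an environment with Pyjion and once without gives a rough, workload-specific comparison.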

Multi-Mission Algorithm and Analysis Platform (MAAP)

Image credits: ESA, 2021

NASA and ESA have released a new open-science tool that provides seamless access to above-ground biomass information from both NASA and ESA Earth observation data. The tool, called the Multi-Mission Algorithm and Analysis Platform (MAAP), is the result of a 2-year cooperation effort and brings together relevant data, algorithms, and computing capabilities in a common cloud environment. This gives researchers greater opportunities to collaborate on developing algorithms and to analyze and visualize large datasets acquired from various sources. The tool currently includes data from NASA and ESA missions such as African Synthetic Aperture Radar (AfriSAR) and Global Ecosystem Dynamics Investigation (GEDI), and more will be supported soon, such as NASA/Indian Space Research Organization SAR (NISAR) and ESA BIOMASS. Studying above-ground biomass is an important area of climate change research, as it allows researchers to calculate how much carbon is stored and how the loss of biomass affects this. MAAP can also be adapted for collaborative exploration of science data in other disciplines. MAAP products can be explored on the MAAP Dashboard or through the joint platform entrance. MAAP can also be accessed through the individual NASA and ESA landing pages.

Open Library for EO Machine Learning

Image credits: Radiant, 2022

Radiant MLHub is a cloud-based open library dedicated to Earth observation training data for use with machine learning algorithms. It hosts datasets and models generated by the Radiant Earth Foundation, its partners, and the community. Anyone can register to access, store, and share open, high-quality training datasets and models for Earth observation. Datasets are available for a wide variety of applications such as building footprints, land cover, crops, wildfire, flood, and tropical storms. Examples of datasets accessible on the platform are the Open Cities AI Challenge Dataset, which includes drone imagery from 10 different cities and regions across Africa, and SEN12 FLOOD, a time series of co-registered optical and SAR images for the detection of flood events. All available geospatial training data collections are stored in SpatioTemporal Asset Catalog (STAC) compliant catalogs. A Python client allows users to easily interact with the datasets on the platform, and a quick start guide for it can be found here.
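To give a feel for what "STAC compliant" means, here is a minimal sketch of a STAC Item, the GeoJSON-based record that describes a single asset or scene, together with a check for the top-level fields defined by the STAC Item specification. The identifier, coordinates, date, and asset URL are hypothetical placeholders and do not refer to any real Radiant MLHub entry; the helper `missing_item_fields` is likewise illustrative and is not part of the Radiant MLHub client.

```python
# Top-level fields of a STAC Item (bbox is required whenever geometry is not null).
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry",
    "bbox", "properties", "links", "assets",
)

# Hypothetical example item; all values are placeholders for illustration.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-training-chip-0001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [36.8, -1.3], [36.9, -1.3], [36.9, -1.2], [36.8, -1.2], [36.8, -1.3],
        ]],
    },
    "bbox": [36.8, -1.3, 36.9, -1.2],
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [],
    "assets": {
        "labels": {
            "href": "https://example.com/labels.geojson",
            "type": "application/geo+json",
        }
    },
}


def missing_item_fields(item: dict) -> list:
    """Return the required top-level STAC Item fields absent from `item`."""
    return [field for field in REQUIRED_ITEM_FIELDS if field not in item]


if __name__ == "__main__":
    print(missing_item_fields(example_item))
```

Because every collection follows the same structure, generic STAC tooling can search and download the data without dataset-specific code.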

Upcoming Meetings

TorchGeo: Deep learning with geospatial data

Image credits: Microsoft, 2022

TorchGeo is a Python package for integrating geospatial data into the PyTorch deep learning ecosystem, making it easy for machine learning and remote sensing experts to use geospatial data in their workflows. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. Examples include the Canadian Building Footprints dataset, containing about 12M computer-generated building footprints, and various ML models including ChangeStar, Fully Convolutional Networks (FCN), and Residual Network (ResNet). The library can also be used to download datasets from Radiant MLHub (see above) and work with them. Using TorchGeo is easy if you are already familiar with PyTorch. A quick start guide demonstrating the various features of the library can be found here.

Recent Releases

The "Big" Picture

Image credits: Zhang et al., 2022

A global 30 m impervious-surface dynamic dataset (GISD30) for 1985-2020 was produced by Zhang et al. using time-series Landsat imagery on the Google Earth Engine platform. First, global training samples and corresponding reflectance spectra were automatically derived from previously available 30 m land-cover products, using multitemporal compositing and relative radiometric normalization. Next, pretrained spatiotemporal adaptive classification models were applied to map the impervious surface in each period. The researchers report an overall accuracy of 91% and a kappa coefficient of 0.866, based on 18,540 global time-series validation samples. Compared with similar 30 m impervious-surface products, GISD30 was found to perform best with respect to spatial distributions and spatiotemporal dynamics. The dataset suggests that the global impervious surface has doubled in the last 35 years, with Asia seeing the largest increase. The open-access dataset is available at this link.

Zhang, X., Liu, L., Zhao, T., Gao, Y., Chen, X., and Mi, J. (2022) GISD30: global 30 m impervious-surface dynamic dataset from 1985 to 2020 using time-series Landsat imagery on the Google Earth Engine platform, Earth Syst. Sci. Data, 14, 1831–1856, doi:10.5194/essd-14-1831-2022
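For readers less familiar with the two accuracy measures quoted above, the following minimal sketch shows how overall accuracy and Cohen's kappa coefficient are computed from a confusion matrix. The 2x2 matrix is a made-up example and not the validation data of Zhang et al.; rows are taken to be reference labels and columns predicted labels.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical matrix: impervious vs. non-impervious, 100 validation samples.
example_cm = [
    [45, 5],
    [10, 40],
]

if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(example_cm):.2f}")  # 0.85
    print(f"kappa: {cohen_kappa(example_cm):.2f}")  # 0.70
```

Kappa is lower than overall accuracy here because half of the agreement in this example would be expected by chance alone.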

Rushvanth Bhaskar
Student Assistant

My name is Rushvanth Bhaskar and I am a second-year Master's student at the University of Twente, pursuing Computer Science with a focus on Cybersecurity. I have been working as a student assistant supporting CRIB for the past few months and have worked on a number of projects, including a security assessment of the platform and collating training resources for Geospatial Computing and Earth Observation. I am excited to be a part of this community and look forward to supporting the platform and bringing you the latest updates in big geospatial data and geocomputing.

If you have comments or suggestions about the newsletter, or would like to contribute, please send us an e-mail.

If you find it useful, please share this link with your colleagues so that they can also subscribe. Thanks!