Waleed Alzuhair, flickr

Big Geodata Newsletter, June 2022

Become a high-skilled geospatial professional

Greetings from the Big Geodata Newsletter!

In this issue you will find information on OCRE research funding for Earth Observation services, Copernicus Jupyter Notebook Competition, cuNumeric - a GPU-enabled drop-in replacement for NumPy at scale, xcube - an xarray-based EO data cube toolkit, and a method of deploying user-defined EO algorithms for large scale data analysis on the cloud by using Data Cube Resilient Distributed Datasets (DRDDs). Our regular upcoming events and recent releases are here as well.

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

OCRE research funding for Earth Observation services


Image credits: OCRE, 2022

The EC-funded Open Clouds for Research Environments (OCRE) project has opened its final funding call, which will distribute €6.5 million to research projects for the use of Cloud and Earth Observation services. The application procedure differs depending on the type of service requested. For EO, OCRE will provide funded services from its catalogue of EO suppliers to the value of €200,000 (minimum ask is €100,000) to awarded projects, based on their relevance and ability to demonstrate the impact of these services on research activities and outcomes. The EO catalogue currently includes 32 providers offering services such as data analytics, EO data processing, interactive algorithm development, user algorithm hosting, and value-added products. You can submit your proposal until 10 July by using a guided application form. For more information on the OCRE project and current adoption funding opportunities, you can read their flyer or watch their recent webinar.

Copernicus Jupyter Notebook Competition


Image credits: WEkEO, 2022

As a subscriber to the newsletter, you are probably familiar with tools like Jupyter Notebooks, which are web-based interactive documents that make it easy to interact with data and visualize the results. (Big) EO data is no exception. To attract new users and drive innovation with Copernicus data and information, WEkEO is currently running the Copernicus Jupyter Notebook Competition. Participants can choose one of the four available tracks, built around land, marine, climate, or air quality thematic data. The submissions will be evaluated by a panel of independent judges, and the winning teams will be awarded cash prizes. The ultimate goal of the competition is to build a community-driven resource of notebooks on Copernicus. For more information, you can check https://notebook.wekeo.eu/

Besides allowing you to discover the vast range of Copernicus data, the competition can help you to advance your interactive computing skills and also showcase your expertise to a wider community. Don't miss this opportunity!

cuNumeric: a GPU-enabled drop-in replacement for NumPy at scale


Image credits: NVIDIA Legate, 2022

NumPy is the de facto standard Python math and matrix library for scientific applications, providing a simple and easy-to-use programming model. It is the foundation for many of the most widely used data science and machine learning frameworks, especially in the geospatial and EO domains. cuNumeric is a library that aims to provide a distributed and GPU-accelerated drop-in replacement for the NumPy API, so that programs with very large arrays that cannot fit in the memory of a single GPU or a single node can span multiple nodes and GPUs easily - without changing the program code! Benchmarks show that good weak scaling with little drop in throughput is achievable while scaling up to 2048 A100 GPUs. The library is currently a work in progress, and support for additional NumPy operators is being added gradually. A complete list of available features is provided in the API reference.
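To give a feel for what "drop-in replacement" means in practice, here is a minimal sketch. It assumes cuNumeric is importable as `cunumeric` and falls back to plain NumPy when it is not installed, so the same code runs on a laptop. The `ndvi` function and the synthetic bands are hypothetical illustrations, not taken from the cuNumeric documentation.

```python
# Assumption: cuNumeric is only available on systems with Legate installed,
# so fall back to regular NumPy elsewhere. The rest of the code is unchanged.
try:
    import cunumeric as np  # distributed, GPU-accelerated NumPy API
except ImportError:
    import numpy as np


def ndvi(red, nir):
    """Normalized difference vegetation index, computed element-wise."""
    return (nir - red) / (nir + red)


# Synthetic reflectance bands (hypothetical values, not a real scene).
red = np.full((1000, 1000), 0.1)
nir = np.full((1000, 1000), 0.5)

result = ndvi(red, nir)
mean_ndvi = float(result.mean())
print(f"mean NDVI: {mean_ndvi:.4f}")
```

Only the import line differs between the two back ends. Whether a given program benefits depends on which NumPy operators it uses, so it is worth checking them against the API reference first.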

cuNumeric is part of the Legate Project, which aims to let any programmer run code on a machine of any scale without needing to be an expert in parallel programming and distributed systems. Check their documentation for more details!

Upcoming Meetings

xcube: an xarray-based EO data cube toolkit


Source: xcube, 2022

xcube is an open-source Python package that can be used to convert Earth Observation and other geographical data into data cubes that can then be published. xcube is built upon a big data ecosystem that consists of popular Python packages like xarray, dask, and zarr. Datasets from popular providers like Sentinel Hub or ESA’s CCI Open Data Portal can be used for cube generation through APIs or other plugins. Once a cube has been generated from an external source and the resulting xcube dataset has been optimized, researchers can access, analyze, transform, and visualize the data for specific use cases. The package also supports extracting data points and resampling the input data in time to generate temporal aggregations. To facilitate data exploration, xcube provides a lightweight viewer app that runs as a single webpage and allows users to visualize their data cubes. You can watch the video showcasing xcube's features to learn more about its capabilities, which are under active development. Detailed documentation, including user and developer guides as well as the xcube Dataset Specification, is also available.
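As a rough illustration of what a temporal aggregation over a data cube involves, the sketch below computes monthly means of a small synthetic cube with dimensions (time, lat, lon). It deliberately uses plain NumPy rather than xcube's own API. The `monthly_mean` helper, the cube shape, and the random values are all hypothetical.

```python
import numpy as np

# Hypothetical cube: 90 daily time steps on a 4 x 5 grid, dims (time, lat, lon).
times = np.arange("2022-01-01", "2022-04-01", dtype="datetime64[D]")
rng = np.random.default_rng(0)
cube = rng.random((times.size, 4, 5))


def monthly_mean(cube, times):
    """Average all time steps that fall in the same calendar month."""
    months = times.astype("datetime64[M]")  # truncate each date to its month
    labels, inverse = np.unique(months, return_inverse=True)
    # One (lat, lon) slice per month, stacked along a new time axis.
    aggregated = np.stack(
        [cube[inverse == i].mean(axis=0) for i in range(labels.size)]
    )
    return labels, aggregated


labels, monthly = monthly_mean(cube, times)
print(labels, monthly.shape)
```

In a real workflow the cube would be an xarray dataset backed by dask and zarr, so the aggregation would be evaluated lazily and chunk by chunk instead of in memory.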

Recent Releases

The "Big" Picture


Image credits: Xu et al., 2022

The popularity of cloud-based remote sensing platforms for big geodata analysis is on the rise. However, one drawback of such platforms is their limited support for user-defined algorithms. If the required functions are not pre-implemented by the platform providers, it can be hard to implement custom algorithms, especially if they require specific libraries. One solution to this problem is containerization. Xu et al. propose a method of deploying user-defined remote sensing algorithms for large-scale data analysis on the cloud. The EO datasets are first organized into homogeneous and analysis-ready Data Cube Resilient Distributed Datasets (DRDDs). Composite containers are then used: Docker containers run the user-defined algorithms, and task runners transform the parameters and data cubes needed for their execution. Experiments on continental-scale land cover mapping with 10-m resolution Sentinel-2 data, using Support Vector Machine and U-Net based deep learning on three different platforms, show that the proposed approach performed better than both Microsoft Planetary Computer and Google Earth Engine in terms of the number of pixels processed and computational efficiency. The authors conclude that the proposed approach can help researchers quickly port legacy EO algorithms to the cloud without rewriting them.

Xu, C., Du, X., Jian, H., Dong, Y., Qin, W., Mu, H., Yan, Z., Zhu, J. and Fan, X. (2022) Analyzing large-scale Data Cubes with user-defined algorithms: a cloud-native approach, Int. Journal of Applied Earth Observation and Geoinformation, 109:102784, doi:10.1016/j.jag.2022.102784

CRIB News
dr.ing. S. Girgin MSc (Serkan)
Senior Researcher, Head of CRIB

Big geospatial data ecosystem components, especially machine learning and AI tools and frameworks, are evolving rapidly. We try to highlight some of them in each newsletter to keep you informed about recent developments. We also keep our Geospatial Computing Platform up to date by providing the latest versions of software packages and libraries under a rolling update policy. It takes a lot of time and effort, but it is motivating to see that the result is useful to many users - more than 750 now. A nice recognition of this devoted work was my nomination for the SURF Research Support Champion 2022 award. Thanks to the support of many platform users, I received the award in the Universities category. I'm grateful for your kind support. Heartfelt thanks!

For any comments or suggestions about the newsletter, or if you want to contribute, please simply send us an e-mail.

If you find it useful, please share this link with your colleagues so that they can also subscribe. Thanks!