Waleed Alzuhair, flickr

Big Geodata Newsletter, August 2020

Become a high-skilled geospatial professional

Greetings from the Big Geodata Newsletter!

After a short summer holiday, we are back with recent news on big geodata technology. In this issue you will find information on the new Apache Sedona (formerly known as GeoSpark); upcoming research funding calls by Open Clouds for Research Environments (OCRE) for cloud and digital EO services; deck.gl, a state-of-the-art framework for visualising large-scale spatial datasets on the web; openEO, an open API that connects different cloud-based EO back-ends in a unified way; and a new wetland dataset produced with open EO data and cloud-based methods. In addition to our regular section on recent software releases, we also have a new section on upcoming events! Happy reading!

PS: Don't forget to read the CRIB News section to stay informed about recent developments at our center!

If you find the newsletter useful, please share the subscription link below. You can access the previous issues of the newsletter on our web portal.

GeoSpark becomes Apache Sedona

GeoSpark was an open-source cluster computing system for processing large-scale spatial data. It extended Apache Spark / SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL and provided good spatial analysis performance. It also featured a map visualization layer compatible with Apache Zeppelin (a web-based notebook), with massively parallelized map-building operators. Recently, its incubation proposal was accepted unanimously by the Apache Software Foundation, and the system joined the Apache Incubator program as Apache Sedona!

Sustaining an open-source project is quite challenging, and being part of the world-renowned Apache ecosystem should help Apache Sedona reach a wider community of active users and developers. Many important "big data" projects are also under the Apache umbrella, so we can expect good synergy. Good luck, Apache Sedona!

OCRE is launching cloud and EO funding calls

Image credits: OCRE, 2020

Open Clouds for Research Environments (OCRE) will launch two funding calls on cloud and EO services for researchers. The first call, with 3.25M Euro in funding, will open on 15 September for 6 weeks. It aims to support projects that intend to use commercial cloud and digital services to accelerate research outcomes and are not currently using these tools at scale. For more information, you can sign up for the infoshare webinar on 15 September, 11:00 CEST (Zoom).

The second call will be OCRE's first call specifically focused on stimulating the adoption of digital EO services by researchers. This 2M Euro call aims to support projects that use data from the Copernicus programme, support FAIR data principles, and clearly demonstrate how the use of commercial EO services can provide agility, improve research, or enable new research outcomes. Two subsequent calls are planned to open in February and July 2021. For more information, you can sign up for the infoshare webinar on 14 October, 14:00 CEST (Zoom). You can also register your interest to stay informed about future calls.

Many commercial cloud and EO services are related to big geospatial data, hence these calls can be a good opportunity to get support for research projects requiring big data technology.

Upcoming Meetings

WebGL-powered visualization of large-scale datasets

Image credits: deck.gl (2020)

Big geospatial data is not only difficult to analyse but also quite challenging to visualize. This is especially the case for large (3D) vector data that has to be visualized in web browsers, the common way to access data in the cloud. deck.gl provides a high-performance, WebGL-based platform for visualizing such large datasets. It supports tiled layers, various camera views, cartographic projections, and environmental lighting, and it uses GPUs for performant rendering. kepler.gl, a powerful open-source geospatial analysis tool for exploring geo-temporal data, is built with deck.gl. The platform can also be combined with Google Earth Engine. Developed mainly by Uber's Engineering team over the last 5 years, deck.gl moved to an open governance model in August, which will allow a community-driven planning and development process.
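To give a feel for how tiled layers keep large datasets manageable, here is a minimal sketch of tile indexing. It assumes the common XYZ / Web Mercator scheme used by OpenStreetMap-style tile servers. It is not deck.gl code, and the function names and the bounding box are purely illustrative.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point.

    Assumes the Web Mercator tiling scheme (origin at the top-left corner).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern/southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that cover a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


# Hypothetical viewport: a rough bounding box around the Netherlands.
visible = tiles_for_bbox(3.3, 50.7, 7.3, 53.6, zoom=8)
print(f"{len(visible)} tiles needed at zoom 8")
```

Only the tiles that intersect the current viewport need to be fetched and drawn, which is why a browser can display a dataset far larger than it could load at once.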

deck.gl has an extensive set of features, and if you have some spare time, the examples and showcases on its website are well worth a look. Also check out kepler.gl!


Image credits: Pebesma et al. (2016)

Using cloud-based platforms to access and process big EO data is becoming the new norm, especially for large-scale studies. However, current EO cloud back-ends have different APIs, so it takes significant time and effort to get acquainted with them and use them efficiently. It is difficult to compare their capabilities and costs, or to combine them in a joint analysis. Validating and reproducing analysis results across platforms is also challenging.

openEO aims to help with these difficulties by providing an open API that connects R, Python, JavaScript and other clients (e.g. QGIS) to different EO cloud back-ends in a simple and unified way. A web-based visual editor is also available for interactive use. The project was funded by the H2020 EO Big Data Shift call and will finish this month. However, a new ESA-funded project, openEO Platform, has just started to bring openEO to production and offer data access and processing services to the EO community.
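As a rough illustration of what "unified" means in practice, an openEO client describes a workflow as a back-end-independent process graph in JSON. The sketch below builds a small graph with the standard library only. The `load_collection` and `save_result` process names and the `from_node` reference style follow the openEO API as we understand it, while the collection id, the extents, and the helper functions are assumptions made up for this example.

```python
import json


def build_process_graph(collection_id, bbox, time_range, out_format="GTiff"):
    """Build a two-node, openEO-style process graph as a plain dict."""
    west, south, east, north = bbox
    return {
        "load": {
            "process_id": "load_collection",
            "arguments": {
                "id": collection_id,
                "spatial_extent": {
                    "west": west, "south": south, "east": east, "north": north,
                },
                "temporal_extent": list(time_range),
            },
        },
        "save": {
            "process_id": "save_result",
            # "from_node" points at the output of another node in the graph.
            "arguments": {"data": {"from_node": "load"}, "format": out_format},
            "result": True,  # marks the node whose output is the final result
        },
    }


def dangling_references(graph):
    """Return (node_id, target) pairs whose "from_node" target is missing."""
    missing = []

    def walk(node_id, value):
        if isinstance(value, dict):
            target = value.get("from_node")
            if target is not None and target not in graph:
                missing.append((node_id, target))
            for child in value.values():
                walk(node_id, child)
        elif isinstance(value, list):
            for child in value:
                walk(node_id, child)

    for node_id, node in graph.items():
        walk(node_id, node["arguments"])
    return missing


# Hypothetical collection id and extents, for illustration only.
graph = build_process_graph(
    "SENTINEL2_L2A", (5.0, 51.0, 6.0, 52.0), ("2020-06-01", "2020-09-01")
)
payload = json.dumps({"process": {"process_graph": graph}}, indent=2)
print(payload)
```

Because the graph is plain JSON, the same description could in principle be sent to any compliant back-end, which is what makes comparing or combining platforms feasible.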

Recent Releases

The "Big" Picture

Large-scale wetland mapping and monitoring is challenging due to factors such as the inaccessibility and diversity of wetlands, the fuzziness of their boundaries, and the cost and time required for field data collection. By leveraging cloud-based techniques, Mahdianpari et al. (2020) produced a new (2nd generation) 10 m wetland inventory map of Canada using a Random Forest classifier with dual-polarimetry Sentinel-1 SAR and multi-spectral Sentinel-2 data. About 28,000 C-band L1 GRD S1 and 72,000 S2 images (with cloud cover less than 20%) from the summers of 2017-2019, available on the Google Earth Engine platform, were processed in 78 days. Overall accuracies for the 13 ecozones examined in the study ranged from 76% to 91%, representing a 7% improvement over the first generation.

Mahdianpari, M. et al. (2020) The Second Generation Canadian Wetland Inventory Map at 10 Meters Resolution Using Google Earth Engine, Canadian Journal of Remote Sensing, doi:10.1080/07038992.2020.1802584
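For readers less familiar with the metric, overall accuracy is the share of validation samples that a classifier labels correctly, i.e. the diagonal of the confusion matrix divided by its total. The sketch below shows the calculation on a small, made-up three-class matrix. The numbers are illustrative only and are not taken from the paper.

```python
def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples / all samples.

    `confusion[i][j]` is the number of samples of reference class i
    that were assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical confusion matrix for three classes (rows: reference,
# columns: predicted). These counts are invented for illustration.
example = [
    [50, 5, 5],
    [4, 40, 6],
    [6, 4, 30],
]
oa = overall_accuracy(example)
print(f"Overall accuracy: {oa:.0%}")
```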

Serkan Girgin

To support big geodata activities at ITC, we are planning a hybrid approach that uses both local and cloud-based computing infrastructure. The local infrastructure will mainly support (self)training and exploratory research activities, and we have an innovative solution to provide a large number of CPUs and GPUs for distributed computing needs in a cost-effective manner: a cluster of NVIDIA Jetson AGX Xavier units. Each unit will provide an 8-core 64-bit CPU, a 512-core Volta GPU with Tensor Cores, 32 GB of LPDDR4 memory, a 500 GB high-speed SSD, and 10 TB of external storage. They will be available in the second half of September. Stay tuned!

For any comments or suggestions about the newsletter, or if you want to contribute, please simply send me an e-mail.

If you find it useful, please share this link with your colleagues so that they can also subscribe. Thanks!