Waleed Alzuhair, flickr

Big Geodata Newsletter, November 2021


Greetings from the Big Geodata Newsletter!

In this issue you will find information on the release of two datasets, TimeSpec4LULC and ESA WorldCover 10 m; open call opportunities from C-SCALE; a “big picture” of the machine learning market for Earth Observation; and one more thing on Dask replacing Spark. You are also welcome to join us at the Big Geodata Talk on openEO and the first Geospatial Computing Platform User Meeting! Our regular upcoming events and recent releases are here as well.

Happy reading! 

You can access the previous issues of the newsletter on our web portal. If you find the newsletter useful, please share the subscription link below with your network.

Open Invitation for Contribution to Newsletters

The newsletter has been keeping you company for a year and a half. We hope it has made working with big geodata more fun. It now reaches more than 200 readers, and this number is still growing. This growth underlines the value of community participation, both for improving CRIB’s services and for growing together. We would therefore like to open a new column in the newsletter for you, the readers!

We cordially invite your contributions to the newsletter to address your interests and concerns. The format is wide open: a comment on a big data technology, an announcement of your new software, a piece sharing knowledge and skills, a call for collaboration, and so on. The stage is yours!

How to contribute? Simply send an email to crib-itc@utwente.nl or drop by ITC 2-154!


Image credits: Khaldi et al., 2021

Deep learning neural networks have transformed numerous application domains thanks to their strong technical performance, but not yet Land Use and Land Cover (LULC) mapping. The pre-release of TimeSpec4LULC, the result of research led by Khaldi et al. (2021), aims to close this gap. TimeSpec4LULC is an open-source global dataset of multi-spectral time series for 29 LULC classes. It is built on 7 spectral bands of MODIS at 500 m resolution from 2002 to 2021.

Generating the 19-year monthly time series of the 7 bands required several steps: applying spatio-temporal quality assessment filters to data from the MODIS sensors on the Terra and Aqua satellites, aggregating their 8-day temporal granularity into monthly composites, merging the data into a combined time series, and extracting, at the pixel level, 11.85 million time series for all bands. Each time series comes with metadata on geographic coordinates, country and departmental divisions, spatio-temporal consistency across LULC products, temporal data availability, and the global human modification index. Annotations were derived from the spatial agreement across the 15 global LULC products available in Google Earth Engine. Their quality was assessed via homogeneous sampling: for each LULC class, a sample of 100 pixels, evenly distributed around the world, was selected and validated by experts using very high-resolution imagery from both Google Earth and Bing Maps. The dataset is suitable for developing and evaluating various machine learning models, including deep learning networks, to perform global LULC mapping and change detection.
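
To make the compositing and merging steps more concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names, the use of a simple mean for both the monthly composite and the Terra/Aqua merge, and the sample reflectance values are all assumptions chosen for illustration. Observations rejected by a quality filter are represented as None.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_composite(observations):
    """Average 8-day observations into (year, month) composites.

    `observations` is an iterable of (date, value) pairs; a value of None
    stands for an observation rejected by a quality filter (assumption).
    """
    buckets = defaultdict(list)
    for day, value in observations:
        if value is not None:
            buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def merge_sensors(terra, aqua):
    """Merge two monthly series: average where both exist, else keep the one available."""
    merged = {}
    for key in sorted(set(terra) | set(aqua)):
        values = [series[key] for series in (terra, aqua) if key in series]
        merged[key] = mean(values)
    return merged


# Hypothetical single-pixel, single-band reflectance values (not real MODIS data).
terra_8day = [
    (date(2020, 1, 1), 0.20),
    (date(2020, 1, 9), 0.30),
    (date(2020, 1, 17), None),  # rejected by the quality filter
    (date(2020, 2, 2), 0.40),
]
aqua_8day = [
    (date(2020, 1, 1), 0.30),
    (date(2020, 2, 2), None),  # rejected by the quality filter
    (date(2020, 3, 5), 0.50),
]

terra_monthly = monthly_composite(terra_8day)
aqua_monthly = monthly_composite(aqua_8day)
combined = merge_sensors(terra_monthly, aqua_monthly)

for (year, month), value in combined.items():
    print(f"{year}-{month:02d}: {value:.3f}")
```

In this sketch, a month with no valid observation from either sensor is simply absent from the combined series, which loosely corresponds to the temporal data availability recorded in the dataset's metadata.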

ESA WorldCover 10 m

Image credits: ESA, 2021

A brand-new global land cover map at 10 m spatial resolution was released at the end of October, the result of more than two years of joint effort by ESA and its partners. This open-access WorldCover map contains 11 land cover classes and covers use cases such as agriculture, biodiversity and nature conservation, land use planning, natural capital accounting, and climate change. An additional mangrove class was also included on request.

One of the key benefits of the WorldCover map is its unprecedented detail. By leveraging both Sentinel-1 and Sentinel-2 satellite data, it provides land cover information even in areas with persistent cloud cover and, beyond the enhanced spatial resolution, allows the land cover map to be updated in near real time. The map has been independently validated following the requirements of the Committee on Earth Observation Satellites (CEOS) Working Group on Calibration and Validation (WGCV) Land Product Validation (LPV). This validation, carried out by Wageningen University, showed that the overall accuracy of the WorldCover product with 11 classes is 74.4% on a global scale, with accuracy by continent ranging from 68% to 81%. The data is now open for exploration!
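
For readers less familiar with the metric, overall accuracy is the share of validation samples whose mapped class matches the reference class. The short Python sketch below shows the plain, unweighted form of that calculation on a toy three-class confusion matrix. The numbers are invented for illustration and are not WorldCover results, and the actual validation may use a different (for example, area-weighted) estimator.

```python
def overall_accuracy(confusion):
    """Return the fraction of samples on the diagonal of a square confusion matrix.

    Rows are reference classes and columns are mapped classes (assumed layout).
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy confusion matrix with made-up counts for three classes.
confusion = [
    [50, 5, 5],
    [4, 30, 6],
    [6, 4, 40],
]

accuracy = overall_accuracy(confusion)
print(f"Overall accuracy: {accuracy:.1%}")
```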

Upcoming Meetings


Image credits: C-SCALE, 2021

The C-SCALE (Copernicus eoSC AnaLytics Engine) project intends to empower European researchers, institutions, and initiatives to easily discover, access, process, analyse and share Copernicus data, tools, resources, and services through the EOSC (European Open Science Cloud) Portal in a way that can be seamlessly integrated into their processes and research practices. Copernicus, the preeminent source of environmental information in the world, readily supports numerous services that are free and openly accessible to users. EOSC brings several federations of service providers, research initiatives, and solution providers into the shared innovation space. C-SCALE demonstrates the power of these combined resources and makes them more approachable.

The project’s Big Copernicus Data Analytics services streamline the integration of models, projects, and programmes. Concrete production-ready pilot applications will be used to drive and validate the technical work. These applications will serve both as demonstrators of the EOSC-Copernicus opportunities and as models that can be adapted for other uses. The project will also support open innovation through its call for additional pilot applications and initiatives.

Recent Releases

The "Big" Picture

Image credits: Radiant Earth, 2021

The latest map of Machine Learning for Earth Observation is now available, the result of a crowdsourcing effort on social media. The entries point to the remarkable ability of organizations to refine these technologies and apply them in the service of humanity. The map is divided into two broad categories, commercial and non-commercial, with five groups within each category, each distinguished by its individual focus.

A closer look at the map may yield more granular information about the players in the market of ML for EO, as well as other analytical insights. You can access a high-resolution version of the map from this link.

One More Thing...

Image credits: Coiled, 2021

The era of big data is marked not only by ever-growing piles of data but also by advancing technologies for handling it. Dask and Apache Spark are among the leading big data processing engines. There is currently a movement in the Dask community to extend its functionality with the aim of replacing Spark. However, Dask's initial design leaves a few technical hurdles on this ambitious path. For instance, Apache Parquet (the default tabular storage format) is supported in Python, but not well enough once data reaches the terabyte scale. Another issue is the lack of a solution in the Python ecosystem on a par with technologies like Delta Lake for managing data lakes. A further area for improvement is Dask's suboptimal shuffling mechanism, which matters particularly when sorting and merging datasets. Overall, the push to advance Dask into Spark territory is under way. Those interested in becoming partners or customers can find out more on the blog page.