Waleed Alzuhair, flickr

Big Geodata Newsletter, January 2021

Become a high-skilled geospatial professional

Greetings from the Big Geodata Newsletter!

Happy new year! When we shared our best wishes for 2020, probably none of us expected such an unusual year. It was extraordinary. We hope that 2021 will be different, more "usual"!

Normally, in this introduction we summarize the newsletter content by listing the news items. This time we have only one: our new Geospatial Data Analysis Platform. It required a significant effort, but now we have a state-of-the-art interactive computing platform featuring GPU-backed and distributed data analysis and visualization capabilities, directly accessible from your home or office (or home-office) computer. This first issue of the new year is devoted to the platform and provides information on its main features and capabilities. Please log in to the system and have a closer look. We hope you will like it! Happy reading and testing!

PS: At the end of the month, you will also receive a second newsletter with the usual content, i.e. recent developments in the big geodata domain.

If you find the newsletter useful, please share the subscription link below. You can access the previous issues of the newsletter on our web portal.

Geospatial Computing Platform

Image credits: CRIB, 2021

The platform provides each user with a containerized and isolated working environment that ensures privacy. Your assets are protected against hardware failures by replicated storage with a minimum of two copies. Software packages are ready to use out of the box, without any further setup required. They are also kept up to date so that you can use the latest, state-of-the-art features. Most of the packages are manually configured and fine-tuned to ensure the best performance by utilizing multi-threaded and GPU-assisted low-level libraries. The default interface of the platform is JupyterLab, which enables you to work with interactive notebooks and documents through text and code editors, terminals, and other custom components (e.g. map widgets).

Do you want to try it? Just log in with your UT credentials!

Platform Components

Image credits: CRIB, 2021

Built on a cluster of specialized NVIDIA Jetson AGX computing units supported by additional servers, the platform allows fast parallel and distributed computing. Each computing unit has an 8-core ARM v8.2a 64-bit CPU, a 512-core Volta GPU with Tensor Cores, 32 GB of 256-bit LPDDR4 RAM, and 10 TB of dedicated storage. The units operate at 10–30 W, ensuring a low energy footprint despite their high performance.

Each unit can be used individually for geospatial data analysis and computing purposes by leveraging its multi-core processing capabilities. The units can also be used together as a single cluster for big data computing needs. The platform features managed and ready-to-use Dask, Apache Hadoop, and Apache Spark clusters. All clusters can be monitored through the web interfaces available on the launcher.
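The two usage modes described above, one unit's cores versus the whole cluster, rely on the same split-apply-combine pattern. Here is a minimal sketch of that pattern using only the Python standard library. The function names `partial_stats` and `chunked_mean` and the sample values are hypothetical, and a thread pool merely stands in for the Dask or Spark workers; this is not the platform's own API.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Return (sum, count) for one chunk so results can be merged later."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, workers=4):
    """Mean of `values`, computed chunk by chunk and combined at the end."""
    if not values:
        raise ValueError("values must not be empty")
    # Split: cut the input into fixed-size chunks.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Apply: each worker reduces one chunk independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Combine: merge the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    elevations = [12.0, 15.5, 9.0, 20.5, 18.0, 7.0, 11.0, 14.0, 16.0, 13.0]
    print(chunked_mean(elevations))
```

Threads in CPython do not run pure-Python computations on several cores at once, so for CPU-bound work on a single unit `ProcessPoolExecutor` would be the closer analogue. On the cluster, Dask or Spark would take the role of the executor and distribute the chunks across units.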

The main server of the platform was kindly donated by the ESA Department. Three NVIDIA Jetson AGX units of the NRS Department are also currently "parked" at the platform. We are grateful for their support! If you have idle computing resources, just let us know. We can repurpose them and make them available through the platform.

Additional Services

Image credits: CRIB, 2021

The notebook interface and terminal access allow interactive data analysis, computing, and visualisation through a wide range of software packages and libraries. However, quite often you may also need additional services, such as a database server to store your data, a map server to publish your maps, or a code repository to share your research code. To facilitate and support your work, the platform provides additional ready-to-use services! They are accessible through the launcher or the platform portal home page by using your UT credentials.

PS: If you use the launcher, the services are directly accessible from your workspace as separate tabs, which you can arrange easily and use side-by-side.


Useful Tools

Image credits: CRIB, 2021

Hundreds of software libraries, packages and tools are readily available on the platform for your scientific and geospatial analysis and visualisation needs! Check the public/platform folder for the complete list, which is updated regularly. Do you need other packages? No problem, just let us know.

Survey: Common Datasets

Image credits: CRIB, 2021

While developing the platform, we asked you to indicate the software tools and packages you use frequently, so that we could make them available on the platform and ensure a service that fulfils your current and future needs. More than 45 people participated in the survey and helped us better identify the needs. We are happy that we managed to cover almost all of them, and hopefully we will cover more in the future.

Now we need your help with another topic: datasets. We want to make the (big) datasets that you currently use, or are willing to use in the future, readily available on the platform. We have prepared a short survey for this purpose and your feedback will be highly appreciated! Here is the link.


The "Big" Picture

Image credits: CRIB, 2021

Performance metrics, which allow monitoring of the operation and identification of hardware or software problems, are crucial for proper management of the platform. They are also important for understanding how computational methods and algorithms perform, so that bottlenecks can be identified and resolved easily. The platform collects a wide range of performance metrics using the Prometheus monitoring and alerting toolkit and utilizes Grafana for interactive visualisation through dashboards. The good thing is that this information is available to all users. Do you want to check how the units are performing? Just click the Grafana icon on the launcher and log in with your UT credentials. You will see the "big picture"!
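For readers curious about what sits behind the dashboards: Prometheus serves metrics through an HTTP API, and an instant query goes to its `/api/v1/query` endpoint. The sketch below builds such a request URL and parses a response. It assumes the Prometheus endpoint is reachable by users, which this newsletter does not state, and the server address, the `instance` label values, and the sample response are all made up for illustration.

```python
import json
from urllib.parse import urlencode

# Hypothetical address; the newsletter does not give the real one.
PROMETHEUS_URL = "http://prometheus.example.org:9090"


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Map each series' `instance` label to its value from a query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in data["result"]
    }


if __name__ == "__main__":
    # Made-up response in the API's documented format, so no network is needed.
    sample = json.loads(
        '{"status": "success", "data": {"resultType": "vector", "result": ['
        '{"metric": {"__name__": "up", "instance": "unit-1:9100"},'
        ' "value": [1611000000.0, "1"]}]}}'
    )
    print(build_query_url("up"))
    print(parse_instant_vector(sample))
```

The built-in `up` metric reports 1 for every target that Prometheus could scrape, which makes it a convenient first query. For everyday use, the Grafana dashboards remain the intended entry point.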

One more thing...

Image credits: OSCT, 2021

Similar to Big Data, Open Science is also on the rise and is expected to change the research landscape in the coming years. In fact, its impacts are already visible in many fields, and it is becoming an important criterion for research funding as well. ITC has a growing Open Science community and now also has an Open Science Officer dedicated to supporting and facilitating related activities. One such activity is the Open Science Community Twente (OSCT), which will kick off on 28th January with an online event.

Don't miss the kickoff if you want to learn more or join the community!

Serkan Girgin

Development of the computing platform was a challenging task because of the many components that needed to be integrated on a hardware platform quite different from standard ones. At the same time, it was a very nice learning experience. Now we have a good understanding of how things work deep inside, and this will allow us to improve them by fine-tuning and optimizing their performance. Hopefully, this will provide a better user experience and lead to many research and thesis studies utilizing the platform! We are planning to facilitate this by providing hands-on training in the coming months. We are getting ready for the next phase!

For any comments or suggestions about the newsletter, or if you want to contribute, please simply send me an e-mail.

If you find it useful, please share this link with your colleagues so that they can also subscribe. Thanks!