
High-performance Spatial Data Management and Analysis with DuckDB

Join us for the Big Geodata Talk on DuckDB Spatial!

DuckDB is a novel in-process SQL database system designed for analytical workloads. It has been making waves in the data science and engineering community, not only for its impressive performance but also for its focus on ease of use and its integrations with the wider data ecosystem. A key part of making this possible is DuckDB's flexible extension system, which enables DuckDB to be used across different domains while the core system itself remains small and focused. One such extension is the DuckDB Spatial Extension, which brings geospatial data processing capabilities to DuckDB and allows users to perform complex spatial queries and transformations. By incorporating the trifecta of foundational open source GIS libraries (GDAL, GEOS and PROJ) as well as natively implemented geospatial algorithms, all neatly packaged into a single binary with no runtime dependencies, the spatial extension provides hundreds of familiar spatial SQL functions and can import from and export to dozens of different vector file formats.

Just as DuckDB tries to default to the behavior of PostgreSQL, the spatial extension is heavily inspired by PostGIS and similarly follows the Simple Features SQL standard. However, while the Simple Features geometry model undoubtedly provides a great deal of flexibility with its hierarchy of subtypes (points, linestrings, multipolygons) and optional Z and M dimensions, it is not always the most efficient representation for modern high-performance processing. Besides implementing a number of geospatial algorithms natively to make the most of DuckDB's vectorized execution engine and memory model, the spatial extension also complements the GEOMETRY type that we all know and love with a new set of strongly typed spatial types backed by a columnar storage model, similar to what is being proposed in the GeoArrow project. This makes DuckDB's spatial extension an exciting project: it stands with one foot firmly in the traditional open source GIS world and the other in the modern data science and engineering movement.

In this talk, we will introduce DuckDB and the DuckDB Spatial Extension, walk through some of the internals that make DuckDB special as well as some of the challenges and design decisions encountered when adapting it for geospatial processing. We will also showcase some of the main features the spatial extension brings to the table today and share some insights into the future of the project.

Date

31 May 2024, 11:00-12:00 CET

Venue

ITC Langezijds Building, Room LA 1212
Hallenweg 8, 7522 NH Enschede

or

Online

Registration

Please fill in the registration form to attend the event.

Speaker

Max Gabrielsson
Software Engineer, DuckDB Labs

Max Gabrielsson is a software engineer at DuckDB Labs, where he works on the DuckDB database system in general and is the primary developer of the spatial extension. Previously, he held a variety of roles at small startups and co-founded a company selling maps, which sparked his interest in geospatial data management and processing. Max holds a BSc in Computer Science from Uppsala University.