The Planetary Computer: Putting Global-Scale Geospatial Data to Work for Conservation and Sustainability

Environmental sustainability depends on very large geospatial data sets, particularly satellite imagery and climate forecasts. But working with geospatial data – whether it’s large or not – is a deep skill set unto itself, and working with very large data – whether it’s geospatial or not – is its own field of expertise as well. Consequently, the niche expertise required in GIS and distributed computing creates a huge barrier between this invaluable data and the sustainability practitioners who need it. Microsoft’s Planetary Computer aims to lower that barrier by combining (1) a 25 PB catalog of analysis-ready geospatial data, in consistent file formats, in a single data center, (2) an API that facilitates spatiotemporal querying over that data, and (3) a computing environment that simplifies distributed computing workloads. Of course, files and distributed computing don’t address environmental issues directly, so Microsoft is also partnering closely with the sustainability community to build applications that put the Planetary Computer to work for sustainability decision-making.

This talk will highlight features and capabilities of Microsoft's Planetary Computer and showcase selected applications. Possible collaboration options will also be discussed.

Speaker: Dr. Dan Morris

Dan Morris is a Principal Scientist with Microsoft’s AI for Earth program, focused on accelerating innovation at the intersection of machine learning and environmental sustainability, particularly through the Planetary Computer platform. When he’s not moving geospatial data around on the cloud, his work includes computer vision applications in wildlife conservation, for example the AI for Earth Camera Trap Image Processing API. Prior to joining AI for Earth, he worked in Microsoft’s Medical Devices Group, developing signal processing and machine learning techniques for cardiovascular health monitoring, along with earlier work on signal processing and machine learning for input systems, on making medical information more useful to hospital patients, on automatic exercise analysis from wearable sensors, and on generating musical accompaniment for vocal melodies (the “Songsmith” project). Before coming to Microsoft, he studied neuroscience at Brown and developed brain-computer interfaces for research and clinical environments. His PhD work at Stanford focused on haptics and physical simulation for virtual surgery.

Video

Presentation

Questions and Answers

  • How do you access the Planetary Computer?

    Users can request access at https://planetarycomputer.microsoft.com/account/request. If you’ve met us at a talk or an event, it doesn’t hurt to e-mail planetarycomputer@microsoft.com as well, to make sure we don’t miss your request.

  • Are you planning to provide continental-level analysis-ready data, like monthly composites of Sentinel images?

    “Analysis-ready” means different things in different contexts, but because the question mentions Sentinel data and suggests visual composites, I’ll answer with respect to Sentinel-2 specifically.

    We have taken steps to make our Sentinel-2 data analysis-ready; in particular, our Sentinel-2 collection has been processed to bottom-of-atmosphere with Sen2Cor, and we will be releasing an improved set of cloud masks in the next few months (in addition to the masks produced by Sen2Cor, which are already available as part of the collection).

    We are considering the development of quarterly Sentinel-2 composites, but we are not yet actively working on this. We welcome feedback on when and how this would be helpful above and beyond atmospherically corrected data with cloud masks; please send feedback to planetarycomputer@microsoft.com.
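
    In the meantime, here is a minimal sketch of working with the atmospherically corrected data and Sen2Cor masks that are available today, assuming the pystac-client, planetary-computer, and rioxarray Python packages; the bounding box, date range, and cloud-cover threshold are only illustrative.

        import planetary_computer
        import pystac_client
        import rioxarray

        # Open the Planetary Computer STAC API; sign_inplace attaches the
        # short-lived SAS tokens needed to read the underlying blobs.
        catalog = pystac_client.Client.open(
            "https://planetarycomputer.microsoft.com/api/stac/v1",
            modifier=planetary_computer.sign_inplace,
        )

        search = catalog.search(
            collections=["sentinel-2-l2a"],
            bbox=[6.85, 52.20, 6.95, 52.25],      # illustrative area of interest
            datetime="2021-06-01/2021-06-30",
            query={"eo:cloud_cover": {"lt": 20}},
        )
        item = next(search.items())

        # The SCL asset is the Sen2Cor scene classification; classes 8-10 are
        # cloud medium/high probability and thin cirrus.
        scl = rioxarray.open_rasterio(item.assets["SCL"].href)
        cloudy_fraction = float(scl.isin([8, 9, 10]).mean())
        print(f"{item.id}: {cloudy_fraction:.1%} cloudy pixels")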

  • Is the Jupyter interface the only way to access the Planetary Computer or are there other access options available?

    Our Data Catalog is the heart of the Planetary Computer, and all of the other tools we provide (e.g. our STAC API, our Jupyter-based Planetary Computer Hub, and our Planetary Computer Explorer) exist to make it easier to do environmental sustainability work with data from our catalog.

    For many users, the raw data is the most convenient way to work with our catalog, so we document the storage conventions for all of our datasets; this lets users access data via Azure Blob Storage directly, without using our API at all. For most users, though, our STAC API provides a significant convenience over raw paths, and it is accessible from any environment, whether you’re working in the Planetary Computer Hub or not (see the sketch after this answer, which uses nothing but HTTP).

    We are also working with the open-source community to facilitate the use of other analysis stacks, e.g. OpenDataCube, and we are working with our colleagues at Esri to put Esri Image Services in place for some data sets to make it easier to work with our data via Esri tools (e.g. this Image Service that’s in place for the Planetary Computer NAIP archive).
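
    To make the “any environment” point concrete, the sketch below searches the STAC API with nothing but the requests library; the NAIP collection and bounding box are just examples. The asset hrefs it prints are the raw Blob Storage URLs mentioned above (most collections need a short-lived SAS token, e.g. from the planetary-computer package, before the data can be read).

        import requests

        api = "https://planetarycomputer.microsoft.com/api/stac/v1"

        # The STAC API is plain HTTP, so no particular client or notebook is needed.
        collections = requests.get(f"{api}/collections").json()["collections"]
        print(len(collections), "collections, e.g.", collections[0]["id"])

        # Search one collection; hrefs in the result are raw Blob Storage paths.
        body = {
            "collections": ["naip"],
            "bbox": [-122.3, 47.5, -122.2, 47.6],   # illustrative area (Seattle)
            "limit": 1,
        }
        for feature in requests.post(f"{api}/search", json=body).json()["features"]:
            print(feature["id"], feature["assets"]["image"]["href"])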

  • How many users can the Planetary Computer platform serve simultaneously?

    It depends on what you mean by "platform" :) Most of the time, the data and API layers are not near their practical limits, even when many users are accessing the platform concurrently. I think this question is primarily about the Planetary Computer Hub (our interactive computing environment), which is a shared resource with finite compute capacity. GPU nodes in particular are allocated on a first-come, first-served basis, so users may not always be able to allocate GPUs, and the creation of large cluster jobs will be limited by availability.

    That said, we do gradually adjust compute limits to match demand, and we are working on a batch computing mechanism that will allow workloads created in our JupyterLab environment to be submitted to a shared queue; this will make resource sharing more straightforward and less sensitive to who allocates nodes when. We have also open-sourced the configuration of the Hub itself, and we document the process of configuring your own Hub, so that users working on Azure who don’t want to use shared computing resources can clone the Hub entirely and still access our data and APIs.
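
    For reference, allocating a cluster from inside the Hub follows the standard Dask Gateway pattern; a minimal sketch (the worker counts and toy computation are illustrative, and actual capacity is bounded by the shared limits described above):

        import dask.array as da
        from dask_gateway import GatewayCluster

        # Inside the Hub, GatewayCluster() picks up the Hub's gateway defaults.
        cluster = GatewayCluster()
        cluster.adapt(minimum=2, maximum=24)   # scale within whatever capacity is free
        client = cluster.get_client()

        # Any Dask workload now runs on the cluster, e.g. this toy computation.
        x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
        print(x.mean().compute())

        cluster.close()                        # release shared nodes when finished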

  • Are results of computations persistent on the platform?

    Yes, user home directories on the Planetary Computer Hub are persistent, and they offer enough space for a few tens of gigabytes of results; for anything larger, we provide examples of writing to user-owned Blob Storage.
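
    For example, here is a sketch of pushing a result file to your own storage account with the azure-storage-blob package; the account URL, container, and SAS token are placeholders for resources you own, not Planetary Computer resources.

        from azure.storage.blob import ContainerClient

        # Placeholder URL: your own storage account, container, and SAS token.
        container = ContainerClient.from_container_url(
            "https://<your-account>.blob.core.windows.net/results?<your-sas-token>"
        )

        # Upload a result produced in the Hub; overwrite when re-running a job.
        with open("prediction.tif", "rb") as data:
            container.upload_blob("run-2021-10/prediction.tif", data, overwrite=True)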

  • How can we upload data to the Planetary Computer?

    We provide examples of interacting with data in user-owned Azure Blob Storage, but we are also working on a more seamless way to interact with user-owned data. Specifically, we’d like users to have the same experience querying their own geospatial data that they have querying data from the Planetary Computer Catalog; stay tuned for news on this in the next couple of Planetary Computer releases.
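
    Until then, reading your own data back into the Hub looks something like the sketch below, which opens a hypothetical Zarr store in a user-owned storage account through fsspec and the adlfs backend; the account name, key, and path are placeholders.

        import fsspec
        import xarray as xr

        # Placeholder credentials and path for a storage account you own.
        store = fsspec.get_mapper(
            "az://my-container/my-results.zarr",
            account_name="<your-account>",
            account_key="<your-key>",
        )
        ds = xr.open_zarr(store)
        print(ds)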

  • Do you have GPU support for machine learning?

    Yes, our Planetary Computer Hub includes images for both PyTorch and TensorFlow that run on GPU-enabled virtual machines. As noted above, GPUs are a limited resource, so availability depends on load and is not guaranteed.
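
    Because allocation isn’t guaranteed, it’s worth verifying that a GPU is actually present before starting a long run; a minimal PyTorch sketch:

        import torch

        if torch.cuda.is_available():
            device = torch.device("cuda")
            print("GPU allocated:", torch.cuda.get_device_name(0))
        else:
            device = torch.device("cpu")
            print("No GPU on this node; falling back to CPU")

        # Place tensors and models on whichever device was found.
        batch = torch.randn(8, 3, 224, 224, device=device)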

  • What is the best way to run a long-duration computational task without the memory being recycled and progress being lost?

    Currently, the Planetary Computer Hub is built around a JupyterLab experience designed primarily for synchronous, interactive workflows. However, we are working on a batch job system that we hope to roll out in the next couple of Planetary Computer releases; until then, a checkpointing pattern like the sketch at the end of this answer can protect long jobs against kernel restarts.

    But maybe more importantly, the Hub is just one way of interacting with the Planetary Computer Data Catalog and API, and we encourage users to find the toolset that’s right for them. This is one of our core design principles: if the Hub isn’t the right tool for your workload, and you access Planetary Computer data another way, you’re still a Planetary Computer user to us. :)

    For example, our colleagues at Impact Observatory used the Planetary Computer’s Sentinel-2 collection and STAC API to run a machine learning model on a full year of Sentinel-2 data. For this scenario, Azure Batch was a better fit than the Planetary Computer Hub, and it’s important to us to support use cases like this in addition to use cases focused on the Hub.
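
    As for protecting progress today, a generic checkpointing pattern writes completed work to the persistent home directory (or to Blob Storage) so that a recycled kernel can resume where it left off. In the sketch below, the checkpoint file name, item IDs, and process function are all illustrative.

        import json
        from pathlib import Path

        CHECKPOINT = Path.home() / "checkpoint.json"   # home dirs persist across restarts

        def process(item_id: str) -> None:
            """Placeholder for per-item work, e.g. processing one satellite scene."""
            pass

        all_item_ids = ["scene-001", "scene-002", "scene-003"]  # e.g. IDs from a STAC search

        # Resume from whatever was finished before the kernel was recycled.
        done = set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()
        for item_id in all_item_ids:
            if item_id in done:
                continue
            process(item_id)
            done.add(item_id)
            CHECKPOINT.write_text(json.dumps(sorted(done)))   # record progress as we go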

  • Do you plan to have beta testing of new features for a specific group of people who might be interested in contributing their opinions?

    The Planetary Computer is brand new and currently in preview, so in a sense, all of our users are helping us try new features. We test as extensively as we can before releasing, but one of the benefits of our open-source approach is that we’re constantly looking to our users both to identify bugs and to make suggestions that will help us improve the Planetary Computer.

  • Are there any plans to facilitate AI for Earth grants for projects that for example produce data that could be beneficial to the Planetary Computer?

    Yes, the AI for Earth grants program has supported a variety of organizations that contribute novel data to our catalog (e.g. our grantees at NatureServe make their Map of Biodiversity Importance available on the Planetary Computer, as well as via other channels), or that curate existing public data sets for release through the Planetary Computer Data Catalog. The release of open data is not a requirement for our grants program, but it’s definitely something we encourage. We document the producers and processors of all of the data sets on the Planetary Computer Data Catalog, so you can take a look at our web pages to get a sense of who has contributed data, and you can also check out our AI for Earth Grantee Gallery to see some of the data sets and open-source code that our grantees have released.

  • How does becoming a partner with the Planetary Computer project work?

    I believe this question is in reference to our AI for Earth Partners page. The organizations listed on this page all use the Azure cloud for environmental sustainability, and we’ve supported their work through our grants program. There is no sharp line between these partners and the other grantees we support; what our “partners” have in common is that they have not only used Azure to do important environmental science work, but also maintain cloud-based platforms that help others do their environmental sustainability work in the cloud as well.