As the concept of "cloud-native geospatial" gains traction, it's increasingly important to define what it means. In fact, I believe that maintaining a useful definition of the term is one of our primary functions as the Cloud-Native Geospatial Foundation.
It turns out that it's hard to define!
We surveyed our community earlier this year to get their take on it. We got 56 responses. Half of respondents came from public sector organizations (government, nonprofit, and academic) and half came from commercial organizations. 80% of respondents came from North America, Europe, Australia, or New Zealand, 13% from Asia, 5% from South America, and 2% from Africa. Fewer than 10% of respondents consider themselves "beginner" level users of cloud-native geospatial solutions.
These were the most commonly used adjectives in the responses to the question "What does the term “cloud-native geospatial” mean to you?":
- scalable
- optimized
- efficient
- large
- remote
- data-adjacent
- seamless
- fast
- standardized
- simple
- parallelized
- consumable
- on-demand
- preprocessed
- accessible
- interoperable
- easy
- quick
Just listing the adjectives is interesting because it highlights the benefits that people are seeking, but it doesn't do a good job of describing the features of cloud-native formats. The full responses to the question are a bit more illuminating and you can see them in this gist.
I ended up boiling everything down to this very simple definition:
Cloud-native formats allow people to build applications on top of data using simple HTTP APIs.
Implicit in this definition is that cloud-native formats allow people to build good or reasonably performant applications.
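To make "simple HTTP APIs" concrete, here is a minimal sketch of the one operation most of these formats lean on: asking a plain HTTP server for a slice of a file with a `Range` header. The function names and the example URL are hypothetical and not taken from any particular library; only Python's standard `urllib` is used.

```python
import urllib.request


def range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    last = offset + length - 1
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last}"}, method="GET"
    )


def read_bytes(url: str, offset: int, length: int) -> bytes:
    """Fetch part of a remote object (performs a real network call)."""
    with urllib.request.urlopen(range_request(url, offset, length)) as resp:
        # 206 Partial Content means the server honoured the range.
        # This sketch treats anything else as an error rather than
        # silently downloading the whole object.
        if resp.status != 206:
            raise RuntimeError(f"range not honoured (HTTP {resp.status})")
        return resp.read()


# Hypothetical URL: request the first 16 KiB, where a format might keep its header.
req = range_request("https://example.com/data/scene.tif", 0, 16384)
```

Nothing here is geospatial or format-specific, which is rather the point: the client needs nothing beyond an ordinary HTTP request.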
Last night, at our first Seattle Zarr Meetup, @jiayuasu gave a presentation in which he provided a more comprehensive set of characteristics:
- Efficient storage
  - High compression ratios
- Scalability
  - Multiple data chunks for parallel processing
  - Each computer/process can take a chunk of its choice (random access)
- Integrity constraints and schema evolution
  - All new data must pass an integrity check (i.e., type check)
  - Adding/removing a column should not rewrite the entire data
  - **Natively support geometry / geography / raster type data**
- Metadata integration
  - **Geo-statistics: bounding box, CRS, …**
  - Store metadata alongside the data for advanced operations such as filter pushdown
- Open protocol
  - Easy data exchange / sharing
  - Anyone can implement their own reader / writer to r/w data in this format
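As a rough illustration of how the chunking and metadata points fit together, the sketch below keeps a bounding box next to each chunk's byte range and uses it to decide which chunks a spatial query needs to read at all (a simple form of filter pushdown). The `ChunkMeta` layout, the function names, and the sample values are all assumptions made for illustration; they do not describe any specific format.

```python
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical per-chunk metadata entry: where the chunk lives and what it covers."""
    offset: int  # byte offset of the chunk within the file
    length: int  # chunk size in bytes
    bbox: BBox   # spatial extent of the data in the chunk


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def chunks_for_query(index: list[ChunkMeta], query: BBox) -> list[ChunkMeta]:
    """Keep only the chunks whose bounding box overlaps the query box."""
    return [chunk for chunk in index if intersects(chunk.bbox, query)]


# Made-up index: four chunks, one per quadrant of a lon/lat grid.
index = [
    ChunkMeta(0, 1000, (-180.0, -90.0, 0.0, 0.0)),
    ChunkMeta(1000, 1500, (0.0, -90.0, 180.0, 0.0)),
    ChunkMeta(2500, 800, (-180.0, 0.0, 0.0, 90.0)),
    ChunkMeta(3300, 1200, (0.0, 0.0, 180.0, 90.0)),
]

seattle: BBox = (-122.5, 47.4, -122.2, 47.8)
selected = chunks_for_query(index, seattle)
print([(c.offset, c.length) for c in selected])  # [(2500, 800)]
```

With this made-up index, only one of the four chunks has to be fetched, and each selected chunk could be read independently by a different process.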
Note that the bolded points are what would make a merely cloud-optimized format a cloud-optimized geo format.
I don't disagree with this list. It's a good set of best practices for scalable formats that take advantage of object storage, but those practices don't apply to everything that we'd consider cloud-native. STAC certainly doesn't have all of these characteristics.
I think Jia's final point and its sub-points drive the point home:
- Open protocol
  - Easy data exchange / sharing
  - Anyone can implement their own reader / writer to r/w data in this format
This takes me back to the simple definition: "Cloud-native formats allow people to build applications on top of data using simple HTTP APIs."
HTTP might be too prescriptive here. Maybe "open protocol" is enough, but I think HTTP is enough for us to bet on at this point. Or maybe we say using generic RESTful APIs?
One other thing to add here, which is inspired by "Anyone can implement their own reader / writer to r/w data in this format": the Cloud-Native Geospatial Foundation will only be able to support projects that are developed using open source principles. Specs must be available under an open license and must be open to contributions from anyone. We don't pick winners, but we do aim to keep track of implementations of formats, which is one way we can measure adoption and the practical benefit of different standards.
Having said far too much here, I'm open to suggestions on how we should define "cloud-native geospatial" data formats. Discuss!