Extensions & patterns for Time Series data #19

Open
cholmes opened this issue Oct 10, 2024 · 4 comments

@cholmes
Contributor

cholmes commented Oct 10, 2024

There are a number of potential extensions for time series data, for example:

  • Daily crop biomass summary (like from Planet)
  • Regular Evapotranspiration data from OpenET and others
  • Periodic NDVI, like from cloud-free Sentinel 2 images
  • Monthly summaries of stats from imagery
  • Seasonal / Yearly crop classification
  • etc.

Each may be worth its own extension, so we can break them out as we get closer.
But one 'meta' question is how we handle / model these in fiboa. Our default is to hang attributes directly off the geometry, i.e. in the same GeoParquet file. But many of these would result in a very large number of columns, and it's not clear you'd always need the geometry. So it could be good to get a sample dataset of a big time series and figure out how to use fiboa for it while making the geometry more of a 'reference'. Hopefully the flexibility of not differentiating between collection-level and feature-level attributes introduced in fiboa/specification#39 will help, but ideally we'd also have a way to validate Parquet files that only reference a geometry instead of including it directly.

@m-mohr
Contributor

m-mohr commented Oct 11, 2024

In "old" database days you'd normalize into two tables, e.g.

geo.parquet: id, geometry, area, perimeter
time.parquet: id, geo_id, datetime, value1, value2, ...

It wouldn't be in one file, but on the other hand this approach has proven to work well in the database world. So I'm wondering whether we should split the files. Is this something tooling can handle easily?
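
To make the two-table idea concrete, here is a minimal sketch of the join a tool would have to perform. It uses plain Python dictionaries as stand-ins for rows read from the two files; the row values, the `ndvi` column, and the helper name `join_on_geo_id` are hypothetical and not part of fiboa.

```python
from datetime import date

# Hypothetical rows standing in for geo.parquet (one row per field boundary).
geo_rows = [
    {"id": "f1", "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))", "area": 1.0},
    {"id": "f2", "geometry": "POLYGON ((2 0, 3 0, 3 1, 2 1, 2 0))", "area": 1.0},
]

# Hypothetical rows standing in for time.parquet (one row per observation).
time_rows = [
    {"id": "t1", "geo_id": "f1", "datetime": date(2024, 6, 1), "ndvi": 0.61},
    {"id": "t2", "geo_id": "f1", "datetime": date(2024, 6, 11), "ndvi": 0.68},
    {"id": "t3", "geo_id": "f2", "datetime": date(2024, 6, 1), "ndvi": 0.42},
]


def join_on_geo_id(time_rows, geo_rows):
    """Attach the referenced geometry to each observation (inner join)."""
    geo_by_id = {g["id"]: g for g in geo_rows}
    joined = []
    for t in time_rows:
        g = geo_by_id.get(t["geo_id"])
        if g is not None:  # observations with an unknown geo_id are dropped
            joined.append({**t, "geometry": g["geometry"], "area": g["area"]})
    return joined


joined = join_on_geo_id(time_rows, geo_rows)
```

The geometry is stored once per field and only repeated when the join is actually needed, which is the main benefit over hanging every time step off the geometry as extra columns.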

This doesn't cater for geometry changes over time though, unless you create two independent entries in geo.parquet and/or add another independent identifier that is stable over time.
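
One way the "stable identifier" variant could look, as a rough sketch: each boundary version gets its own row, and a separate identifier ties the versions together. The column names `field_id` and `valid_from` and the helper `geometry_version_at` are assumptions for illustration only, not something fiboa defines.

```python
from datetime import date

# Hypothetical geo table: "field_id" is stable over time, "id" identifies one
# version of the boundary, "valid_from" is when that version took effect.
geo_versions = [
    {"id": "g1", "field_id": "field-a", "valid_from": date(2023, 1, 1),
     "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"},
    {"id": "g2", "field_id": "field-a", "valid_from": date(2024, 1, 1),
     "geometry": "POLYGON ((0 0, 2 0, 2 1, 0 1, 0 0))"},
]


def geometry_version_at(field_id, when, versions):
    """Return the latest version of a field valid on or before `when`, else None."""
    candidates = [
        v for v in versions
        if v["field_id"] == field_id and v["valid_from"] <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v["valid_from"])


current = geometry_version_at("field-a", date(2024, 6, 1), geo_versions)
```

With this layout, a time series row could reference either the stable `field_id` (and resolve the boundary by date) or a specific version `id`, depending on what the data provider knows.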

@cholmes
Contributor Author

cholmes commented Nov 5, 2024

Yeah, I definitely lean towards two files. I don't think there's tooling for pure parquet that handles this well, but I think we can just make some tooling that helps. I think the main thing is to just work out how to do the references, and how you can validate a fiboa file that has a reference instead of a geometry. And potentially figure out some of the corner cases like geometry changes over time.
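
As a starting point for the validation question, a reference check could be as simple as the sketch below. It assumes the reference lives in a `geo_id` column (following the example schema above); the function name `find_dangling_references` is hypothetical and this is not how the fiboa validator currently works.

```python
def find_dangling_references(time_rows, geo_ids, ref_column="geo_id"):
    """Return (row_index, value) pairs whose reference is missing or unknown."""
    known = set(geo_ids)
    problems = []
    for index, row in enumerate(time_rows):
        value = row.get(ref_column)
        if value is None or value not in known:
            problems.append((index, value))
    return problems


# Hypothetical time series rows: one valid, one unknown id, one missing reference.
rows = [
    {"geo_id": "f1", "ndvi": 0.61},
    {"geo_id": "f9", "ndvi": 0.40},
    {"ndvi": 0.55},
]
problems = find_dangling_references(rows, ["f1", "f2"])
```

A real validator would additionally need to know where the referenced geometry file lives, which is part of what still has to be worked out.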

@m-mohr m-mohr moved this to Backlog in fiboa Nov 19, 2024
@m-mohr m-mohr added this to fiboa Nov 19, 2024
@ivorbosloper

This relational solution comes to mind for different aspects. The GPKG file format solves the distribution and join-tooling problem, because it's just an SQLite database. However, when I encounter GPKG files with multiple tables, my tooling (QGIS, GDAL) does not seem to 'automatically' understand how to join the tables... But maybe they've designed a proper convention to support this.
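
To illustrate the "it's just SQLite" point, here is a small sketch using Python's built-in `sqlite3` module. It is a plain in-memory SQLite database, not a valid GeoPackage: a real GPKG file also needs metadata tables such as `gpkg_contents` and stores geometries as binary blobs. The table and column names are made up for the example.

```python
import sqlite3

# Plain SQLite stand-in for a GeoPackage with two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fields (id TEXT PRIMARY KEY, geometry TEXT, area REAL);
    CREATE TABLE observations (
        id TEXT PRIMARY KEY,
        geo_id TEXT NOT NULL REFERENCES fields(id),
        datetime TEXT,
        ndvi REAL
    );
    INSERT INTO fields VALUES ('f1', 'POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))', 1.0);
    INSERT INTO observations VALUES ('t1', 'f1', '2024-06-01', 0.61);
    INSERT INTO observations VALUES ('t2', 'f1', '2024-06-11', 0.68);
""")

# The join itself is ordinary SQL; the open question is whether GIS tools
# discover the relationship on their own.
rows = conn.execute("""
    SELECT o.datetime, o.ndvi, f.geometry
    FROM observations AS o
    JOIN fields AS f ON f.id = o.geo_id
    ORDER BY o.datetime
""").fetchall()
conn.close()
```

So the join is trivial once you write the SQL yourself; what's missing is a convention that lets tools find the relationship without being told.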

@m-mohr
Contributor

m-mohr commented Mar 10, 2025

Yeah, we should also investigate whether something like this exists for Parquet in general. This is not geo-specific and I can't really imagine that we are the first ones stumbling across this.
