Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statistics #60

Merged
merged 42 commits into from
Jan 5, 2022
Merged

Statistics #60

merged 42 commits into from
Jan 5, 2022

Conversation

fhelen
Copy link
Contributor

@fhelen fhelen commented Dec 10, 2021

No description provided.

@coveralls
Copy link
Collaborator

coveralls commented Dec 10, 2021

Pull Request Test Coverage Report for Build 1654875142

  • 77 of 77 (100.0%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 99.658%

Totals Coverage Status
Change from base Build 1520238762: 0.01%
Covered Lines: 2621
Relevant Lines: 2630

💛 - Coveralls

Copy link
Contributor Author

@fhelen fhelen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@tbonfort tbonfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @fhelen
I don't find the meaning of force and approximate to be very clear. In practice I think we need to be able to expose this functionality:

  • getPrecomputedStatistics() => return true/false if present
  • recomputeStatistics(approxOK)
  • clear/set as is in your proposal

mixing force and approx in getStatistics is ambiguous (i.e. does force mean you need to recompute if the cached version is approximated, if the cached version is absent, or everytime?)

@thomascoquet
Copy link
Contributor

Hello Flo and Thomas,
taking the liberty of jumping in as I was about to PR the exact same feature from Kayrros@432aaff

On a functional scope, I argue it is important to never recompute stats using the whole raster unless explicitly requested by the user. This is why I think the wording approxOK is tricky.

For example:

bnd.Histogram(godal.Approximate())

will recompute the histogram using the whole raster:

godal/godal.cpp

Line 798 in 42a03dc

ret=GDALGetDefaultHistogramEx(bnd,min,max,buckets,values,1,nullptr,nullptr);

This may or may not be the intended behavior, but it caught me last week.

There are three ways to get stats:

  • cached in metadata
  • approximated
  • accurate using whole raster

To avoid pitfalls like the one I mentionned, I suggest the following API:

  • type statMode int ; const (cached statMode = iota; approximate; accurate)
  • type optStats struct { mode statMode }
  • bnd.Statistics(...StatsOption) (Statistics, bool) with the following options: godal.Cached(), godal.Approximate(), godal.Accurate()

By default, statMode is 0 so we only get cached stats.
To fallback if no cache:

stats, ok := bnd.Statistics()
if !ok {
    stats, _ = bnd.Statistics(godal.Approximate())
}

At least the API is explicit.

I suggest similar changes to the Histogram feature:

hist, ok := bnd.Histogram() // only approximate - don't think there is cached hist in GDAL
if !ok {
    hist, _ = bnd.Histogram(godal.Accurate())
}

We could even call bnd.Statistics(godal.Approximate()) if Min/Max are not provided in the first case?

Thomas

@tbonfort
Copy link
Member

On a functional scope, I argue it is important to never recompute stats using the whole raster unless explicitly requested by the user.

I agree, except I would be even more strict about this and remove the "using the whole raster" part, as depending on the dataset even an approximation might induce a full read.

bnd.Histogram(godal.Approximate())
_will_ recompute the histogram using the whole raster:

good to know, thanks for alerting. This needs to be fixed.

* `type statMode int` ; `const (cached statMode = iota; approximate; accurate)`

cached is orthogonal to to approximate|accurate and therefore shouldn't be put in a single parameter

@fhelen
Copy link
Contributor Author

fhelen commented Dec 15, 2021

Hello @tbonfort & @thomascoquet
Thanks both of you!
Here are my proposals:

  • Statistics setter
    User can set his own stats through:
    err := bnd.SetStatistics(min, max, mean, std)
    This function is a wrapper around GDALSetRasterStatistics() and populate metadatas.

  • Statistics computation
    User can ask for statsitics computation. Moreover user can ask for accurate or approximated stats. Note that, this function force computation even if statistics are presents in metadatas.

    • Accurate statistics
      stats, err := band.ComputeStatistics()

    • Approximated statistics
      stats, err = band.ComputeStatistics(godal.Approximate())

    This function is a wrapper around GDALComputeRasterStatistics() and populate metatdatas.

  • Statistics getter
    User can ask for pre-computed statistics if they are present. If not a boolean is return as false.
    flag, stats, err := bnd.GetStatistics()

    Since GDALGetRasterStatistics() is not a pure getter on statistics, GetStatistics is not wrapper around GDAL function.
    GODAL function is just a parser of metaDatas.
    This prevent from:
    + use of appoxOk & bForce flags;
    + compute pre-computed statistics;

    From user pre-computed stats, workflow could be:
    err := bnd.SetStatistics(min, max, mean, std) // do some stuff flag, stats, err = bnd.GetStatistics()

    From approximated statistics, workflow could be:
    stats, err := band.ComputeStatistics(godal.Approximate()) // do some stuff flag, stats, err := bnd.GetStatistics() //stats.Approximate == true

  • Statistics cleaner
    User can clear statistics through:
    err := ds.ClearStatistics()

    This function is a wrapper around GDALDatasetClearStatistics(). This function is new feature from GDAL 3.2.

@tbonfort tbonfort merged commit d7e9f7a into airbusgeo:main Jan 5, 2022
@fhelen fhelen deleted the statistics branch January 5, 2022 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants