Update history, refactor sdba processing functions, add unit handling in sdba #801

aulemahal · 2021-08-19T22:41:05Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes #xyz
Tests for the changes have been added (for bug fixes / features)
Documentation has been added / updated (for bug fixes / features)
HISTORY.rst has been updated (with summary of main changes)
bumpversion (major / minor / patch) has been called on this branch
Tags have been pushed (git push --tags)
The relevant author information has been added to .zenodo.json

What kind of change does this PR introduce?

Moves unit handling sutff from [sdba] Optimizations and fixes #791 to here. Inputs of train and adjust methods are converted to common units. ref is used as the reference. Add unit handling in processing : adapt_freq, jitter_under/over_thresh, etc. Add unit handling to the Detrend objects.
Splits user-facing and metadata-handling function from their computational counterparts when map_blocks is used. This applies to processing.py. This also changed the signature of the concerned functions, so that they get an explicit one.
Generalize the setting of a xclim_history attribute through a new @update_xclim_history decorator. Decorated functions will have a correctly formatted xclim_history attribute added to their first output.

Does this PR introduce a breaking change?

Yes. Signature of adapt_freq, jitter_under_thresh and others has changed. All variables are given as DataArrays (no more dataset as input) and threshold are given as quantities, so strings, not numbers.

Other information:

We have a naming problem with the xclim_history attribute. Currently, the update_history function will take the inputs and merge their history attributes. Then we take it and put it in xclim_history. If a variable has been modified before by xclim, the xclim_history attribute is overwritten, not appended. Should we go back to history to make it simpler? @Zeitsperre @huard This applies to the indicators too!

review-notebook-app · 2021-08-19T22:41:08Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

huard · 2021-08-20T12:24:24Z

Remind me why we decided to have xclim_history in the first place.

huard

How confident are you that our test coverage would have picked potential errors introduced in this PR?
From experience, it's easy to break unit handling, and tests were not initially covered to check this. You added unit related checks, but how tight are we overall ?

docs/notebooks/sdba-advanced.ipynb

docs/notebooks/sdba.ipynb

xclim/core/formatting.py

xclim/sdba/__init__.py

xclim/sdba/_processing.py

Co-authored-by: David Huard <[email protected]>

aulemahal · 2021-08-20T13:57:26Z

We decided to use xclim_history because CF conventions say that history is a dataset attribute, but we are always dealing with DataArrays.

As for the tests, I did add more, but you are right that it's always a tricky question. Failures and bugs in the attributes will not always trigger test failures... I plan to add a bit more.

aulemahal · 2021-08-20T20:21:40Z

@Zeitsperre I'd like to hear your opinion on a change from xclim_history to history!

I think this PR as revealed a faille in our logic of being CF-orthodox and not using history : If xclim performs successive actions on a variable, only the latest is recorded in the xclim_history attribute. Moreover, it might append the input's history, as if that latest action was the only one.
I would suggest going back to history as it is the one that makes the most sense and is understood by the largest crowd. And it would solve the bug. It would be only slightly non-CF, as the history attribute is not flagged as a variable attribute by the conventions.

Zeitsperre · 2021-08-23T14:37:05Z

@Zeitsperre I'd like to hear your opinion on a change from xclim_history to history!

I think this PR as revealed a faille in our logic of being CF-orthodox and not using history : If xclim performs successive actions on a variable, only the latest is recorded in the xclim_history attribute. Moreover, it might append the input's history, as if that latest action was the only one.

This is true. We should be at the very least appending some general summary of actions taken to the history attribute. If we want to revisit this, my feeling is that the xclim_history should be used for debugging purposes, IE: add the call signatures used, xclim version, whether the data was Indicator-compliant. This should have some internal provenance-like information that the general user doesn't need (and doesn't know to look for).

I would suggest going back to history as it is the one that makes the most sense and is understood by the largest crowd. And it would solve the bug. It would be only slightly non-CF, as the history attribute is not flagged as a variable attribute by the conventions.

From my understanding, because it isn't flagged as a variable attribute means that we are free to do it. Try adding a history and xclim_history to a variable in a NetCDF and run it through a CF-Checker. there shouldn't be any errors.

Zeitsperre · 2021-08-23T17:04:28Z

Very small point, but the Notes docstring for NpdfTransform could use a small fix while you're fixing things up: https://xclim.readthedocs.io/en/stable/sdba_api.html#xclim.sdba.adjustment.NpdfTransform

aulemahal · 2021-08-23T18:15:46Z

So I changed "xclim_history" to "history" throughout xclim!
@tlogan2000 (so you're in the loop too).

This is not super-breaking, but kinda.
The output of an indicator computation before:

xclim_history : [2021-07-07 18:46:28] growing_degree_days: GROWING_DEGREE_DAYS(tas=<array>, thresh='10.0 degC', freq='YS') - xclim version: 0.28.0.

now:

history :  [2021-08-23 14:05:45] growing_degree_days: GROWING_DEGREE_DAYS(tas=tas, thresh='10.0 degC', freq='YS') - xclim version: 0.28.2-beta.

If the indicator was from a virtual module, the module name would be appended in the call string (ex: icclim.CD(...)).

Applies also to ensembles percentiles and stats, percentile_doy, and a few others.

aulemahal · 2021-08-23T18:24:18Z

@Zeitsperre
The CF-checker returns the following on the file produced by usage.ipynb:

CHECKING NetCDF FILE: /tmp/27024.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 77 (2021-01-19T13:38:50Z)
Using Area Type Table Version 10 (23 June 2020)
Using Standardized Region Name Table Version 4 (18 December 2018)

WARN: (2.6.1): No 'Conventions' attribute present

------------------
Checking variable: time
------------------
WARN: (3): No standard_name or long_name attribute specified

------------------
Checking variable: lat
------------------

------------------
Checking variable: lon
------------------

------------------
Checking variable: growing_degree_days
------------------
INFO: attribute history is being used in a non-standard way
WARN: (7.3): Coordinate variable time should have bounds or climatology attribute
WARN: (7.3): Coordinate variable time should have bounds or climatology attribute

ERRORS detected: 0
WARNINGS given: 4
INFORMATION messages: 1

Zeitsperre · 2021-08-23T18:33:24Z

That's exactly what we should expect. LGTM.

aulemahal · 2021-08-23T19:55:31Z

I'm turning crazy soon. The sdba notebook compiles with a failure on the first try, but passes on the second try...

Zeitsperre · 2021-08-23T20:02:46Z

I'm turning crazy soon. The sdba notebook compiles with a failure on the first try, but passes on the second try...

This actually sounds insane, by Einstein's definition anyway. I'll give it a try as well.

Zeitsperre · 2021-08-23T20:10:51Z

Works for me on the first try. It may have to do with the compute() call. Jupyter has a habit of running through to the next cell before asynchronous processes are finished. Could it be that?

aulemahal · 2021-08-23T21:44:23Z

🤯 So I copied only this example in a new script and ran it 7 times. At the start, I deleted the xclim testing data and the script's output. Between trial 3 and 4, I modified the code to have some more info in the traceback (the algo was untouched).

Fail
Fail
Pass
Fail
Pass
Pass
Fail

Funny, eh?

EDIT: The compute call was the last operation of the script, in order to test @Zeitsperre's hypothesis.

Zeitsperre · 2021-08-23T21:49:48Z

What is going on?! This is really strange behaviour. Any information from the trace back (is it different from the doctests build)?

aulemahal · 2021-08-23T22:00:37Z

I could get the following info:

It always fails at iteration 0 of npdf_transform.
It doesn't fail for all chunks. There are three chunks in the data, so npdf_transform is executed thrice.
It always fails at line scensp = ADJ.adjust(simp, **kwargs["adj_kws"]) and shows that simp has no attributes at all, while histp and refp had some. And even though sim has some and xr.set_options(keep_attrs=True) was used.

Thus, it seems that, at random moments, the last matrix multiplication of npdf_transform doesn't respect the context under which it is executed and attrs are not copied.

EDIT: It may be obvious, but writing it to be sure : this only happens when dask is used,

aulemahal · 2021-08-23T22:09:06Z

Added an explicit setting of units to all ADJ inputs. And a comment to note that it is weird.
Before I had a 6/10 failure rate. Now I have a 0/3 failure rate. THUS, it works.

aulemahal · 2021-08-24T15:02:15Z

Now, that's more interesting, a real error! My bad.

aulemahal added 4 commits August 19, 2021 16:19

Split decorated func from user-facing

a658f64

cherry pick processing unit handling

c652999

cherry pick adj unit handling

32fffb1

Fix all unit handling - add history updating stuff

1dbb130

aulemahal requested a review from huard August 19, 2021 22:41

aulemahal added this to the v0.29 milestone Aug 19, 2021

aulemahal changed the title ~~[sdba] Refactor processing function + add unit handling~~ [sdba] Refactor processing functions + add unit handling Aug 19, 2021

huard approved these changes Aug 20, 2021

View reviewed changes

aulemahal and others added 3 commits August 20, 2021 09:52

Apply suggestions from code review

a9fd0ff

Co-authored-by: David Huard <[email protected]>

Update docs/notebooks/sdba-advanced.ipynb

2ec2683

Co-authored-by: David Huard <[email protected]>

upd sdba-advanced

8d78fd8

aulemahal added 4 commits August 23, 2021 13:24

rename xclim_history to history

322da69

fix tests and docs

b20e874

gMerge branch 'master' into sdba-refac-processing

f100ba5

updhist

680d3f6

aulemahal changed the title ~~[sdba] Refactor processing functions + add unit handling~~ Update history, refactor sdba processing functions Aug 23, 2021

aulemahal added 2 commits August 23, 2021 14:27

fix kernels

0aebb1a

fix kernels

0da2939

aulemahal changed the title ~~Update history, refactor sdba processing functions~~ Update history, refactor sdba processing functions, add unit handling in sdba Aug 23, 2021

aulemahal added 2 commits August 23, 2021 14:50

fix NpdfTransform - upd hist

dc3da05

fix DQM err - magic for the other

25eb4b2

Merge branch 'master' into sdba-refac-processing

5c36ef9

weird workaround

e9bd560

remove 2 debug cells

50b4078

aulemahal added 5 commits August 24, 2021 11:17

Fix gen_call_string

e7692e3

bumped the version

903c3a3

Merge branch 'master' into sdba-refac-processing

5201c38

Merge branch 'master' into sdba-refac-processing

1efac17

Merge branch 'master' into sdba-refac-processing

8c5a0ee

aulemahal merged commit d888c0f into master Aug 24, 2021

aulemahal deleted the sdba-refac-processing branch August 24, 2021 18:27

This was referenced Aug 27, 2021

[sdba] bug with LOCI map_blocks for add_dims in Grouper #818

Closed

Fix 818 #819

Merged

aulemahal mentioned this pull request Sep 7, 2021

Random missing attributes in map_blocks. #831

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update history, refactor sdba processing functions, add unit handling in sdba #801

Update history, refactor sdba processing functions, add unit handling in sdba #801

aulemahal commented Aug 19, 2021 •

edited

Loading

review-notebook-app bot commented Aug 19, 2021

huard commented Aug 20, 2021

huard left a comment

aulemahal commented Aug 20, 2021

aulemahal commented Aug 20, 2021

Zeitsperre commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021

aulemahal commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021 •

edited

Loading

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021 •

edited

Loading

aulemahal commented Aug 23, 2021

aulemahal commented Aug 24, 2021

Update history, refactor sdba processing functions, add unit handling in sdba #801

Update history, refactor sdba processing functions, add unit handling in sdba #801

Conversation

aulemahal commented Aug 19, 2021 • edited Loading

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

Other information:

review-notebook-app bot commented Aug 19, 2021

huard commented Aug 20, 2021

huard left a comment

Choose a reason for hiding this comment

aulemahal commented Aug 20, 2021

aulemahal commented Aug 20, 2021

Zeitsperre commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021

aulemahal commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021 • edited Loading

Zeitsperre commented Aug 23, 2021

aulemahal commented Aug 23, 2021 • edited Loading

aulemahal commented Aug 23, 2021

aulemahal commented Aug 24, 2021

aulemahal commented Aug 19, 2021 •

edited

Loading

aulemahal commented Aug 23, 2021 •

edited

Loading

aulemahal commented Aug 23, 2021 •

edited

Loading