# Changelog

## Added

- **Hyperparameter tuning for EDS-NLP**: introduced a new script, `edsnlp.tune`, for hyperparameter tuning using Optuna. This feature lets users efficiently optimize model parameters with single-phase or two-phase tuning strategies, and includes support for parameter importance analysis, visualization, pruning, and automatic handling of GPU time budgets.
- Provided a detailed tutorial on hyperparameter tuning, covering usage scenarios and configuration options.
- `ScheduledOptimizer` (e.g., `@core: "optimizer"`) now supports importing optimizers using their qualified name (e.g., `optim: "torch.optim.Adam"`).
- `eds.ner_crf` now computes a confidence score on spans.
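As a rough illustration, a qualified-name optimizer import might look like the following config fragment. This is a hypothetical sketch: only `@core: "optimizer"` and `optim: "torch.optim.Adam"` are taken from the changelog entry above; the surrounding section name and any other fields are assumptions, and the actual schema is described in the EDS-NLP training documentation.

```yaml
# Hypothetical training-config excerpt (field names other than
# "@core" and "optim" are illustrative assumptions):
optimizer:
  "@core": "optimizer"
  # Instead of a predefined alias, the optimizer class can now be
  # referenced by its fully qualified import path:
  optim: "torch.optim.Adam"
```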
## Changed

- The loss of `eds.ner_crf` is now computed as the mean over the words instead of the sum. This change is compatible with multi-GPU training.
- Having multiple stats keys matching a batching pattern now warns instead of raising an error.
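To see why a per-word mean plays better with multi-GPU training than a sum, consider that data-parallel setups (e.g. PyTorch's `DistributedDataParallel`) average gradients across replicas. The toy sketch below is not EDS-NLP's implementation; it only illustrates the scaling argument with plain Python.

```python
# Illustrative only: with a summed loss, a replica that happens to hold
# more words contributes a proportionally larger loss (and gradient),
# while a per-word mean keeps every replica on the same scale.

def summed_loss(per_word_losses):
    """Loss as the sum over words (old behavior)."""
    return sum(per_word_losses)

def mean_loss(per_word_losses):
    """Loss as the mean over words (new behavior)."""
    return sum(per_word_losses) / len(per_word_losses)

# Two replicas with identical per-word loss but different batch sizes:
replica_a = [0.5] * 10    # 10 words
replica_b = [0.5] * 100   # 100 words

# Summing lets the larger batch dominate the averaged gradient signal:
print(summed_loss(replica_a), summed_loss(replica_b))  # 5.0 50.0

# Averaging makes both replicas contribute on the same scale:
print(mean_loss(replica_a), mean_loss(replica_b))      # 0.5 0.5
```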
## Fixed

- Support packaging with poetry 2.0
- Solve pickling issues with multiprocessing when pytorch is installed
- Allow deep attributes like `a.b.c` for `span_attributes` in Standoff and OMOP doc2dict converters
- Fixed various aspects of stream shuffling:
  - Ensure the Parquet reader shuffles the data when `shuffle=True`
  - Ensure we don't overwrite the RNG of the data reader when calling `stream.shuffle()` with no seed
  - Raise an error if the batch size in `stream.shuffle(batch_size=...)` is not compatible with the stream
- `eds.split` now keeps doc and span attributes in the sub-documents.
## Pull Requests

- fix: support packaging with poetry 2.0 by @percevalw in #362
- Solve pickling issues with multiprocessing when pytorch is installed by @percevalw in #367
- Feat: add hyperparameters tuning by @LucasDedieu in #361
- Fix issue 368: add `metric` parameter and write optimal `config.yml` at the end of tuning by @LucasDedieu in #369
- Fix issue 370: two-phase tuning now writes phase 1 frozen best values into phase 2 `results_summary.txt` by @LucasDedieu in #371
- fix: allow deep attributes in Standoff and OMOP doc2dict converters by @percevalw in #381
- fix: improve various aspects of stream shuffling by @percevalw in #380
- fix: eds.split now keeps doc and span attributes in the sub-documents by @percevalw in #363
- feat: allow importing optims using qualified names in ScheduledOptimizer by @percevalw in #383
- feat: compute eds.ner_crf loss as mean over words by @percevalw in #384
- Fix issue 372: resulting tuning config file now preserves comments by @LucasDedieu in #373
- Feat: add checkpoint management for tuning by @LucasDedieu in #385
- feat: add ner confidence score by @LucasDedieu in #387
- chore: bump version to 0.16.0 by @LucasDedieu in #393
## New Contributors

- @LucasDedieu made their first contribution in #361

**Full Changelog**: v0.15.0...v0.16.0