# Changelog

## Added

- **Hyperparameter tuning for EDS-NLP**: introduced a new script, `edsnlp.tune`, for hyperparameter tuning using Optuna. This feature lets users efficiently optimize model parameters with single-phase or two-phase tuning strategies, and includes support for parameter importance analysis, visualization, pruning, and automatic handling of GPU time budgets.
- Provided a detailed tutorial on hyperparameter tuning, covering usage scenarios and configuration options.
- `ScheduledOptimizer` (e.g., `@core: "optimizer"`) now supports importing optimizers using their qualified name (e.g., `optim: "torch.optim.Adam"`).
- `eds.ner_crf` now computes a confidence score on spans.
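As a rough illustration, a qualified-name optimizer import might look like the following config fragment. This is a hypothetical sketch: only `@core: "optimizer"` and `optim: "torch.optim.Adam"` are taken from the changelog entry above; the surrounding section name and any other fields are assumptions, and the actual schema is described in the EDS-NLP training documentation.

```yaml
# Hypothetical training-config excerpt (field names other than
# "@core" and "optim" are illustrative assumptions):
optimizer:
  "@core": "optimizer"
  # Instead of a predefined alias, the optimizer class can now be
  # referenced by its fully qualified import path:
  optim: "torch.optim.Adam"
```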
## Changed

- The loss of `eds.ner_crf` is now computed as the mean over the words instead of the sum. This change is compatible with multi-GPU training.
- Having multiple stats keys matching a batching pattern now warns instead of raising an error.
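To see why a per-word mean plays better with multi-GPU training than a sum, consider that data-parallel setups (e.g. PyTorch's `DistributedDataParallel`) average gradients across replicas. The toy sketch below is not EDS-NLP's implementation; it only illustrates the scaling argument with plain Python.

```python
# Illustrative only: with a summed loss, a replica that happens to hold
# more words contributes a proportionally larger loss (and gradient),
# while a per-word mean keeps every replica on the same scale.

def summed_loss(per_word_losses):
    """Loss as the sum over words (old behavior)."""
    return sum(per_word_losses)

def mean_loss(per_word_losses):
    """Loss as the mean over words (new behavior)."""
    return sum(per_word_losses) / len(per_word_losses)

# Two replicas with identical per-word loss but different batch sizes:
replica_a = [0.5] * 10    # 10 words
replica_b = [0.5] * 100   # 100 words

# Summing lets the larger batch dominate the averaged gradient signal:
print(summed_loss(replica_a), summed_loss(replica_b))  # 5.0 50.0

# Averaging makes both replicas contribute on the same scale:
print(mean_loss(replica_a), mean_loss(replica_b))      # 0.5 0.5
```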
## Fixed

- Support packaging with poetry 2.0
- Solve pickling issues with multiprocessing when pytorch is installed
- Allow deep attributes like `a.b.c` for `span_attributes` in Standoff and OMOP doc2dict converters
- Fixed various aspects of stream shuffling:
  - Ensure the Parquet reader shuffles the data when `shuffle=True`
  - Ensure we don't overwrite the RNG of the data reader when calling `stream.shuffle()` with no seed
  - Raise an error if the batch size in `stream.shuffle(batch_size=...)` is not compatible with the stream
- `eds.split` now keeps doc and span attributes in the sub-documents.
## Pull Requests

- fix: support packaging with poetry 2.0 by @percevalw in #362
- Solve pickling issues with multiprocessing when pytorch is installed by @percevalw in #367
- Feat: add hyperparameters tuning by @LucasDedieu in #361
- Fix issue 368: add `metric` parameter and write optimal `config.yml` at the end of tuning by @LucasDedieu in #369
- Fix issue 370: two-phase tuning now writes phase 1 frozen best values into phase 2 `results_summary.txt` by @LucasDedieu in #371
- fix: allow deep attributes in Standoff and OMOP doc2dict converters by @percevalw in #381
- fix: improve various aspects of stream shuffling by @percevalw in #380
- fix: eds.split now keeps doc and span attributes in the sub-documents by @percevalw in #363
- feat: allow importing optims using qualified names in ScheduledOptimizer by @percevalw in #383
- feat: compute eds.ner_crf loss as mean over words by @percevalw in #384
- Fix issue 372: resulting tuning config file now preserves comments by @LucasDedieu in #373
- Feat: add checkpoint management for tuning by @LucasDedieu in #385
- feat: add ner confidence score by @LucasDedieu in #387
- chore: bump version to 0.16.0 by @LucasDedieu in #393
## New Contributors

- @LucasDedieu made their first contribution in #361

**Full Changelog**: v0.15.0...v0.16.0