v0.10.0

percevalw released this 04 Dec 17:10


c65030a

Changelog

Added

New unified edsnlp.data API (json, brat, spark, pandas) and LazyCollection object to efficiently read / write data from / to different formats and sources.
New unified processing API to select the execution backend via data.set_processing(...)
The training scripts can now use data from multiple concatenated adapters
Support quantized transformers (compatible with multiprocessing as well!)
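The lazy-collection idea behind the new edsnlp.data and processing APIs can be illustrated with a small, self-contained sketch. Everything below is hypothetical: LazyCollectionSketch, its map/set_processing methods and the "simple"/"thread" backend names are illustrative assumptions, not edsnlp's actual classes or backends. It only shows the principle: operations are recorded, and nothing runs until the collection is iterated with the selected backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, List, Optional


class LazyCollectionSketch:
    """Hypothetical lazy, chainable collection (NOT edsnlp's LazyCollection)."""

    BACKENDS = ("simple", "thread")  # assumed backend names, for illustration only

    def __init__(
        self,
        source: Iterable[Any],
        ops: Optional[List[Callable[[Any], Any]]] = None,
        backend: str = "simple",
        num_workers: int = 1,
    ):
        self._source = source
        self._ops = list(ops or [])
        self._backend = backend
        self._num_workers = num_workers

    def map(self, fn: Callable[[Any], Any]) -> "LazyCollectionSketch":
        # Record the operation and return a new collection; no work is done here.
        return LazyCollectionSketch(
            self._source, self._ops + [fn], self._backend, self._num_workers
        )

    def set_processing(self, backend: str = "simple", num_workers: int = 1) -> "LazyCollectionSketch":
        # Only records the execution backend; it is used when iterating.
        if backend not in self.BACKENDS:
            raise ValueError(f"Unknown backend: {backend!r}")
        return LazyCollectionSketch(self._source, self._ops, backend, num_workers)

    def _apply(self, item: Any) -> Any:
        # Run every recorded operation, in order, on a single item.
        for fn in self._ops:
            item = fn(item)
        return item

    def __iter__(self) -> Iterator[Any]:
        if self._backend == "simple":
            # Sequential, streaming execution in the current thread.
            for item in self._source:
                yield self._apply(item)
        else:
            # Thread pool: executor.map submits all items up front and
            # yields results in input order.
            with ThreadPoolExecutor(max_workers=self._num_workers) as executor:
                yield from executor.map(self._apply, self._source)


# Usage: nothing is computed until list() iterates the collection.
docs = ["first note", "second note"]
collection = (
    LazyCollectionSketch(docs)
    .map(str.upper)
    .set_processing(backend="thread", num_workers=2)
)
print(list(collection))  # ['FIRST NOTE', 'SECOND NOTE']
```

Returning a new object from map and set_processing keeps each step free of side effects, so a collection can be re-configured without mutating the original. A real multiprocessing backend would additionally require the recorded operations to be picklable.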

Changed

edsnlp.pipelines has been renamed to edsnlp.pipes, but the old name is still available for backward compatibility
Pipes (in edsnlp/pipes) are now lazily loaded, which should improve the loading time of the library.
to_disk methods can now return a config to override the initial config of the pipeline (e.g., to load a transformer directly from the path storing its fine-tuned weights)
The eds.tokenizer tokenizer has been added to entry points, making it accessible from outside the library
Deprecated the old connectors (e.g. BratDataConnector) in favor of the new edsnlp.data API
Deprecated the old pipe wrapper in favor of the new processing API
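Lazy loading of pipes and a backward-compatible alias for a renamed package (as with edsnlp.pipelines and edsnlp.pipes) can both be approximated with standard Python module machinery. The sketch below assumes the general technique (a PEP 562 module-level __getattr__ plus a sys.modules alias) and is not edsnlp's actual implementation; the package names demo_pipes/demo_pipelines and the attribute names are hypothetical, and stdlib modules stand in for heavy pipe modules.

```python
import importlib
import sys
import types
from typing import Dict


def make_lazy_package(name: str, lazy_attrs: Dict[str, str]) -> types.ModuleType:
    """Create a module whose attributes are imported on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        # Only called when normal attribute lookup fails, i.e. on first access.
        if attr in lazy_attrs:
            value = importlib.import_module(lazy_attrs[attr])
            setattr(module, attr, value)  # cache: later lookups skip __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    sys.modules[name] = module
    return module


# Hypothetical package names; stdlib modules stand in for heavy pipe modules.
pipes = make_lazy_package("demo_pipes", {"tokenizer": "json", "matcher": "csv"})

# Backward-compatible alias: importing the old name yields the same module object.
sys.modules["demo_pipelines"] = pipes

print("tokenizer" in vars(pipes))  # False: nothing resolved yet
print(pipes.tokenizer.__name__)    # json (imported on first access)
print("tokenizer" in vars(pipes))  # True: now cached on the module
print(importlib.import_module("demo_pipelines") is pipes)  # True
```

Caching the imported value with setattr means the hook runs once per attribute; subsequent lookups are ordinary attribute accesses, so the cost of lazy loading is paid only on first use.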

Fixed

Support for pydantic v2
Support for Python 3.11 (not CI-tested yet)

Pull Requests

Fix matcher assigns by @percevalw in #222
Refactor to use Pytorch for training models by @percevalw in #202
Relieve dependency constraints by @percevalw in #227

Full Changelog: v0.9.1...v0.10.0

Contributors

percevalw
