This repository was archived by the owner on Dec 16, 2022. It is now read-only.

add gradient checkpointing for transformer token embedders #4544

Merged

Conversation

ArneBinder
Contributor

@ArneBinder ArneBinder commented Aug 7, 2020

Gradient checkpointing was released with huggingface/transformers v3.0.0. It reduces activation memory by roughly 5x, which makes it possible to process longer sequences on smaller GPUs. This PR allows fine-tuning transformer token embedders with high(er) max_length values on a more regular GPU.
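For context, a minimal sketch of the mechanism the PR relies on, using PyTorch's `torch.utils.checkpoint.checkpoint` (the primitive that transformers' gradient checkpointing builds on). The layer sizes and module here are illustrative, not from the PR: the checkpointed segment does not store its intermediate activations in the forward pass; they are recomputed during backward, trading compute for memory.

```python
import torch
from torch.utils.checkpoint import checkpoint

# An illustrative block standing in for one transformer layer.
layer = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
x = torch.randn(4, 16, requires_grad=True)

# Forward pass without caching the block's intermediate activations.
out = checkpoint(layer, x)

# Backward recomputes the forward pass of `layer` on the fly,
# then propagates gradients as usual.
out.sum().backward()
print(x.grad.shape)
```

With one checkpointed segment per transformer layer, peak memory scales with a single layer's activations instead of the whole stack, at the cost of one extra forward pass per layer during backward.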

Sorry for not providing any tests; I have no clue how to create a simple test case and no resources at the moment to figure that out. But I tried it successfully with Longformer (max_length: 4096) and SciBERT (max_length: 512) on a single RTX 2080 Ti.
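As a hedged usage sketch (the exact parameter name and model names here are assumptions, not verified against the merged diff), the feature would be enabled from an AllenNLP experiment config on the token embedder:

```jsonnet
// Hypothetical AllenNLP config fragment: a pretrained transformer
// token embedder with gradient checkpointing turned on.
{
  "model": {
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "pretrained_transformer",
          "model_name": "allenai/longformer-base-4096",
          "max_length": 4096,
          "gradient_checkpointing": true  // assumed flag added by this PR
        }
      }
    }
  }
}
```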

@AkshitaB AkshitaB requested a review from dirkgr August 7, 2020 16:20
@dirkgr
Member

dirkgr commented Aug 10, 2020

Awesome. I just noticed that the support for checkpointing in Huggingface came originally from AI2's own @ibeltagy. Thanks for driving that!

@dirkgr dirkgr merged commit b32608e into allenai:master Aug 10, 2020
2 participants