Add a GPT-2 training example #19
Open
Labels
enhancement (New feature or request)
Description
We would like to use these issues to gauge user interest.
It is possible to use the GPT-2 implementation for further language model training, but there is currently no example demonstrating this in the repo or elsewhere.
Making this possible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed during training. There are a number of options:
- Add support for a smaller GPT-2 model.
- Only train a subset of the GPT-2 parameters.
- Use gradient accumulation.
- Use gradient checkpointing.
- Use reduced-precision gradients.
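Of these, gradient accumulation is probably the simplest to illustrate. The idea is to sum gradients over several small micro-batches and apply one optimizer step per group, so the effective batch size grows without more than one micro-batch ever being resident in memory. The sketch below is a hypothetical, framework-free illustration (a scalar linear model with hand-written gradients, not the repo's GPT-2 code or any particular library API):

```python
# Hypothetical sketch of gradient accumulation (not from this repo):
# fit y = w * x with squared error, summing gradients over accum_steps
# micro-batches before each weight update. The effective batch size is
# accum_steps * micro_batch_size, while only one micro-batch's
# activations would need to be held in memory at a time.

def grad(w, x, y):
    # d/dw of (w*x - y)^2
    return 2.0 * (w * x - y) * x

def train(data, accum_steps=4, lr=0.01, epochs=50):
    w = 0.0
    for _ in range(epochs):
        acc = 0.0
        for i, (x, y) in enumerate(data, start=1):
            acc += grad(w, x, y)             # accumulate instead of stepping
            if i % accum_steps == 0:
                w -= lr * acc / accum_steps  # one update per accum_steps micro-batches
                acc = 0.0
    return w

# Data drawn from y = 3x; the fitted weight converges to the true slope 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]]
print(round(train(data), 2))
```

In a real training loop the same pattern applies: call the backward pass per micro-batch (gradients accumulate), and only step and zero the optimizer every `accum_steps` iterations. Dividing the accumulated gradient by `accum_steps` keeps the update magnitude comparable to a single large batch.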