Add a GPT-2 training example #19
Open
Labels
enhancement (New feature or request)
Description
We would like to use these issues to gauge user interest.
It is possible to use the GPT-2 implementation for further language model training, but there is currently no example demonstrating this in the repo or elsewhere.
Making this possible on a typical consumer GPU will likely require some technique to reduce the amount of GPU memory needed during training. There are a number of options:
- Add support for a smaller GPT-2 model.
- Only train a subset of the GPT-2 parameters.
- Use gradient accumulation.
- Use gradient checkpointing.
- Use reduced-precision gradients.
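Of these, gradient accumulation is probably the simplest to illustrate. The idea is to sum gradients over several small micro-batches and apply one optimizer step per group, so the effective batch size grows without more than one micro-batch ever being resident in memory. The sketch below is a hypothetical, framework-free illustration (a scalar linear model with hand-written gradients, not the repo's GPT-2 code or any particular library API):

```python
# Hypothetical sketch of gradient accumulation (not from this repo):
# fit y = w * x with squared error, summing gradients over accum_steps
# micro-batches before each weight update. The effective batch size is
# accum_steps * micro_batch_size, while only one micro-batch's
# activations would need to be held in memory at a time.

def grad(w, x, y):
    # d/dw of (w*x - y)^2
    return 2.0 * (w * x - y) * x

def train(data, accum_steps=4, lr=0.01, epochs=50):
    w = 0.0
    for _ in range(epochs):
        acc = 0.0
        for i, (x, y) in enumerate(data, start=1):
            acc += grad(w, x, y)             # accumulate instead of stepping
            if i % accum_steps == 0:
                w -= lr * acc / accum_steps  # one update per accum_steps micro-batches
                acc = 0.0
    return w

# Data drawn from y = 3x; the fitted weight converges to the true slope 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]]
print(round(train(data), 2))
```

In a real training loop the same pattern applies: call the backward pass per micro-batch (gradients accumulate), and only step and zero the optimizer every `accum_steps` iterations. Dividing the accumulated gradient by `accum_steps` keeps the update magnitude comparable to a single large batch.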