This is my training loss (the curve breaks at epoch [110/200]):
I noticed that "epoch" here is not used in the usual deep-learning sense. It looks as if the model weights are reset at each checkpoint-saving epoch (or the latest checkpoint is not being loaded?). Did I set a wrong parameter?
The total number of tokens in my dataset is train-num-samples * epochs, i.e. 49597346 * 200 = 9,919,469,200. I want the model to see the whole dataset exactly once.
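For reference, the arithmetic above can be checked quickly (this assumes, as stated in the issue, that train-num-samples times epochs gives the token total; if train-num-samples instead counts sequences of seqlen 2049, the token count would differ):

```python
# Sanity-check of the sample/epoch arithmetic from the issue.
train_num_samples = 49_597_346
epochs = 200

total = train_num_samples * epochs
print(total)  # 9919469200, matching the ~10B figure above

# If the goal is a single pass, one natural setting is epochs = 1
# with train-num-samples covering the full dataset.
single_pass = train_num_samples * 1
print(single_pass)  # 49597346
```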
dataset: fineweb-edu-10BT
tokenization (from DCLM, not open_lm):

```shell
--input /path/to/fineweb-10BT \
--local-cell-dir tmp/path/to/storage/for/local/cells \
--output path/to/tokenization \
--tokenizer "EleutherAI/gpt-neox-20b" \
--seqlen 2049 \
--wds-chunk-size 8192 \
--num-local-cells 512
```
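A quick way to relate these tokenization settings to token counts (assuming seqlen 2049 is the common 2048 + 1 layout for shifted next-token targets, and that each .tar shard holds wds-chunk-size sequences):

```python
# Hypothetical capacity check implied by the settings above.
seqlen = 2049          # tokens per stored sequence (assumed 2048 + 1 for targets)
wds_chunk_size = 8192  # sequences per .tar shard

tokens_per_shard = seqlen * wds_chunk_size
print(tokens_per_shard)  # 16785408 tokens in each shard
```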
training:

```shell
--model open_lm_411m_v2 \
--train-data /my/tokens/path/shard_{00000000..00000064}.tar \
--train-num-samples 49597346 \
--workers 8 \
--dataset-resampled \
--precision amp_bfloat16 \
--grad-checkpointing \
--log-every-n-steps 10 \
--global-batch-size 64 \
--epochs 200 \
--grad-clip-norm 1 \
--data-key json.gz \
--lr 3e-4 \
--fsdp --fsdp-amp \
--warmup 2000 \
--wd 0.1 \
--beta2 0.95 \
--report-to wandb \
--name open_lm_ex_$RANDOM \
--logs /mnt/nas/copora-evaluation/public-model/checkpoint/fineweb-10BT/checkpoint \
--resume latest
```
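If the goal is a single pass over the data, one hedged adjustment (assuming --dataset-resampled samples shards with replacement, as in open_clip-style webdataset loaders, and that --train-num-samples is the per-epoch sample count) would be to drop resampling and train for one epoch:

```shell
# Sketch, not a verified recipe: single-pass variant of the flags above.
# Omitting --dataset-resampled avoids drawing shards with replacement,
# and --epochs 1 with --train-num-samples set to the full dataset size
# should let each sample be seen roughly once.
--train-num-samples 49597346 \
--epochs 1 \
# (keep the remaining flags from the command above unchanged)
```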
device info: 4*A40