
Training loss not decreasing #307

@desktoop


This is my training loss curve (it breaks around epoch [110/200]):

[image: training loss curve]

I found that "epoch" here is not used in the usual deep-learning sense. It looks as if the model weights are reset at every saving epoch (or the latest checkpoint is not being loaded?). Did I set the wrong parameters?

The total number of tokens in my dataset is train-num-samples * epochs = 49597346 * 200 = 9,919,469,200. I want the model to see the whole dataset exactly once.
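The budget above can be sanity-checked with a few lines. Note this assumes, as the issue text does, that --train-num-samples counts tokens per "epoch" so that total tokens = train-num-samples * epochs; if open_lm instead counts sequences (as open_clip does), the budget would be off by a factor of the sequence length.

```python
# Single-pass token budget, under the assumption stated in the issue:
# total tokens seen = train_num_samples * epochs.
train_num_samples = 49_597_346   # --train-num-samples
epochs = 200                     # --epochs
total_tokens = train_num_samples * epochs
print(total_tokens)  # 9919469200, roughly the 10B tokens of fineweb-edu-10BT
```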

dataset: fineweb-edu-10BT
tokenization (from DCLM, not open_lm):

--input /path/to/fineweb-10BT \
--local-cell-dir tmp/path/to/storage/for/local/cells \
--output path/to/tokenization \
--tokenizer "EleutherAI/gpt-neox-20b" \
--seqlen 2049 \
--wds-chunk-size 8192 \
--num-local-cells 512
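For cross-checking, the capacity of the shard set referenced below in --train-data can be estimated from these tokenization flags. This is a rough sketch assuming every .tar shard is full (--wds-chunk-size sequences each) and that --seqlen 2049 means 2048 trained tokens plus one token for the shifted next-token target; the last shard is usually partial, so the real count is a bit lower.

```python
# Estimated contents of shard_{00000000..00000064}.tar, assuming full shards.
n_shards = 65            # shard indices 00000000..00000064 inclusive
seqs_per_shard = 8192    # --wds-chunk-size
seqlen = 2049            # --seqlen (2048 trained tokens + 1 shifted target)
sequences = n_shards * seqs_per_shard
tokens = sequences * (seqlen - 1)  # tokens actually trained on per full pass
print(sequences, tokens)  # 532480 sequences, ~1.09B tokens
```

If that ~1.09B-token estimate is right, 65 shards hold only about a tenth of fineweb-edu-10BT, which may be worth double-checking against the goal of a single full pass.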

training:

 --model open_lm_411m_v2 \
 --train-data /my/tokens/path/shard_{00000000..00000064}.tar \
 --train-num-samples 49597346 \
 --workers 8 \
 --dataset-resampled \
 --precision amp_bfloat16 \
 --grad-checkpointing \
 --log-every-n-steps 10 \
 --global-batch-size 64 \
 --epochs 200 \
 --grad-clip-norm 1 \
 --data-key json.gz \
 --lr 3e-4 \
 --fsdp --fsdp-amp \
 --warmup 2000 \
 --wd 0.1 \
 --beta2 0.95 \
 --report-to wandb \
 --name open_lm_ex_$RANDOM \
 --logs /mnt/nas/copora-evaluation/public-model/checkpoint/fineweb-10BT/checkpoint \
 --resume latest
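One more hedged sanity check on the schedule: as far as I understand, --dataset-resampled makes webdataset sample shards with replacement, so an "epoch" is really a checkpointing interval rather than a guaranteed full pass. And if --train-num-samples counts sequences rather than tokens (an assumption, not confirmed here), each such "epoch" would be a very large number of optimizer steps at this batch size:

```python
import math

# Hypothetical interpretation: --train-num-samples counts sequences,
# so steps per "epoch" = ceil(train_num_samples / global_batch_size).
train_num_samples = 49_597_346   # --train-num-samples
global_batch_size = 64           # --global-batch-size
steps_per_epoch = math.ceil(train_num_samples / global_batch_size)
print(steps_per_epoch)  # 774959 steps per "epoch"
```

Under that reading, 200 epochs would be ~155M steps, which suggests one of the two flags is not set the way the single-pass goal intends.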

device info: 4*A40
