I am studying chronicles_prequel, and I noticed that the last table in the chapter shows higher TFLOPs with ZeRO stage 1 than with ZeRO stage 0.
Trying with ZeRO_STAGE=0/1
ZeRO stage 1 is expected to reduce memory usage, but why does it also improve throughput when all the other parameters are the same?
| Nodes | Size | ZS | DP | TP | PP | MBS | GBS | Mem | Sec/it | TFLOPs | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 181B | 1 | 4 | 8 | 12 | 2 | 2048 | 37GB | 120.29 | 134.02 | 02-21 |
| 48 | 181B | 0 | 4 | 8 | 12 | 2 | 2048 | 72GB | 137.34 | 113.02 | 02-21 |
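To put numbers on the difference, here is a small sketch that computes the relative changes between the two rows. It uses only the values copied from the table above (ZS = ZeRO stage) and makes no assumptions about why the difference occurs:

```python
# Values copied from the two rows of the table (ZS = ZeRO stage).
rows = {
    1: {"mem_gb": 37, "sec_per_it": 120.29, "tflops": 134.02},
    0: {"mem_gb": 72, "sec_per_it": 137.34, "tflops": 113.02},
}

z1, z0 = rows[1], rows[0]

# How much faster one iteration is with ZS=1 (a ratio > 1 means faster).
iter_speedup = z0["sec_per_it"] / z1["sec_per_it"]
# Ratio of the reported TFLOPs between the two runs.
tflops_ratio = z1["tflops"] / z0["tflops"]
# Fraction of the ZS=0 memory that is still used with ZS=1.
mem_ratio = z1["mem_gb"] / z0["mem_gb"]

print(f"iteration speedup with ZS=1: {iter_speedup:.3f}x")
print(f"TFLOPs ratio ZS=1 / ZS=0:    {tflops_ratio:.3f}x")
print(f"memory ratio ZS=1 / ZS=0:    {mem_ratio:.3f}")
```

With these values, an iteration is roughly 14% faster with ZS=1, the reported TFLOPs are roughly 19% higher, and memory use is roughly halved.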