
Questions about model scaling #8

@Went-Liang


This paper is excellent! I have a few questions about it:

  1. Is the d_token mentioned in the paper the same as hidden_size?
  2. Section 3.3 of the paper says that the new parameter tokens have the same dimension d as the old ones, with m new and n old learnable parameter tokens, so expanding the model only requires concatenating them into an (m+n) × d matrix. In the code, hidden_size is used as this dimension d. However, hidden_size differs between 150M_eval.yml and 450M_eval.yml. That suggests that, when scaling from the smaller model to the larger one, the parameters cannot be concatenated along the token-count dimension as described in Section 3.3, which seems inconsistent with the paper. Am I misunderstanding something? Could you explain how exactly you implemented the model scaling?
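
To make the question concrete, here is my reading of Section 3.3 as a minimal sketch. It assumes each parameter token is a row of length d and represents the token blocks as plain Python lists; `expand_param_tokens` is a hypothetical helper for illustration, not a function from the repository. The sizes below (d = 4 and d = 6) are toy values, not the real hidden_size values from the configs.

```python
from typing import List

Matrix = List[List[float]]  # one row per parameter token, each row of length d


def expand_param_tokens(old: Matrix, new: Matrix) -> Matrix:
    """Hypothetical helper: concatenate n old and m new parameter tokens
    along the token axis, giving an (m + n) x d block.

    This only works if both blocks share the same dimension d.
    """
    d_old = len(old[0]) if old else 0
    d_new = len(new[0]) if new else 0
    if old and new and d_old != d_new:
        # Token-axis concatenation is undefined when the row widths differ.
        raise ValueError(f"dimension mismatch: old d={d_old}, new d={d_new}")
    # Copy rows so the expanded block does not alias the original tokens.
    return [row[:] for row in old] + [row[:] for row in new]


if __name__ == "__main__":
    old = [[0.0] * 4 for _ in range(3)]  # n = 3 tokens, d = 4
    new = [[0.0] * 4 for _ in range(2)]  # m = 2 tokens, d = 4
    expanded = expand_param_tokens(old, new)
    print(len(expanded), len(expanded[0]))  # 5 4

    wider = [[0.0] * 6 for _ in range(2)]  # d = 6, as if hidden_size had grown
    try:
        expand_param_tokens(old, wider)
    except ValueError as e:
        print(e)  # dimension mismatch: old d=4, new d=6
```

If hidden_size really changes between the two configs, I would expect the second case (a dimension mismatch), which is why I am unsure how the scaling was done.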

Looking forward to your reply.
