We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
2 parents 4312058 + 5d5d6bd commit 24eb7c8Copy full SHA for 24eb7c8
docs/homeworks/hw4.md
@@ -359,7 +359,7 @@ Word to embed: “hello”
359
## `nn.Embedding` Simplification
360
361
If we look closely at the matrix multiplication, we can notice that for each token,
362
-the “multiplication” is just choosing the column in W corresponding to that token!
+the “multiplication” is just choosing the row in W corresponding to that token!
363
364
So, this “linear transformation” is just a lookup table, where we have
365
$$V$$ vectors (V being the vocab size), and we look up the vector for each token and pile them together in a $$T \times C$$ matrix.
0 commit comments