
Commit 6220a22

Merge pull request #11 from felix-yuxiang/master
"cancel" package in mathjax is buggy, avoid using it in the future.
2 parents 61877a4 + d66bdcf

File tree

1 file changed: +10 -3 lines changed

_posts/2025-08-18-diff-distill.md

Lines changed: 10 additions & 3 deletions
@@ -161,15 +161,22 @@ Based on the MeanFlow identity, we can compute the target as follows:

$$
\require{physics}
-\require{cancel}
\begin{align*}
F_{t\to s}^{\text{tgt}}(\mathbf{x}_t, t, s\vert\mathbf{x}_0) &= \dv{\mathbf{x}_t}{t} - (t-s)\dv{F_{t\to s}(\mathbf{x}_t, t, s)}{t} \\
-& = \dv{\mathbf{x}_t}{t} - (t-s)\left(\nabla_{\mathbf{x}_t} F_{t\to s}(\mathbf{x}_t, t, s) \dv{\mathbf{x}_t}{t} + \partial_t F_{t\to s}(\mathbf{x}_t, t, s) + \cancel{\partial_s F_{t\to s}(\mathbf{x}_t, t, s) \dv{s}{t}}\right) \\
+& = \dv{\mathbf{x}_t}{t} - (t-s)\left(\nabla_{\mathbf{x}_t} F_{t\to s}(\mathbf{x}_t, t, s) \dv{\mathbf{x}_t}{t} + \partial_t F_{t\to s}(\mathbf{x}_t, t, s) + \underbrace{\partial_s F_{t\to s}(\mathbf{x}_t, t, s) \dv{s}{t}}_{=0}\right) \\
& = v - (t-s)\left(v \nabla_{\mathbf{x}_t} F_{t\to s}(\mathbf{x}_t, t, s) + \partial_t F_{t\to s}(\mathbf{x}_t, t, s)\right). \\
\end{align*}
$$

-Note that in MeanFlow $$\dv{\mathbf{x}_t}{t} = v(\mathbf{x}_t, t\vert \mathbf{x}_0)$$ and $$\dv{s}{t}=0$$ since $s$ is independent of $t$.
+Note that in MeanFlow
+
+$$\require{physics}\dv{\mathbf{x}_t}{t} = v(\mathbf{x}_t, t\vert \mathbf{x}_0)$$
+
+and
+
+$$\require{physics}\dv{s}{t}=0$$
+
+since $s$ is independent of $t$.
</details>

In practice, the total derivative of $$F_{t\to s}(\mathbf{x}_t, t, s)$$ and the function evaluation itself can be obtained in a single call: `f, dfdt = jvp(f_theta, (xt, s, t), (v, 0, 1))`. Although the `jvp` operation introduces only one extra backward pass, it still incurs instability and slows down training. Moreover, `jvp` is currently incompatible with the latest attention architectures. SplitMeanFlow<d-cite key="guo2025splitmeanflow"></d-cite> circumvents this issue by enforcing another consistency identity, $$(t-s)F_{t\to s} = (t-r)F_{t\to r}+(r-s)F_{r\to s}$$ where $$s<r<t$$, which implies a discretized version of the MeanFlow objective that falls into loss type (c).
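For concreteness, here is a minimal JAX sketch of how that single `jvp` call could be wired into a target computation, together with a discretized consistency loss in the spirit of SplitMeanFlow. The wrapper `f_theta(x, s, t)`, the stop-gradient placement, and the way the intermediate state `x_r` is produced are assumptions for illustration, not the authors' reference code.

```python
# A minimal, hypothetical sketch (not the authors' code): assumes a network
# f_theta(x, s, t) -> F_{t->s}(x_t, t, s) and a sampled velocity v = dx_t/dt.
import jax
import jax.numpy as jnp

def meanflow_target(f_theta, x_t, s, t, v):
    # One jvp call returns both F and its total derivative dF/dt: the
    # tangents (v, 0, 1) encode dx_t/dt = v, ds/dt = 0, dt/dt = 1,
    # matching the derivation above.
    F, dF_dt = jax.jvp(f_theta, (x_t, s, t),
                       (v, jnp.zeros_like(s), jnp.ones_like(t)))
    target = v - (t - s) * dF_dt
    # Stopping gradients through the bootstrapped target is assumed here.
    return F, jax.lax.stop_gradient(target)

def splitmeanflow_loss(f_theta, x_t, x_r, s, r, t):
    # Discretized identity (t-s) F_{t->s} = (t-r) F_{t->r} + (r-s) F_{r->s}
    # with s < r < t; x_r is the state at the intermediate time r (how it is
    # obtained is left unspecified in this sketch). No jvp is needed.
    lhs = (t - s) * f_theta(x_t, s, t)
    rhs = (t - r) * f_theta(x_t, r, t) + (r - s) * f_theta(x_r, s, r)
    return jnp.mean((lhs - jax.lax.stop_gradient(rhs)) ** 2)
```

Note that neither function forms a full Jacobian: the MeanFlow version needs only one Jacobian-vector product, and the SplitMeanFlow version avoids differentiation through time entirely, which is what makes it a loss of type (c).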
