disable eltwise fusion to FC if connected to input_hidden_states#35720

Open
nazanin-beheshti wants to merge 3 commits into openvinotoolkit:master from nazanin-beheshti:naz/phi-muffin-fusion

@nazanin-beheshti
Contributor

@nazanin-beheshti nazanin-beheshti commented May 7, 2026

Details:

  • The Phi muffin app generates garbage output on the OV GPU stack.
    Without any changes, the output from OV GPU is garbage: repeated characters (attached).
  1. In my first experiment, I disabled fusion completely in the OV GPU graph. The output was no longer garbage and contained some meaningful words and characters, but it was still not correct.
  2. Next, I disabled fusion only in the small red circle (loop), since that loop shows the first mismatch in the output.
    Disabling fusion in that small red loop did not resolve the accuracy issue.
  3. Then I disabled fusion of the last FC + eltwise in the big (red) loop. That also did not resolve the accuracy issue: I still got meaningful words but not the correct output.
  4. Finally, I disabled fusion of the first FC + eltwise. This eltwise op is connected to "input_hidden_states". With that change, I got meaningful and correct output.

By narrowing down the issue, we found that the eltwise connected to input_hidden_states, when fused into the FullyConnected, is the source of the garbage output.
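To make the intended condition concrete, here is a minimal, self-contained sketch of the kind of guard described above. It is not the code in this PR and does not use the GPU plugin's real classes: `Node`, `is_named_input`, and `can_fuse_eltwise_into_fc` are hypothetical names, and matching the model input by the string "input_hidden_states" is an assumption made only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified graph node. These are NOT the OpenVINO GPU plugin types.
struct Node {
    std::string type;                 // e.g. "parameter", "fully_connected", "eltwise"
    std::string name;                 // node name, e.g. "input_hidden_states"
    std::vector<const Node*> inputs;  // producers feeding this node
};

// True if `node` is a model input (parameter) with the given name.
bool is_named_input(const Node& node, const std::string& name) {
    return node.type == "parameter" && node.name == name;
}

// Decide whether `eltwise` may be fused into `fc` as a post-op.
// Fusion is refused when any other operand of the eltwise is the blocked
// model input (assumed here to be identified by name).
bool can_fuse_eltwise_into_fc(const Node& fc, const Node& eltwise,
                              const std::string& blocked_input = "input_hidden_states") {
    assert(fc.type == "fully_connected" && eltwise.type == "eltwise");
    bool consumes_fc = false;
    for (const Node* in : eltwise.inputs) {
        if (in == &fc) {
            consumes_fc = true;  // the eltwise really consumes this FC
            continue;
        }
        if (in != nullptr && is_named_input(*in, blocked_input)) {
            return false;        // peer operand is the blocked model input
        }
    }
    return consumes_fc;
}
```

With this toy model, an eltwise whose inputs are the FC and the `input_hidden_states` parameter is left unfused, while an eltwise combining the FC with any other producer remains eligible for fusion.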
unfised-eltwise-input

fusion-mapping results

I also checked the oneDNN kernel (FC + eltwise) with real sample inputs, using the input buffers dumped from OV GPU as the input buffers to the oneDNN kernel.

buffer-imported

The kernel passes, and no issue was found with the kernel itself.
With -v100, the kernel output matches the fc_dst buffer dumped from OV GPU.
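For reference, a comparison of this kind (standalone kernel output vs. a dumped buffer) can be sketched as below. This is an illustrative helper, not the tool used for the check above: the function names and the tolerance defaults are assumptions, and loading the dumped files into `std::vector<float>` is left out because the dump format is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two equally sized buffers.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True if every element of `out` agrees with `ref` within
// atol + rtol * |ref[i]|. The default tolerances are illustrative only.
bool buffers_match(const std::vector<float>& out, const std::vector<float>& ref,
                   float atol = 1e-3f, float rtol = 1e-2f) {
    assert(out.size() == ref.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        const float diff = std::fabs(out[i] - ref[i]);
        const float tol = atol + rtol * std::fabs(ref[i]);
        if (!(diff <= tol)) {  // written this way so NaN also counts as a mismatch
            return false;
        }
    }
    return true;
}
```

Reporting `max_abs_diff` alongside the pass/fail result helps distinguish small numerical drift from a genuinely different result.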

Tickets:

AI Assistance:

  • AI assistance used: no

@nazanin-beheshti nazanin-beheshti requested review from a team as code owners May 7, 2026 22:14
@github-actions github-actions Bot added the category: GPU OpenVINO GPU plugin label May 7, 2026
Collaborator

@rkazants rkazants left a comment

can you please provide tests?

@e-ddykim
Contributor

e-ddykim commented May 8, 2026

The PR description looks confusing. If you still get incorrect results, we don't need this change.

@e-ddykim
Contributor

e-ddykim commented May 12, 2026

image

Does the left graph have an accuracy issue, while the right one is fine? As a first step, please check if the output from the FullyConnected on the left is different from the output from the Eltwise on the right. Next, it would be helpful to understand why the Reorder highlighted by the blue circle on the left no longer appears on the right.

@nazanin-beheshti
Contributor Author

nazanin-beheshti commented May 12, 2026

> Does the left graph have an accuracy issue, while the right one is fine? As a first step, please check if the output from the FullyConnected on the left is different from the output from the Eltwise on the right. Next, it would be helpful to understand why the Reorder highlighted by the blue circle on the left no longer appears on the right.

Yes, the left graph has an accuracy issue while the right one is fine.

  1. Eltwise output from the right graph vs. FC output from the left graph: 0.97 cosine similarity (not matching).
  2. Eltwise output from the right graph vs. Reorder output from the left graph: 0.9994 cosine similarity.
reorder-vs-eltwise eltwise-vs-FC
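The cosine similarity figures above compare two flattened output buffers. A minimal sketch of that metric is shown below; the function name is hypothetical, the buffers are assumed to be already loaded as `float` vectors of equal length, and returning 0 for an all-zero buffer is a convention chosen here, not something taken from the comparison script.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equally sized buffers:
//   dot(a, b) / (||a|| * ||b||)
// Accumulates in double to limit rounding error on large tensors.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i];
        const double y = b[i];
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0;  // convention for an all-zero buffer (assumption)
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

Because cosine similarity ignores overall scale, pairing it with an absolute or relative error check is useful when a scaling difference between the two graphs is a possibility.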

@nazanin-beheshti
Contributor Author

> Does the left graph have an accuracy issue, while the right one is fine? As a first step, please check if the output from the FullyConnected on the left is different from the output from the Eltwise on the right. Next, it would be helpful to understand why the Reorder highlighted by the blue circle on the left no longer appears on the right.

The reorder is actually coming from here:

input = std::make_shared<ov::op::v0::Convert>(normalize_l2->input_value(0), ov::element::f32);

I commented out this line to avoid the Convert/Reorder and checked how the compiled graph and the final output change.
The resulting graph has FC + eltwise fused and no Reorder.
The output is still the same garbage of repeated words and characters as the one attached to the PR (wo-change).

fc-eltwise-fused-no-reorder

@e-ddykim
Contributor

In that case, could you please check what happens if the leftmost connection is connected to the Reorder rather than the FullyConnected in the left graph?
image

@nazanin-beheshti
Contributor Author

> In that case, could you please check what happens if the leftmost connection is connected to the Reorder rather than the FullyConnected in the left graph?

I made some changes in intel_gpu\src\plugin\transformations\normalize_l2_decomposition.cpp to apply the Convert not only on the input to normalize_l2 but to all of its users.
With that, a Reorder is added at (1), (2), and (3), but the remove redundant reorder pass then removes the inserted Reorder.
The output is still garbage: repeated characters and words.

reorder-added (1)
