[AIEX] Improve prologue scheduling by using AIERegMemEventTracker by andcarminati · Pull Request #765 · Xilinx/llvm-aie

andcarminati · 2026-01-30T08:51:11Z

The goal is to reduce pessimism by precise register event tracking. Locks and events are handled in the same way as in the epilogue.

andcarminati · 2026-01-30T08:54:29Z

llvm/test/CodeGen/AIE/aie2p/elongate/zol_phdr_loopbundles_112bytes.mir

  ; CHECK-LABEL: zol_elongated_preheader_plus_NumOfLoopBundles_ge112bytes:
  ; CHECK:         .p2align 4
  ; CHECK-NEXT:  // %bb.0: // %entry
-  ; CHECK-NEXT:    vldb x6, [p0], #0


This is something relevant. The old approach detected a dependency through p0, but by using maxLatency, we extend the distance to 7 (load latency), what is quite pessimistic.

andcarminati · 2026-01-30T08:56:01Z

Next step is to refactor a bit the code.

llvm/test/CodeGen/AIE/aie2p/end-to-end/add-att-broadcasting-aa.ll

F-Stuckmann · 2026-01-30T14:40:18Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+                                        /*InSeparateRegion=*/false);
+
+      // Find the loop successor - must exist if we have bot-fixed bundles in a
+      // prologue


What is the difference between regular interblock scheduling and the pipelining interblock scheduling?

They are different in the sense that without pipelining:

We push a safety margin on the epilogue block based on what we see on the loop.

We initialize the bot scoreboard of the prologue based on the loop state.

With a pipelined loop, we have the top and bot insert regions that are already scheduled. We carefully dump those regions to a place where the normal DAG construction will not see them (sequence of bundles), because we need full control to not rearrange/change timing properties. After this timing region placement, we carefully create DAG nodes for the bundles (Also, normal DAG construction is not prepared for bundles- but this is not the reason), tying them together to a proper in-sequence scheduling. After, we construct a timeline of events relative the EntrySU and ExitSU, so we can create dependencies for the "free instructions". Register dependencies are the easy part on this timeline construction, the problem arises when we consider also memory cycles and alias relations (most of the code). As result, for each "free" instruction, we have a single edge to Entry/Exit modeling all memory and register latencies. For example, a memory instruction in the prologue can execute a register write inside de loop, and this must be also tracked - we do this here using events with negative cycles.

Also, this is the starting point for prologue/epilogue merging as a whole, the most complicated part is done by the scheduler: release each fixed node in sequence before any other node. This involves, for example, the switching the from top to bottom-um scheduling and delta cycles in both directions.

F-Stuckmann · 2026-01-30T14:41:21Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

+      const BlockState &LoopBS =
+          Scheduler->getInterBlock().getBlockState(LoopSucc);
+      ArrayRef<AIE::MachineBundle> LoopTimedBundles = LoopBS.getTop().Bundles;
+      BackwardRET.computeUseDefBackward(LoopTimedBundles,


How do we handle the wrap around nature of the loops?

Once we execute the first iteration of the loop (all instructions will be there), this first iteration will be responsible for the first events in the timeline that may influence the prologue. Once we iterate again, events will not be interesting anymore because we already saw them previously, closer to the prologue. Happily, pipelined loops execute at least once and they contain all instructions.

F-Stuckmann · 2026-01-30T14:49:35Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

 }

+void AIERegMemEventTracker::updateLastStoreCycle(int StoreCycle) {
+  LastStoreCycle = std::max(LastStoreCycle, StoreCycle);


What is the difference between this and the scoreboard we use in the Postpipeliner? Can't we reuse it here?

A scoreboard can track resources (functional units/slots etc.) in the pipeline without the notion of latency. An instruction may reserve/require resources the the scoreboard, but there in no notion of operand latency/data dependency there. Here, we are not interested any resources, only latencies that can accommodate data dependencies and also memory relations between "free" instructions and already scheduled instructions.

mludevid · 2026-02-02T15:59:34Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

-          SDep Dep(FixedDepSU, SDep::Artificial);
-          Dep.setLatency(1);
-          FreeSU.addPred(Dep, /*Required=*/true);
+    // Handle bot-fixed instructions (prologue) with backward tracking


Looks really good. I think that we can make this function even more understandable by factoring out the intermediate steps into helper functions. Maybe it's even possible to share some code between the prologue and epilogue handling.

mludevid · 2026-02-02T16:02:24Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

-          FreeSU.addPred(Dep, /*Required=*/true);
+    // Handle bot-fixed instructions (prologue) with backward tracking
+    // Only handle prologues (preheader blocks), not epilogues here
+    if (!BotFixedBundles.empty() && BS.Kind != BlockType::Epilogue) {


I think this check is redundant. I suggest moving the block type check into an assert (as we do in the prologue case) or removing it altogether.

New checks available. We can have 1-stage pipelined loop with BotFixedBundles.

mludevid · 2026-02-02T16:04:07Z

llvm/lib/Target/AIE/AIEBaseSubtarget.cpp

    // SUnits get placed exactly at their depth (for the Top zone) or height
    // (for the Bot zone).
    SUnit *Pred = &DAG->EntrySU;
    // We itarate over BUNDLEs or standalone instructions.


Typo: iterate

mludevid · 2026-02-02T16:20:13Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

  int Cycle = 0;
  const int TotalCycles = Bundles.size();
+
+  // Track top-fixed region size (only for the first call, not separate region)


Nit: This comment explains the what and not the why. Could you add an explanation of why this tracking is only needed in the first call? Also, why not track it in both cases and move the check to the place where the total cycles is actually used, i.e. if the size is only needed in a computation in certain scenarios, then I think it makes sense to keep that check closer to the computation rather than here.

mludevid · 2026-02-02T16:28:09Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

+
+  // Count progressively from 0 as we iterate backward through bundles
+  // Cycle represents the distance from ExitSU for each bundle
+  int Cycle = 0;


Nit: AFAIK we usually call this the height. Please rename for consistency.

Definitely not. AIERegMemEventTracker is a register and memory event tracker so we register events in their respective cycles.

mludevid · 2026-02-02T16:28:45Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

+  int Cycle = 0;
+
+  for (const auto &Bundle : reverse(Bundles)) {
+    // Cycle is the distance from ExitSU for this bundle


Nit: Repeated comment.

mludevid · 2026-02-02T16:30:19Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

+    if (getFirstMemoryAccessCycle() > INT_MIN) {
+      const int CoreResumeCycle = TII->getCoreResumeCycleAfterLock();
+      const int MIResumeCycle = CoreResumeCycle - 1;
+      const int MemDep = getFirstMemoryAccessCycle() - MIResumeCycle + 2;


I rewrote this part for clarity.

F-Stuckmann · 2026-02-03T10:55:38Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

+                        [](const MachineOperand &MO) { return MO.isReg(); });
+  };
+
+  return MI.hasUnmodeledSideEffects() && !MI.mayLoadOrStore() &&


why do we treat unmodeled sideffects differently than locks or load/stores?

They are opaque instructions: no operands, no memory cycles, do not load or store and they are not locks (with respective associated cycles). In essence, they are scheduling barriers.

F-Stuckmann · 2026-02-03T11:17:37Z

llvm/lib/Target/AIE/AIERegMemEventTracker.cpp

-    }
+    // Check stores in top-fixed (RAW for loads, WAR for stores)
+    const int StoreCycle =
+        AA ? getMaxAliasingMemCycle(MI, /*IsStore=*/true) : getLastStoreCycle();


shouldn't this be getLastMemoryAccessCycle since we are unifying store and load?

This seems correct to me, we care about stores here.

andcarminati · 2026-02-04T12:58:04Z

QoR:

This helps to improve the outermost loop of GEMM inst8.

The goal is to reduce pessimism by precise register event tracking. Locks and events are handled in the same way as in the epilogue.

andcarminati requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, mludevid, niwinanto and stephenneuendorffer as code owners January 30, 2026 08:51

andcarminati marked this pull request as draft January 30, 2026 08:51

andcarminati commented Jan 30, 2026

View reviewed changes

llvm/test/CodeGen/AIE/aie2p/end-to-end/add-att-broadcasting-aa.ll Show resolved Hide resolved

andcarminati commented Jan 30, 2026

View reviewed changes

llvm/test/CodeGen/AIE/aie2p/end-to-end/add-att-broadcasting-aa.ll Show resolved Hide resolved

andcarminati force-pushed the andreu.prologue.improve branch from eced1ea to d3620c9 Compare January 30, 2026 13:05

F-Stuckmann reviewed Jan 30, 2026

View reviewed changes

mludevid reviewed Feb 2, 2026

View reviewed changes

F-Stuckmann reviewed Feb 3, 2026

View reviewed changes

andcarminati force-pushed the andreu.prologue.improve branch from d3620c9 to 9acbf08 Compare February 6, 2026 13:25

andcarminati added 2 commits February 6, 2026 07:06

[AIEX] Improve prologue scheduling by using AIERegMemEventTracker

10690c1

The goal is to reduce pessimism by precise register event tracking. Locks and events are handled in the same way as in the epilogue.

[AIEX] Better organization for RegionEndEdges code

097e353

andcarminati force-pushed the andreu.prologue.improve branch from 9acbf08 to 097e353 Compare February 6, 2026 14:10

Conversation

andcarminati commented Jan 30, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andcarminati commented Jan 30, 2026

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andcarminati commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants