Martien.physreg liveranges #747

Open
martien-de-jong wants to merge 17 commits into aie-public from martien.physreg-liveranges

Collaborator

@martien-de-jong martien-de-jong commented Dec 31, 2025

This is a POC of register allocation during postpipelining.

We add

  • RegDefUseTracker, which virtualizes safe live ranges,
  • SchedulingInterpreter, which computes register live ranges from the scheduled pipeline timing,
  • PostRegAlloc, which uses the two to allocate the virtual registers after postpipelining,
  • Scarce range scheduling, a postpipelining scheduler heuristic that avoids overlap between live ranges that compete for a single register.

Status:
It's aggressive enough to reach II=7 on gemm-bfp16-opt0, but sadly, the code it produces is not correct. I'm trying to find out what is causing my diff failure.

@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch 2 times, most recently from cc8fca8 to 33264be Compare January 8, 2026 10:48
@martien-de-jong martien-de-jong marked this pull request as ready for review January 8, 2026 11:13
/// register. There may be multiple current definitions for a register with
/// disjunct lanemasks.
- VReg2SUnitMultiMap CurrentVRegDefs;
+ VReg2SUnitOperIdxMultiMap CurrentVRegDefs;
Collaborator Author

This was asymmetric between Uses and Defs. We need the operand index of the outstanding defs to compute operand latencies.


// Use TRI's regsOverlap which handles both physical and virtual registers,
// including subregisters and lane masks
return TRI->regsOverlap(SrcReg, DstReg);
Collaborator Author

I guess this was only needed transiently, but it looks really good.

Collaborator

Nice, they work on RegUnits.

- PostSWP->isPostPipelineCandidate(*TheBlock))
- staticallyMaterializeMultiSlotInstructions(*TheBlock, HR);
+ PostSWP->isPostPipelineCandidate(*TheBlock)) {
+ staticallyMaterializeMultiSlotInstructions(*TheBlock, HR, MaterializeAll);
Collaborator Author

Would have been nice to be able to skip the scheduler before postpipelining. Sadly, the scheduler sometimes makes better decisions.

for (int T = 0; T < II; ++T) {
LaneBitmask Mask = LanesByOffset[T];
if (Mask.any()) {
// Show a simple indicator - could be enhanced to show actual lanes
Collaborator Author

Indeed. Full lanemasks are bulky though.

static cl::opt<bool> TestRegDefUseTracker(
"aie-test-regdefuse-tracker", cl::Hidden, cl::init(false),
cl::desc("[AIE] TEST MODE: Run RegDefUseTracker analysis on all loops "
"(for testing only)"));
Collaborator Author

This is accommodating a dump for the early stages of live range analysis.


void BlockState::restorePipelining() {
// Restore to the original allocation of the virtual registers
RegTracker->restoreOriginalPhysRegs();
Collaborator Author

These registers were used by the scheduler whose result we're going to use as a fallback.

@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from 7930abc to dcc908c Compare January 13, 2026 12:49
BS.FixPoint.PipelinerMode = firstPipelinerMode();
if (BS.FixPoint.PipelinerMode != PostPipelinerMode::None) {
return SchedulingStage::Pipelining;
}
Collaborator Author

@martien-de-jong martien-de-jong Jan 13, 2026

This looks a bit weird: we have been pipelining and are trying to restore to the first allowed pipelinermode for the next II. This should be invariant, so I don't think we can get None here. Perhaps assert.


// For virtual mode, re-analyze and virtualize
if (FixPoint.PipelinerMode == PostPipelinerMode::Virtual) {
// RegTracker might not exist if we have multiple regions
Collaborator Author

Someone missed that we can't do physical mode either if we have more than one region.
I would hope that RegTracker is always there for a SWP candidate.

Martien de Jong added 13 commits January 23, 2026 12:38
Also for virtual registers. For architectures where latencies can go
negative, this has an impact on RecMII.
The old check lines matched, ignoring the leading nops.
The FixPoint updaters just return the new state.
This is abstracting the live ranges to be used by PostRegAlloc
This module analyses live ranges of physical registers that can be
safely reallocated in a basic block.

It supplies facilities to rewrite to virtual registers and to restore
the original allocation.
This module produces an EventSchedule from the instructions and their
issue cycle. The event schedule contains the read and write events of
the virtual registers occurring in the instructions, ordered in the processor
pipeline stage timeline. From the EventSchedule, the modulo live ranges for a
particular II can be constructed. These represent the lanes of each register
that are live at a particular point.
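The folding of linear live ranges into modulo live ranges can be illustrated with a minimal sketch. It is not the PR's SchedulingInterpreter: `LinearRange` and `foldModulo` are hypothetical names, lane masks are simplified to a plain `uint64_t` instead of LLVM's `LaneBitmask`, and all ranges are assumed to belong to one register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one live range of a register in the linear
// (unfolded) schedule: the lanes in Lanes are live in cycles [Start, End).
struct LinearRange {
  int Start;
  int End;
  uint64_t Lanes; // simplified lane mask; LLVM uses LaneBitmask
};

// Fold the ranges of one register modulo II. On success, LanesByOffset[T]
// holds the lanes that are live at kernel offset T. Returns false when a lane
// would be live twice at the same offset, i.e. the register cannot carry
// these ranges at this II.
bool foldModulo(const std::vector<LinearRange> &Ranges, int II,
                std::vector<uint64_t> &LanesByOffset) {
  assert(II > 0 && "initiation interval must be positive");
  LanesByOffset.assign(II, 0);
  for (const LinearRange &R : Ranges) {
    for (int C = R.Start; C < R.End; ++C) {
      const int T = ((C % II) + II) % II; // non-negative modulo
      if (LanesByOffset[T] & R.Lanes)
        return false;
      LanesByOffset[T] |= R.Lanes;
    }
  }
  return true;
}
```

In this model a range longer than II always fails, because it wraps around onto its own lanes.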
This is a dedicated register allocator for use by the postpipeliner.

We compute some metrics, and run with a few different score functions
on those metrics to define an allocation order.
We allocate in that order, and fail as soon as we can't find a
register that is available over the live range.
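A minimal sketch of this allocate-in-order, fail-fast scheme, under simplifying assumptions: `ModuloRange`, `liveSlots` and `allocateInOrder` are hypothetical names, a live range is reduced to a bitmask of occupied kernel offsets (so II is at most 64), and every register is assumed to be usable for every range. The metrics and score functions of the actual PostRegAlloc are not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical modulo live range.
struct ModuloRange {
  int Id;         // index into the resulting assignment
  uint64_t Slots; // bit T set: live at kernel offset T (assumes II <= 64)
};

using ScoreFn = std::function<int(const ModuloRange &)>;

// Example metric: the number of kernel offsets at which the range is live.
int liveSlots(const ModuloRange &R) {
  int N = 0;
  for (uint64_t S = R.Slots; S != 0; S &= S - 1)
    ++N;
  return N;
}

// Allocate ranges in descending score order, taking the first register that
// is free over the whole range. Returns false as soon as one range does not
// fit; Assignment is only meaningful when the result is true.
bool allocateInOrder(std::vector<ModuloRange> Ranges, int NumRegs,
                     const ScoreFn &Score, std::vector<int> &Assignment) {
  std::stable_sort(Ranges.begin(), Ranges.end(),
                   [&](const ModuloRange &A, const ModuloRange &B) {
                     return Score(A) > Score(B);
                   });
  std::vector<uint64_t> Busy(NumRegs, 0); // occupied offsets per register
  Assignment.assign(Ranges.size(), -1);
  for (const ModuloRange &R : Ranges) {
    assert(R.Id >= 0 && R.Id < static_cast<int>(Assignment.size()));
    int Chosen = -1;
    for (int Reg = 0; Reg < NumRegs && Chosen < 0; ++Reg)
      if ((Busy[Reg] & R.Slots) == 0)
        Chosen = Reg;
    if (Chosen < 0)
      return false; // fail fast
    Busy[Chosen] |= R.Slots;
    Assignment[R.Id] = Chosen;
  }
  return true;
}
```

Because the search is greedy, the outcome depends on the order: the same set of ranges can fit under one score function and fail under another, which is one plausible reason for trying several.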
This is a strategy that prioritizes scheduling of scarce ranges.
Scarce ranges are live ranges that compete for one available register.

The live ranges are virtualized, which means we have no serializing
WAR deps. However, we need to be careful not to have more than one live
at a time, which means we want to finish the range before starting a new one.

We try all legal permutations of these live ranges. For the current live range,
we first prioritize all its ancestors, then the instructions in the range
itself.
Once we are finished with the range, we simulate the WAR dependences that
are necessary to keep the next ranges non-overlapping.
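The enumeration can be sketched roughly as follows. This is an illustration only: the thread does not show what makes a permutation legal, so the sketch assumes a plain "range A must finish before range B" relation, and `legalOrders` and `simulatedWAREdges` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Assumed legality rule: each pair (A, B) means range A must be finished
// before range B starts.
using Precedence = std::pair<int, int>;

// Enumerate every ordering of the ranges 0..N-1 that respects MustPrecede.
std::vector<std::vector<int>>
legalOrders(int N, const std::vector<Precedence> &MustPrecede) {
  assert(N >= 0);
  std::vector<int> Order(N);
  std::iota(Order.begin(), Order.end(), 0);
  std::vector<std::vector<int>> Result;
  do {
    std::vector<int> Pos(N); // position of each range in this ordering
    for (int I = 0; I < N; ++I)
      Pos[Order[I]] = I;
    bool Legal = true;
    for (const Precedence &P : MustPrecede)
      Legal = Legal && Pos[P.first] < Pos[P.second];
    if (Legal)
      Result.push_back(Order);
  } while (std::next_permutation(Order.begin(), Order.end()));
  return Result;
}

// For one ordering, list the simulated WAR edges: each range has to be
// finished before the range that follows it may start.
std::vector<Precedence> simulatedWAREdges(const std::vector<int> &Order) {
  std::vector<Precedence> Edges;
  for (std::size_t I = 1; I < Order.size(); ++I)
    Edges.emplace_back(Order[I - 1], Order[I]);
  return Edges;
}
```

Exhaustive enumeration grows factorially with the number of ranges, so this only makes sense when few ranges compete for the same register.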
@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from dcc908c to 133f034 Compare January 30, 2026 12:03
@martien-de-jong martien-de-jong force-pushed the martien.physreg-liveranges branch from 133f034 to 699934a Compare January 30, 2026 12:48
const AIEHazardRecognizer &HR,
ResourceScoreboard<FuncUnitWrapper> &Scoreboard) {
const int Step = fromTop() ? 1 : -1;
if (Step < 0) {
Collaborator

Why does fromTop change the stepping direction? Also, why are First and Last in the wrong order?

; CHECK-NEXT: vshuffle x2, x8, x0, r16; vmac.f bml5, bml5, x9, x7, r2
; CHECK-NEXT: .L_LEnd0:
- ; CHECK-NEXT: vshuffle x10, x1, x3, r3; vmac.f bmh4, bmh4, x6, x5, r2
+ ; CHECK-NEXT: nopb ; nopa ; nops ; nopx ; vshuffle x10, x1, x3, r3; vmac.f bmh4, bmh4, x6, x5, r2
Collaborator

Curious, what induced this change?

auto *AltA = Formats->getAlternateInstsOpcode(A->getOpcode());
auto *AltB = Formats->getAlternateInstsOpcode(B->getOpcode());

return AltA->size() < AltB->size();
Collaborator

Check: fewer options, higher priority.
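The intent of that comparator can be shown with a small self-contained sketch. `Candidate` and `prioritizeFewestAlternatives` are hypothetical stand-ins; the PR compares the sizes of the lists returned by `getAlternateInstsOpcode`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction together with the alternate
// opcodes available for it.
struct Candidate {
  std::string Name;
  std::vector<unsigned> AltOpcodes;
};

// Order candidates so that those with the fewest alternatives come first:
// the less freedom an instruction has, the earlier it is handled.
void prioritizeFewestAlternatives(std::vector<Candidate> &Cands) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const Candidate &A, const Candidate &B) {
                     return A.AltOpcodes.size() < B.AltOpcodes.size();
                   });
}
```

`std::stable_sort` keeps candidates with equally many alternatives in their original order, which keeps the result deterministic.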

auto SlotToBanks = getAssignedSlots(MBB, TII, HR);

- if (!assignSlots(SlotToBanks, MBB, TII, HR)) {
+ if (assignSlots(SlotToBanks, MBB, TII, HR)) {
Collaborator

Maybe a comment here explaining that we care about memory banks first.

}

const int Limit = Last + Step;
for (int C = First; C != Limit; C += Step) {
Collaborator

The original implementation has some debug messages, are they not useful? Also, we could remove the original fit.

const AIEHazardRecognizer &HR,
ResourceScoreboard<FuncUnitWrapper> &Scoreboard) {
const int Step = fromTop() ? 1 : -1;
if (Step < 0) {
Collaborator

We could have something like:

if (!fromTop()) {
  std::swap(First, Last);
}

It makes the code self-documenting.
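Put together with the `Step` and `Limit` lines from the diff, the suggestion might look like the sketch below. `scanCycles` is a hypothetical helper that only records the visited cycles, and it assumes that First <= Last holds on entry and that both endpoints are inclusive, which the snippets in this thread suggest but do not confirm.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Visit the cycles First..Last (inclusive) from the top or from the bottom
// and return them in visiting order.
std::vector<int> scanCycles(int First, int Last, bool FromTop) {
  assert(First <= Last && "expects an ordered, non-empty cycle window");
  const int Step = FromTop ? 1 : -1;
  if (!FromTop) {
    std::swap(First, Last); // scan from the bottom: start at the last cycle
  }
  const int Limit = Last + Step; // one step past the final cycle
  std::vector<int> Visited;
  for (int C = First; C != Limit; C += Step)
    Visited.push_back(C);
  return Visited;
}
```

Writing the loop condition as `C != Limit` lets one loop body serve both directions.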

Collaborator

I still find this whole business funny. Why exactly are we doing that?

/// Key identifying a live range and its subregister
struct LRKey {
unsigned LRId; // Live range identifier
unsigned SubRegIdx; // Subregister index (0 for full register)
Collaborator

CHECK: is SubRegIdx sufficient for comparing registers of different regclasses like ex and x regs or ex and y regs?

Collaborator

I think this check could be stricter by checking whether the blocked regunits overlap.
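A regunit-based check along those lines could be sketched like this. It is a toy model rather than the LLVM API: `RegUnits` and `regUnitsOverlap` are hypothetical, and each register is assumed to be described by a sorted list of the register units it covers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: the sorted list of register units a register covers.
// Two registers alias exactly when they share at least one unit, whatever
// their register classes are.
using RegUnits = std::vector<unsigned>;

// Walk both sorted lists in lockstep and report the first common unit.
bool regUnitsOverlap(const RegUnits &A, const RegUnits &B) {
  std::size_t I = 0, J = 0;
  while (I < A.size() && J < B.size()) {
    if (A[I] == B[J])
      return true;
    if (A[I] < B[J])
      ++I;
    else
      ++J;
  }
  return false;
}
```

Comparing units instead of subregister indices sidesteps the question of whether indices from different register classes are comparable.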
