Skip to content

How to preserve CUDA kernels in Polygeist GPU IR (polygeist.gpu_wrapper) instead of lowering to scf? #442

@yuxuan-z19

Description

@yuxuan-z19

Hi, I’m trying to use Polygeist to extract polyhedral structure from CUDA kernels. According to the Polygeist-GPU paper (CGO’24), CUDA code can be lifted into GPU-level Polygeist IR such as:

Image

However, when I compile CUDA kernels with cgeist, I only get MLIR scf IR (scf.if, scf.execute_region, scf.for, etc.) and the GPU parallel structure is not preserved.

Command:

cgeist --immediate \
    -cuda-path=/usr/local/cuda \
    --cuda-gpu-arch=sm_89 \
    -I/usr/local/cuda/include \
    -I/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18/include \
    --resource-dir=/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18 \
    -S \
    test.cu > test.mlir

Example kernels and the MLIR output

__global__ void kernel_C(int m, int n, double alpha, double beta, double* C,
                         double* A, double* B, double* tmp) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;

    if (i < m && j < n)
        C[i * n + j] = beta * C[i * n + j] +
                       alpha * B[i * n + j] * A[i * n + i] +
                       alpha * tmp[i * n + j];
}

__global__ void kernel_sum(int m, int n, double alpha, double beta, double* C,
                           double* A, double* B, double* tmp) {
    int k = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;

    if (k < m - 1 && j < n) {
        for (int i = k + 1; i < m; i++)
            C[k * n + j] += alpha * B[i * n + j] * A[i * n + k];
    }
}

I also tested various available cgeist options, but none retained the for-loop structure for further analysis.

Questions:

  1. Is there a flag or pipeline that prevents lowering CUDA kernels directly into scf and keeps them in Polygeist’s GPU IR?
  2. If this path is not supported anymore, is the GPU-wrapper lowering planned for reintroduction?
  3. Or should I implement a custom pass that intercepts the lowering before it becomes scf?

Any guidance would be very helpful. Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions