How to preserve CUDA kernels in Polygeist GPU IR (polygeist.gpu_wrapper) instead of lowering to scf?

Hi, I’m trying to use Polygeist to extract the polyhedral structure of CUDA kernels. According to the Polygeist-GPU paper (CGO’24), CUDA code can be lifted into GPU-level Polygeist IR such as the following:

<img width="737" height="846" alt="Image" src="https://github.com/user-attachments/assets/88ade6b0-feb4-406c-aed8-3875b97e7495" />

However, when I compile CUDA kernels with cgeist, I only get IR in the MLIR `scf` dialect (`scf.if`, `scf.execute_region`, `scf.for`, etc.), and the GPU parallel structure is not preserved.

Command:

```bash
cgeist --immediate \
    -cuda-path=/usr/local/cuda \
    --cuda-gpu-arch=sm_89 \
    -I/usr/local/cuda/include \
    -I/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18/include \
    --resource-dir=/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18 \
    -S \
    test.cu > test.mlir
```

Example kernels (the corresponding [MLIR output](https://github.com/user-attachments/files/24045257/test.mlir.txt) is attached):

```cpp
__global__ void kernel_C(int m, int n, double alpha, double beta, double* C,
                         double* A, double* B, double* tmp) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;

    if (i < m && j < n)
        C[i * n + j] = beta * C[i * n + j] +
                       alpha * B[i * n + j] * A[i * n + i] +
                       alpha * tmp[i * n + j];
}

__global__ void kernel_sum(int m, int n, double alpha, double beta, double* C,
                           double* A, double* B, double* tmp) {
    int k = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;

    if (k < m - 1 && j < n) {
        for (int i = k + 1; i < m; i++)
            C[k * n + j] += alpha * B[i * n + j] * A[i * n + k];
    }
}
```

I also tried the various [available cgeist options](https://github.com/user-attachments/files/24045190/cgeist_options.md), but none of them preserved the for-loop structure needed for further analysis.

Questions:

1. Is there a flag or pipeline that prevents CUDA kernels from being lowered directly into `scf` and keeps them in Polygeist’s GPU IR (`polygeist.gpu_wrapper`)?
2. If this path is no longer supported, are there plans to reintroduce the GPU-wrapper lowering?
3. Or should I implement a custom pass that intercepts the lowering before the kernels become `scf`?

Any guidance would be very helpful. Thanks!
