Hi, I’m trying to use Polygeist to extract polyhedral structure from CUDA kernels. According to the Polygeist-GPU paper (CGO’24), CUDA code can be lifted into GPU-level Polygeist IR such as:
However, when I compile CUDA kernels with cgeist, I only get MLIR scf IR (scf.if, scf.execute_region, scf.for, etc.) and the GPU parallel structure is not preserved.
Command:
cgeist --immediate \
-cuda-path=/usr/local/cuda \
--cuda-gpu-arch=sm_89 \
-I/usr/local/cuda/include \
-I/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18/include \
--resource-dir=/data/zyx/local/Polygeist/llvm-project/build/lib/clang/18 \
-S \
test.cu > test.mlir
Example kernels and the MLIR output:
__global__ void kernel_C(int m, int n, double alpha, double beta, double* C,
                         double* A, double* B, double* tmp) {
  int i = blockDim.x * blockIdx.x + threadIdx.x;
  int j = blockDim.y * blockIdx.y + threadIdx.y;
  if (i < m && j < n)
    C[i * n + j] = beta * C[i * n + j] +
                   alpha * B[i * n + j] * A[i * n + i] +
                   alpha * tmp[i * n + j];
}

__global__ void kernel_sum(int m, int n, double alpha, double beta, double* C,
                           double* A, double* B, double* tmp) {
  int k = blockDim.x * blockIdx.x + threadIdx.x;
  int j = blockDim.y * blockIdx.y + threadIdx.y;
  if (k < m - 1 && j < n) {
    for (int i = k + 1; i < m; i++)
      C[k * n + j] += alpha * B[i * n + j] * A[i * n + k];
  }
}

I also tested various available cgeist options, but none retained the for-loop structure for further analysis.
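To make the target structure concrete, here is a rough CPU-side sketch of the loop nest I would hope to recover for kernel_sum: the 2D thread grid becomes two parallel loops and the kernel's own loop stays as the inner sequential loop. The function name kernel_sum_reference is hypothetical and not part of Polygeist; the unused beta and tmp parameters are dropped, and flat row-major std::vector storage is an assumption made only for this illustration.

```cpp
#include <vector>

// Hypothetical CPU reference for kernel_sum (not Polygeist output).
// Each (k, j) pair corresponds to one CUDA thread; every thread writes only
// its own C[k * n + j], so the two outer loops are independent of each other.
void kernel_sum_reference(int m, int n, double alpha,
                          std::vector<double>& C,
                          const std::vector<double>& A,
                          const std::vector<double>& B) {
  for (int k = 0; k < m - 1; ++k) {        // x dimension of the grid, guard k < m - 1
    for (int j = 0; j < n; ++j) {          // y dimension of the grid, guard j < n
      for (int i = k + 1; i < m; ++i) {    // sequential loop from the kernel body
        C[k * n + j] += alpha * B[i * n + j] * A[i * n + k];
      }
    }
  }
}
```

The inner bound i = k + 1 depends on the outer index k, so the iteration domain is triangular, which is the kind of affine structure I would like to keep visible in the IR.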
Questions:
- Is there a flag or pipeline that prevents lowering CUDA kernels directly into scf and keeps them in Polygeist’s GPU IR?
- If this path is not supported anymore, is the GPU-wrapper lowering planned for reintroduction?
- Or should I implement a custom pass that intercepts the lowering before it becomes scf?
Any guidance would be very helpful. Thanks!