I am trying to understand how grouped GEMM is represented in cuBLASLt.
My confusion is the following:
For grouped GEMM, each operation in the group should have its own matrix pointers A[i], B[i], and C[i].
But from the cublasLtMatmul function signature, A/B/C seem to be single pointers rather than pointer-to-pointer arguments.
Because of this, I am unsure which of the following is correct:
- The API expects pointer arrays for A/B/C. If yes, are these arrays located in host memory or device memory?
- The API expects a single pointer, and grouped GEMM is expressed through offsets or strides.
- There is another grouped GEMM-specific mechanism in cuBLASLt that I may be missing.
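To make the first two options concrete, here is a minimal plain C++ sketch of the two layouts I have in mind. It does not call cuBLASLt at all. The type names `PointerArrayGroup` and `StridedGroup` and the helper `a_ptr` are hypothetical names I made up for illustration, and only A is shown (B and C would be analogous).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration types; these are NOT part of cuBLASLt.

// Option 1: pointer-array layout, one pointer per problem in the group.
struct PointerArrayGroup {
    std::vector<const float*> A;  // A[i] points to the i-th A matrix
};

// Option 2: a single base pointer plus a fixed element stride
// between consecutive matrices.
struct StridedGroup {
    const float* A;       // base pointer to the first A matrix
    std::size_t strideA;  // distance, in elements, between A[i] and A[i+1]
};

// Address of the i-th A matrix under the pointer-array layout.
inline const float* a_ptr(const PointerArrayGroup& g, std::size_t i) {
    return g.A[i];
}

// Address of the i-th A matrix under the strided layout.
inline const float* a_ptr(const StridedGroup& g, std::size_t i) {
    return g.A + i * g.strideA;
}
```

With a single fixed stride, every matrix has to sit at the same spacing from the previous one, whereas the pointer-array layout lets each A[i] live anywhere. That difference is part of why I am unsure which form the API expects.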