
Commit b6d9801

Simple GeluAndMul kernels example

Add a Hugging Face kernel hub example using the kernels-community/activation GeluAndMul kernel.

Signed-off-by: Steven Royer <sroyer@redhat.com>

1 parent 66cae79

3 files changed: 147 additions & 0 deletions


Containerfile

Lines changed: 22 additions & 0 deletions
FROM nvcr.io/nvidia/cuda:13.0.3-cudnn-devel-ubi9

RUN dnf update -y && \
    dnf install -y python3.12 python3.12-pip python3.12-devel vim

RUN pip3.12 install --upgrade pip
RUN pip3.12 install uv

WORKDIR /src

# Create virtual env
RUN uv venv venv --python 3.12
ENV PATH="/src/venv/bin:/src/venv/lib64/python3.12/site-packages/nvidia/cu13/bin:$PATH"
ENV VIRTUAL_ENV=/src/venv
ENV UV_LINK_MODE=copy

RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install torch==2.11 torchvision kernels

COPY gelu-and-mul-test.py /src/

README.md

Lines changed: 78 additions & 0 deletions
# Hugging Face kernel hub example

This example uses the Hugging Face kernel hub to download a pre-compiled
kernel. Here, the gelu_and_mul kernel from
[activation](https://huggingface.co/kernels/kernels-community/activation)
is demonstrated. At the time of writing, this kernel set has builds for
Nvidia CUDA and Apple Metal; check the kernel card for the exact set of
supported hardware.
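
For a quick feel of the hub interface before building anything, a minimal
sketch (not part of this commit) that fetches the same kernel directly via
`kernels.get_kernel` might look like the following. The `gelu_and_mul(out, x)`
call and the halved last dimension of `out` are assumptions based on the
vLLM-style activation op that the test script below reimplements:

```python
# Minimal sketch, assuming a CUDA device and the kernels package installed.
import torch
from kernels import get_kernel

# Downloads the pre-compiled binaries for this platform on first use.
activation = get_kernel("kernels-community/activation")

x = torch.randn(32, 512, device="cuda", dtype=torch.bfloat16)
# gelu_and_mul writes gelu(x[..., :d]) * x[..., d:] into out, halving the last dim.
out = torch.empty(32, 256, device="cuda", dtype=torch.bfloat16)
activation.gelu_and_mul(out, x)
```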

## Build

There is a Containerfile that encapsulates the environment needed to run a
torch application that uses the kernels interface on Nvidia GPUs.

```bash
podman build . -t gelu:latest
```

If you prefer not to use containers (for example, to try it on Apple Metal),
you can read the Containerfile and recreate what it does in a local Python
virtual environment, as sketched below.
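
A rough local equivalent of the Containerfile's steps (assuming `python3.12`
and `uv` are already installed; the `torch==2.11` pin mirrors the
Containerfile) would be:

```bash
# Rough local equivalent of the Containerfile steps.
uv venv venv --python 3.12
source venv/bin/activate
uv pip install torch==2.11 torchvision kernels
python gelu-and-mul-test.py
```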

## Run

Set the HF_TOKEN environment variable to your Hugging Face token. Then:

```bash
podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable -e HF_TOKEN=${HF_TOKEN} gelu:latest python3.12 gelu-and-mul-test.py
```

The output should look something like this if you have a supported Nvidia GPU
and a driver version >= 580:

```bash
==========
== CUDA ==
==========

CUDA Version 13.0.3

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Fetching 6 files: 100%|███████████████████████████████████████████████████████████████| 6/6 [00:01<00:00, 3.99it/s]
Download complete: : 4.18MB [00:01, 3.30MB/s] Success! | 2/6 [00:01<00:03, 1.17it/s]
Download complete: : 4.18MB [00:01, 2.62MB/s]
```

Note that the kernels package writes download progress information to stderr.
If you want cleaner output, you can redirect stderr to /dev/null; just be
aware that this also hides error details, so debugging failures will be
harder that way.

```bash
$ podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable -e HF_TOKEN=${HF_TOKEN} gelu:latest bash

==========
== CUDA ==
==========

CUDA Version 13.0.3

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[root@fc9f150b0e9e src]# python3.12 gelu-and-mul-test.py 2> /dev/null
Success!
```

gelu-and-mul-test.py

Lines changed: 47 additions & 0 deletions
import torch
import torch.nn as nn
import torch.nn.functional as F

from kernels import use_kernel_forward_from_hub
from kernels import use_kernel_mapping, LayerRepository
from kernels import Mode, kernelize

# Define the hub kernel to use for this test on cuda devices
kernel_layer_mapping = {
    "GeluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="GeluAndMul",
            version=1,
        )
    }
}

# Implement the torch fallback method and request to use a hub kernel if available
@use_kernel_forward_from_hub("GeluAndMul")
class GeluAndMul(nn.Module):
    """Implementation from https://github.com/vllm-project/vllm
    vllm/model_executor/layers/activation.py
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return F.gelu(x[..., :d], approximate="none") * x[..., d:]

# Run the pure torch method first so we can compare the hub kernel
x = torch.randn(32, 512, device="cuda", dtype=torch.bfloat16)
model = GeluAndMul()
torch_out = model(x)
hub_out = None

# Run the hub optimized kernel now
with use_kernel_mapping(kernel_layer_mapping):
    # Tell kernels that we want to do inference and enable torch.compile
    model = kernelize(model, device="cuda", mode=Mode.INFERENCE | Mode.TORCH_COMPILE)

    hub_out = model(x)

# Make sure the hub optimized kernel gives the same output
if torch.allclose(hub_out, torch_out, atol=1e-3, rtol=1e-3):
    print("Success!")
else:
    print("Failed...")
