
Commit 404980e

Merge pull request #290 from microsoft/gpu-readme-dev
Update readme for gpu kernels
2 parents 088e607 + c1e9a9a

File tree: 1 file changed, +19 -5 lines changed

gpu/README.md

Lines changed: 19 additions & 5 deletions

```diff
@@ -73,7 +73,9 @@ It significantly improves GEMV throughput when processing quantized weights and
 
 ## Performance
 
-Kernel performance (tested on NVIDIA A100 40GB GPU):
+### Kernel Benchmarks
+
+Tested on NVIDIA A100 40GB GPU, our custom W2A8 kernel shows significant speedups over standard BF16 implementations:
 
 | Shape (N×K) | W2A8 Latency (us) | BF16 Latency (us) | Speedup Ratio |
 |---------------------|-------------------|-------------------|----------------------|
@@ -86,8 +88,20 @@ Kernel performance (tested on NVIDIA A100 40GB GPU):
 | 3200 × 10240 | 19.64 | 60.79 | 3.10 |
 | 20480 × 3200 | 30.99 | 112.39 | 3.63 |
 
-Generation throughput:
+### End-to-End Generation Latency
+
+Compared to a similarly-sized BF16 model (Gemma-2-2B using vLLM), BitNet-b1.58-2B with our kernel achieves consistent speedups across workloads:
+
+| Input Length | Output Length | BF16 Latency (ms) | W2A8 Latency (ms) | Speedup Ratio |
+| --- | --- | --- | --- | --- |
+| 64 | 16 | 187.64 | 57.40 | 3.27 |
+| 64 | 32 | 353.50 | 112.22 | 3.15 |
+| 64 | 64 | 683.23 | 221.08 | 3.09 |
+| 256 | 16 | 183.14 | 61.24 | 2.99 |
+| 256 | 32 | 353.14 | 115.47 | 3.06 |
+| 256 | 64 | 684.24 | 224.16 | 3.05 |
+| 512 | 16 | 208.99 | 68.06 | 3.07 |
+| 512 | 32 | 354.33 | 122.72 | 2.89 |
+| 512 | 64 | 709.65 | 231.82 | 3.06 |
 
-| BF16 (tokens/s) | W2A8 (tokens/s) | Speedup Ratio |
-|---|---|---|
-| 10.9 | 213.3 | 19.6 |
+*Note: Comparison uses equivalent-sized models (2B parameters) on NVIDIA A100 40GB GPU.*
```
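For context on how per-kernel latencies like those in the new table are typically gathered, here is a minimal CUDA-event timing harness in Python/PyTorch. It is a sketch, not the benchmark behind the table; `w2a8_gemv` is a hypothetical binding standing in for whatever entry point the custom kernel actually exposes.

```python
import torch

def bench_gemv(fn, n, k, iters=100, warmup=10):
    """Mean latency of a GEMV-like callable in microseconds, timed with CUDA events."""
    x = torch.randn(k, device="cuda", dtype=torch.bfloat16)
    w = torch.randn(n, k, device="cuda", dtype=torch.bfloat16)
    for _ in range(warmup):  # warm-up: allocator, caches, GPU clocks
        fn(w, x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(w, x)
    end.record()
    torch.cuda.synchronize()  # wait until every queued kernel has finished
    return start.elapsed_time(end) * 1e3 / iters  # elapsed_time is in ms

# BF16 baseline for the 3200 × 10240 row: a plain matrix-vector product.
print(f"BF16 GEMV: {bench_gemv(torch.mv, 3200, 10240):.2f} us")
# The W2A8 side would call the custom kernel instead (hypothetical binding):
# print(f"W2A8 GEMV: {bench_gemv(w2a8_gemv, 3200, 10240):.2f} us")
```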

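Similarly, the BF16 side of the end-to-end latency table could be reproduced with a small vLLM script along these lines. The Hugging Face model id and the fixed-length prompt construction are assumptions, and the W2A8 path would require the custom kernel's integration, which this sketch does not cover.

```python
import time
from vllm import LLM, SamplingParams

# BF16 baseline only; model id assumed to be the standard Gemma-2-2B checkpoint.
llm = LLM(model="google/gemma-2-2b", dtype="bfloat16")

prompt = " ".join(["hello"] * 64)  # crude ~64-token input, matching the first rows
params = SamplingParams(max_tokens=16, ignore_eos=True)  # force exactly 16 output tokens

llm.generate([prompt], params)  # warm-up run, excluded from timing
t0 = time.perf_counter()
llm.generate([prompt], params)
print(f"end-to-end latency: {(time.perf_counter() - t0) * 1e3:.2f} ms")
```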