Add AArch64 SIMD for Blake, SHA, CRC, XXH3, Argon2, and Adler32 by Xor-el · Pull Request #88 · Xor-el/HashLib4Pascal

Xor-el · 2026-06-24T23:49:56Z

Summary

Adds runtime-dispatched AArch64 SIMD implementations across HashLib4Pascal, bringing ARM64 to parity with the existing x86 SIMD tier ladder. Kernels use inline assembly with .long-encoded vector/crypto instructions for broad FPC assembler compatibility, and are selected at startup via the existing *Dispatch.pas + HlpArmSimdFeatures infrastructure.

This PR also refactors the CRC fold core (unified runtime context, clearer include naming).
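To illustrate the `.long` encoding approach mentioned above, here is a minimal sketch of how one crypto instruction could be turned into a raw 32-bit word. It is written in Python purely as a generator-style illustration; it is not taken from the PR. The helper names `encode_sha256h` and `as_long_directive` are hypothetical, and the comment style in the emitted directive is an assumption. The base opcode `0x5E004000` and the field positions follow the A64 encoding of `SHA256H Qd, Qn, Vm.4S`.

```python
def encode_sha256h(rd: int, rn: int, rm: int) -> int:
    """Return the 32-bit A64 encoding of `sha256h q<rd>, q<rn>, v<rm>.4s`.

    Field layout: Rm in bits 20..16, Rn in bits 9..5, Rd in bits 4..0.
    """
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register index must be in 0..31")
    return 0x5E004000 | (rm << 16) | (rn << 5) | rd


def as_long_directive(word: int, comment: str) -> str:
    """Format an instruction word as a `.long` line (hypothetical layout)."""
    return f".long 0x{word:08X}  // {comment}"


if __name__ == "__main__":
    word = encode_sha256h(rd=0, rn=1, rm=16)
    print(as_long_directive(word, "sha256h q0, q1, v16.4s"))
```

Emitting the raw word sidesteps the question of whether a given assembler version recognizes the mnemonic, which appears to be the motivation for the "broad FPC assembler compatibility" wording.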

What's new on AArch64

Crypto Extensions (FEAT_SHA*)

| Algorithm | Dispatch probe | Kernel |
| --- | --- | --- |
| SHA-1 | `HasSHA1()` | `SHA1CompressCryptoExt_aarch64.inc` |
| SHA-256 | `HasSHA256()` | `SHA256CompressCryptoExt_aarch64.inc` |
| SHA-512 | `HasSHA512()` | `SHA512CompressCryptoExt_aarch64.inc` |
| SHA-3 (Keccak-f[1600]) | `HasSHA3()` | `KeccakF1600CryptoExt_aarch64.inc` + absorb variant |
| CRC fold | `HasPMULL()` | `CRCFoldForwardPmull_aarch64.inc`, `CRCFoldReflectedPmull_aarch64.inc` |

NEON (Advanced SIMD)

| Algorithm | Dispatch | Kernel |
| --- | --- | --- |
| BLAKE2b / BLAKE2s | `SelectSlot([NEON])` | `Blake2BCompressNeon_aarch64.inc`, `Blake2SCompressNeon_aarch64.inc` |
| BLAKE3 | `SelectSlot([NEON])` | `Blake3CompressNeon_aarch64.inc`, `Blake3Hash4Neon_aarch64.inc` |
| Adler-32 | `SelectSlot([NEON])` | `Adler32BlocksNeon_aarch64.inc` |
| XXH3 | `SelectSlot([NEON])` | `XXH3Acc512Neon_aarch64.inc`, `XXH3InitSecretNeon_aarch64.inc`, `XXH3ScrambleNeon_aarch64.inc` |
| Argon2 | `SelectSlot([NEON])` | `Argon2FillBlockNeon_aarch64.inc` |
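As background on why a checksum such as Adler-32 is a natural fit for a block-oriented NEON kernel, the sketch below shows the block decomposition of the two running sums. It is a Python illustration of the general technique, not the contents of `Adler32BlocksNeon_aarch64.inc`; the function name `adler32_blocked` and the block size of 16 are assumptions chosen for the example.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as defined for Adler-32


def adler32_blocked(data: bytes, block: int = 16) -> int:
    """Adler-32 computed one block at a time instead of one byte at a time.

    For a block of n bytes v[0..n-1], the per-byte recurrence
    (a += v[i]; b += a) collapses to:
        b += n * a_old + sum((n - i) * v[i])
        a += sum(v[i])
    Both sums are independent per-lane operations, which is what a SIMD
    kernel can exploit.
    """
    a, b = 1, 0
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        n = len(chunk)
        byte_sum = sum(chunk)
        weighted = sum((n - i) * v for i, v in enumerate(chunk))
        b = (b + n * a + weighted) % MOD_ADLER
        a = (a + byte_sum) % MOD_ADLER
    return (b << 16) | a


if __name__ == "__main__":
    sample = b"Wikipedia"
    # Cross-check against the reference implementation in zlib.
    print(hex(adler32_blocked(sample)), hex(zlib.adler32(sample)))
```

The result can be compared with `zlib.adler32` on arbitrary inputs and block sizes to confirm the decomposition matches the byte-wise definition.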

Scrypt (intentional scalar default)

A verified ScryptSalsaXor_Neon kernel is included, but dispatch keeps the scalar path on AArch64. Benchmarks on Apple Silicon show scalar wins at every tested N because Scrypt's serial Salsa20/8 chain does not benefit from lane parallelism, while AArch64's 31 GPRs let the scalar kernel avoid spills. This matches upstream practice (OpenSSL/libsodium ship x86 SSE2 Scrypt but no NEON variant).

Infrastructure changes

- HlpArmSimdFeatures: probes SHA-512, SHA-3, and PMULL; adds DisableAllExtraFeatures() for uniform HASHLIB_FORCE_* override baselines
- AArch64 asm prologues: new SimdProc1Begin_aarch64.inc … SimdProc6Begin_aarch64.inc under Include/Simd/Common/
- Dispatch documentation: standardized SIMD index blocks in all *Dispatch.pas units; kernel header conventions documented in HashLib.Tests/docs/SimdDispatch.md and SimdAarch64Headers.md
- Package wiring: FPC/Delphi packages updated for renamed/consolidated CRC units

CRC core refactor (#86, #87)

- Introduces unified TCRCFoldRuntimeCtx (fold constants + slicing table rows in one packed record)
- Renames fold include files for consistency (CRCFoldForwardPclmul_x86_64.inc, etc.)
- Renames HlpGF2.pas → HlpCRCFoldConstants.pas
- Consolidates width-specific CRC wrappers into HlpCRCStandard.pas (replaces separate HlpCRC16/32/64.pas units)
- Registers PMULL carry-less multiply fold when HasPMULL() is true (analogous to x86 PCLMUL/VPCLMUL chain)
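For readers unfamiliar with carry-less multiply folding, the following Python sketch shows the identity that a PMULL or PCLMUL fold relies on. It is not the PR's kernel. It uses the forward (non-reflected) polynomial form and ignores CRC init/xor-out handling, and the names `clmul`, `polymod`, and `fold` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b, i.e. multiplication in GF(2)[x]."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(value: int, poly: int) -> int:
    """Remainder of value divided by poly over GF(2)."""
    deg = poly.bit_length() - 1
    while value.bit_length() > deg:
        value ^= poly << (value.bit_length() - 1 - deg)
    return value


POLY = 0x104C11DB7  # CRC-32 generator polynomial including the x^32 term


def fold(high: int, low: int, shift: int) -> int:
    """Fold `high` (which sits `shift` bits above `low`) down onto `low`.

    Uses (high * x^shift) mod P == (high * (x^shift mod P)) mod P, so the
    folded value has the same remainder as the full message but is shorter.
    """
    k = polymod(1 << shift, POLY)  # precomputed fold constant
    return clmul(high, k) ^ low


if __name__ == "__main__":
    high, low = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    message = (high << 64) ^ low
    print(polymod(message, POLY) == polymod(fold(high, low, 64), POLY))
```

In a hardware kernel the constant `k` would be precomputed once per polynomial (presumably the "fold constants" held in TCRCFoldRuntimeCtx) and `clmul` would map to a single carry-less multiply instruction.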

Other changes

- CI: adds benchmark support to the CI workflow
- Benchmark: renames benchmark project/source files to the HashLib.Benchmark* convention (no functional change)

Architecture

flowchart TD
  Init["InitDispatch at unit load"] --> Scalar["Assign scalar fallback"]
  Scalar --> ArmProbe{"AArch64?"}
  ArmProbe -->|CryptoExt| SHA["HasSHA* / HasSHA3 probes"]
  ArmProbe -->|NEON tier| NEON["SelectSlot NEON"]
  ArmProbe -->|PMULL| CRC["HasPMULL for CRC fold"]
  ArmProbe -->|Scrypt skip| ScryptScalar["Keep Scrypt_SalsaXor_Scalar"]
  SHA --> Active["Active proc pointer"]
  NEON --> Active
  CRC --> Active
  ScryptScalar --> Active
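
The flow above can be approximated with a small, language-neutral sketch. It is written in Python rather than Pascal, and every name in it (`init_dispatch`, the feature strings, the kernel labels) is a hypothetical stand-in rather than the library's actual API. It only shows the ordering: assign the scalar fallback first, then upgrade individual slots according to the probed features, and leave Scrypt on scalar.

```python
from typing import Dict, Set


def init_dispatch(features: Set[str]) -> Dict[str, str]:
    """Return the active kernel label per algorithm for a given feature set.

    `features` stands in for the result of the runtime CPU probes.
    """
    # Step 1: every slot starts on the scalar fallback.
    active = {
        "sha256": "scalar",
        "blake2b": "scalar",
        "crc_fold": "scalar",
        "scrypt": "scalar",
    }
    # Step 2: upgrade slots whose required feature was detected.
    if "SHA256" in features:
        active["sha256"] = "cryptoext"
    if "NEON" in features:
        active["blake2b"] = "neon"
    if "PMULL" in features:
        active["crc_fold"] = "pmull"
    # Step 3: Scrypt is intentionally left on scalar, even when NEON exists.
    return active


if __name__ == "__main__":
    print(init_dispatch({"NEON", "SHA256", "PMULL"}))
    print(init_dispatch(set()))
```

Starting from the scalar assignment means an unrecognized or disabled feature set still leaves every slot with a valid implementation.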

- unify TCRCFoldRuntimeCtx32/64 into one TCRCFoldRuntimeCtx (untyped TableRow, same +96 asm layout); collapse the two init builders into one overload
- rename fold fns/globals to ISA-neutral names (Lsb->Reflected, Msb->Forward, CRC_Fold_UsesPclmul->CRC_Fold_UsesCarrylessMul) and the 10 CRCFold*.inc files
- factor scalar reflected slice into CRC_FoldReflected_OneSlice
- rename HlpGF2 -> HlpCRCFoldConstants; merge HlpCRC16/32/64 into HlpCRCStandard (HlpCRC32Fast kept separate)
- fix stale .inc header refs (e.g. TConverters.le2me_32)

public API (THashFactory.TChecksum.TCRC, TCRCStandard, ICRC) unchanged

Xor-el added 30 commits June 21, 2026 02:13

sha1 and sha256 initial commit

8ba1aed

Add AArch64 SHA-1/256 CryptoExt inline asm with q/.long encoding

23110fd

encode ldp and ldr

8a9c681

load K256 via ldr + .quad pool.

69510f6

some wrappings

7b9bff8

another

979b734

another one

9fad651

test

0821918

another attempt

a1e5e8b

test

97391c2

another one

b258efb

another

96c52cb

test

7e9f8a0

another attempt

325df9e

rework aarch64 sha1 and sha256 cryptoext inc files

bb67423

add sha512 aarch64 cryptoext implementation

16a7d43

add sha3 aarch64 cryptoext implementation

cc7355d

add DisableAllExtraFeatures in TArmSimdFeatures

91da3bb

Add and fix AArch64 NEON implementation for Blake2 and Blake3 (#81)

ac3e2f5

Add AArch64 NEON implementation for XXH3 (#82)

c698884

Add AArch64 NEON implementation for Adler32 (#83)

910d7df

Add AArch64 NEON implementation for Scrypt

6cc8f8a

comment force neon

cbb2a3f

temporary switch to scalar for scrypt

054f9e4

re-enable scrypt neon

528f4ee

Add AArch64 NEON implementation for Argon2 (#84)

085dae9

Fix Aarch64 NEON performance and update Scrypt handling (#85)

9a44cf8

update dispatch files comments

0b63d58

unify aarch64 inc files headers

bcab5df

Xor-el added 4 commits June 25, 2026 00:04

Add AArch64 PMULL implementation for CRC (#87)

3a83b55

re-enable all ci matrixes

3a5d94a

update asm mode for Arm

e6e40aa

update inc file

dce8ecd

Xor-el merged commit 4d2cf6d into master Jun 25, 2026
24 checks passed

Xor-el deleted the feature/arm-simd branch June 25, 2026 00:20

Xor-el merged 34 commits into master from feature/arm-simd
