BENCHMARK_REPORT_OLD.md

Benchmark Comparison Report

Signer Implementation Comparison

This report compares three signer implementations for secp256k1 operations (BtcecSigner has since been removed from the repository, but its results are kept for reference):

  1. P256K1Signer - This repository's pure Go port of Bitcoin Core's libsecp256k1
  2. ~~BtcecSigner - Pure Go wrapper around btcec/v2~~ (removed)
  3. NextP256K Signer - CGO version using next.orly.dev/pkg/crypto/p256k (CGO bindings to libsecp256k1)

- Generated: 2025-11-02 (updated after comprehensive CPU optimizations)
- Platform: linux/amd64
- CPU: AMD Ryzen 5 PRO 4650G with Radeon Graphics
- Go version: go1.25.3

Key Optimizations:

- Precomputed TaggedHash prefixes for common BIP-340 tags: 28% faster (310 → 230 ns/op)
- Eliminated unnecessary copies in field element operations (mul/sqr): faster when magnitude ≤ 8
- Optimized group element operations (toBytes/toStorage): in-place normalization to avoid copies
- Optimized EcmultGen: pre-allocated group elements to reduce allocations
- Sign optimizations: 54% faster (63,421 → 29,237 ns/op), 47% fewer allocations (17 → 9 allocs/op)
- Verify optimizations: 8% faster (149,511 → 138,127 ns/op), 78% fewer allocations (9 → 2 allocs/op)
- Pubkey derivation: 6% faster (58,383 → 55,091 ns/op), eliminated intermediate copies
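The TaggedHash speedup follows from how BIP-340 defines tagged hashes: hash_tag(x) = SHA256(SHA256(tag) || SHA256(tag) || x). The 64-byte prefix depends only on the tag, so it can be computed once per tag instead of on every call. Below is a minimal Go sketch of the idea using only crypto/sha256. The names `tagPrefix`, `newTagPrefix` and `taggedHashNaive` are hypothetical and are not this repository's API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// taggedHashNaive computes SHA256(SHA256(tag) || SHA256(tag) || msg),
// re-hashing the tag on every call.
func taggedHashNaive(tag string, msg []byte) [32]byte {
	t := sha256.Sum256([]byte(tag))
	buf := make([]byte, 0, 64+len(msg))
	buf = append(buf, t[:]...)
	buf = append(buf, t[:]...)
	buf = append(buf, msg...)
	return sha256.Sum256(buf)
}

// tagPrefix caches SHA256(tag) || SHA256(tag) for a single tag.
type tagPrefix [64]byte

func newTagPrefix(tag string) tagPrefix {
	var tp tagPrefix
	t := sha256.Sum256([]byte(tag))
	copy(tp[:32], t[:])
	copy(tp[32:], t[:])
	return tp
}

// Sum hashes msg under the cached prefix; only message-dependent work remains.
func (tp *tagPrefix) Sum(msg []byte) [32]byte {
	h := sha256.New()
	h.Write(tp[:])
	h.Write(msg)
	var out [32]byte
	h.Sum(out[:0]) // appends into out's backing array (capacity 32)
	return out
}

// Prefixes for the three BIP-340 tags, computed once at package initialization.
var (
	auxPrefix       = newTagPrefix("BIP0340/aux")
	noncePrefix     = newTagPrefix("BIP0340/nonce")
	challengePrefix = newTagPrefix("BIP0340/challenge")
)

func main() {
	msg := []byte("example message")
	fast := challengePrefix.Sum(msg)
	slow := taggedHashNaive("BIP0340/challenge", msg)
	fmt.Printf("%x\nmatches naive version: %v\n", fast, fast == slow)
}
```

The 64-byte prefix is exactly one SHA-256 block, so an implementation could go one step further and cache the hash state after that block (Go's SHA-256 hasher implements encoding.BinaryMarshaler, which allows saving and restoring that state). The report does not say which variant the repository uses.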

Summary Results

| Operation | P256K1Signer | BtcecSigner | NextP256K | Winner |
|---|---|---|---|---|
| Pubkey Derivation | 55,091 ns/op | 64,177 ns/op | 271,394 ns/op | P256K1 (14% faster than Btcec) |
| Sign | 29,237 ns/op | 225,514 ns/op | 53,015 ns/op | P256K1 (1.8x faster than NextP256K) |
| Verify | 138,127 ns/op | 177,622 ns/op | 44,776 ns/op | NextP256K (3.1x faster) |
| ECDH | 103,345 ns/op | 129,392 ns/op | 125,835 ns/op | P256K1 (1.2x faster than NextP256K) |
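The Winner column mixes two conventions. Percentages such as "14% faster" match the reduction in time per operation, (before - after) / before, while multipliers such as "1.8x faster" are the ratio of the slower time to the faster one. A small Go sketch that recomputes a few entries from the table (the helper names are illustrative):

```go
package main

import "fmt"

// reduction returns the percentage of time saved going from before to after
// (both in ns/op). The report's "X% faster" figures follow this convention.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

// speedup returns how many times faster an operation taking fast ns/op is
// than one taking slow ns/op. The report's "Nx faster" figures follow this.
func speedup(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	// ns/op values from the summary table.
	fmt.Printf("Sign:   P256K1 vs NextP256K  %.2fx\n", speedup(53015, 29237))
	fmt.Printf("Verify: NextP256K vs P256K1  %.2fx\n", speedup(138127, 44776))
	fmt.Printf("ECDH:   P256K1 vs NextP256K  %.2fx\n", speedup(125835, 103345))
	fmt.Printf("Pubkey: P256K1 vs Btcec      %.1f%% less time\n", reduction(64177, 55091))
	// Before/after signing times quoted under Key Optimizations.
	fmt.Printf("Sign:   before vs after      %.1f%% less time\n", reduction(63421, 29237))
}
```

The two conventions are not interchangeable: the 54% reduction in signing time corresponds to a speedup of about 2.2x (63,421 / 29,237 ≈ 2.17).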

Detailed Results

Public Key Derivation

Deriving a public key from a private key (32-byte secret key → 32-byte x-only public key).

| Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 |
|---|---|---|---|---|
| P256K1Signer | 55,091 ns/op | 256 B/op | 4 allocs/op | 1.0x (baseline) |
| BtcecSigner | 64,177 ns/op | 368 B/op | 7 allocs/op | 0.9x (slower) |
| NextP256K | 271,394 ns/op | 983,394 B/op | 9 allocs/op | 0.2x (slower) |

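Analysis: P256K1Signer is the fastest and allocates the least (256 B/op, 4 allocs/op). NextP256K is about 4.9x slower here and allocates about 983 KB per operation.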

Signing (Schnorr)

Creating BIP-340 Schnorr signatures (32-byte message → 64-byte signature).
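For reference, BIP-340 defines signing as follows, for a secret key d' in [1, n-1], a 32-byte message m and 32 bytes of auxiliary randomness a. Here hash_tag(x) = SHA256(SHA256(tag) || SHA256(tag) || x), bytes(·) is the 32-byte big-endian encoding (of the x-coordinate, for points), and n is the group order; the specification also fails if k' = 0. This only restates the specification; the report does not describe how each implementation organizes these steps internally.

```math
\begin{aligned}
P &= d'G, \qquad d = \begin{cases} d' & \text{if } y(P) \text{ is even} \\ n - d' & \text{otherwise} \end{cases} \\
t &= \mathrm{bytes}(d) \oplus \mathrm{hash}_{\text{BIP0340/aux}}(a) \\
k' &= \mathrm{int}\big(\mathrm{hash}_{\text{BIP0340/nonce}}(t \,\|\, \mathrm{bytes}(P) \,\|\, m)\big) \bmod n, \qquad R = k'G \\
k &= \begin{cases} k' & \text{if } y(R) \text{ is even} \\ n - k' & \text{otherwise} \end{cases} \\
e &= \mathrm{int}\big(\mathrm{hash}_{\text{BIP0340/challenge}}(\mathrm{bytes}(R) \,\|\, \mathrm{bytes}(P) \,\|\, m)\big) \bmod n \\
\sigma &= \mathrm{bytes}(R) \,\|\, \mathrm{bytes}\big((k + e\,d) \bmod n\big)
\end{aligned}
```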

| Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 |
|---|---|---|---|---|
| P256K1Signer | 29,237 ns/op | 576 B/op | 9 allocs/op | 1.0x (baseline) |
| BtcecSigner | 225,514 ns/op | 2,193 B/op | 38 allocs/op | 0.1x (slower) |
| NextP256K | 53,015 ns/op | 128 B/op | 3 allocs/op | 0.6x (slower) |

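Analysis: P256K1Signer is about 1.8x faster than NextP256K and about 7.7x faster than BtcecSigner. NextP256K still allocates less (128 B/op and 3 allocs/op, vs 576 B/op and 9 allocs/op).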

Verification (Schnorr)

Verifying BIP-340 Schnorr signatures (32-byte message + 64-byte signature).
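For reference, BIP-340 verification of a 64-byte signature σ on a message m under a 32-byte x-only public key pk reduces to a single check. Here lift_x returns the curve point with the given x-coordinate and even y (or fails), p is the field size, n is the group order, and hash_tag is the BIP-340 tagged hash. The last line shows why an honest signature passes: the signer used s = k + ed mod n with P = dG and R = kG, both having even y. This restates the specification, not any particular implementation.

```math
\begin{aligned}
P &= \mathrm{lift\_x}\big(\mathrm{int}(pk)\big), \qquad r = \mathrm{int}(\sigma[0{:}32]) < p, \qquad s = \mathrm{int}(\sigma[32{:}64]) < n \\
e &= \mathrm{int}\big(\mathrm{hash}_{\text{BIP0340/challenge}}(\sigma[0{:}32] \,\|\, pk \,\|\, m)\big) \bmod n \\
R &= sG - eP \\
\text{valid} &\iff R \neq \mathcal{O} \;\wedge\; y(R) \text{ is even} \;\wedge\; x(R) = r \\
sG - eP &= (k + e\,d)G - e\,(dG) = kG
\end{aligned}
```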

| Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 |
|---|---|---|---|---|
| P256K1Signer | 138,127 ns/op | 64 B/op | 2 allocs/op | 1.0x (baseline) |
| BtcecSigner | 177,622 ns/op | 1,120 B/op | 18 allocs/op | 0.8x (slower) |
| NextP256K | 44,776 ns/op | 96 B/op | 2 allocs/op | 3.1x (faster) |

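Analysis: NextP256K is about 3.1x faster than P256K1Signer. Among the pure Go implementations, P256K1Signer is about 1.3x faster than BtcecSigner, and it uses the least memory of the three (64 B/op).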

ECDH (Shared Secret Generation)

Generating a shared secret using Elliptic Curve Diffie-Hellman (ECDH).

| Implementation | Time per op | Memory | Allocations | Speedup vs P256K1 |
|---|---|---|---|---|
| P256K1Signer | 103,345 ns/op | 241 B/op | 6 allocs/op | 1.0x (baseline) |
| BtcecSigner | 129,392 ns/op | 832 B/op | 13 allocs/op | 0.8x (slower) |
| NextP256K | 125,835 ns/op | 160 B/op | 3 allocs/op | 0.8x (slower) |

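Analysis: P256K1Signer is the fastest, about 1.2x faster than NextP256K and about 1.25x faster than BtcecSigner. NextP256K allocates the least (160 B/op, 3 allocs/op).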

Performance Analysis

Overall Winner: Mixed (P256K1 wins 3/4 operations, NextP256K wins 1/4 operations)

After comprehensive CPU optimizations:

P256K1Signer:

- Pubkey Derivation: fastest (14% faster than Btcec), a 6% improvement
- Signing: fastest (1.8x faster than NextP256K), a 54% improvement
- ECDH: fastest (1.2x faster than NextP256K), a 5% improvement

NextP256K:

- Verification: fastest (3.1x faster than P256K1Signer, thanks to CGO), although P256K1Signer's verification is now 8% faster than before

Best Pure Go: P256K1Signer

For pure Go implementations, P256K1Signer is faster than BtcecSigner in all four operations and allocates less memory in each.

Memory Efficiency

| Implementation | Avg Memory per Operation | Notes |
|---|---|---|
| P256K1Signer | ~284 B | Low memory footprint, significantly reduced after optimizations |
| NextP256K | ~246 KB (~128 B excluding pubkey derivation) | Very efficient, minimal allocations (except pubkey derivation overhead) |
| BtcecSigner | ~1.1 KB | Higher allocations, but acceptable |

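Note: NextP256K shows high memory use in pubkey derivation (983,394 B/op, about 983 KB), attributed to CGO initialization overhead. Because Go's B/op figure is already averaged over all benchmark iterations, a value this large suggests the overhead is paid on every call in this benchmark; it is amortized only where that setup can be reused across operations.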

Memory Improvements:

- Verify: 576 → 64 B/op (89% less) and 9 → 2 allocs/op
- Sign: 17 → 9 allocs/op

Recommendations

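Use NextP256K (CGO) when:

- Verification throughput matters most (about 3.1x faster than P256K1Signer)
- A CGO build that links libsecp256k1 is acceptable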

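Use P256K1Signer when:

- A pure Go build without CGO is required
- Signing, pubkey derivation, or ECDH dominate the workload, where it was the fastest of the three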

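Use BtcecSigner when:

- No longer an option in this repository (it has been removed); in these measurements P256K1Signer was faster in all four operations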

Conclusion

The benchmarks demonstrate that:

  1. After comprehensive CPU optimizations, P256K1Signer achieves:

     - Fastest pubkey derivation among all implementations (55,091 ns/op), a 6% improvement
     - Fastest signing among all implementations (29,237 ns/op), a 54% improvement (63,421 → 29,237 ns/op)
     - Fastest ECDH among all implementations (103,345 ns/op), a 5% improvement (109,068 → 103,345 ns/op)
     - Fastest pure Go verification (138,127 ns/op), an 8% improvement (149,511 → 138,127 ns/op)
     - Faster signing than NextP256K (1.8x)

  2. CPU optimization results (Nov 2025):

     - Precomputed TaggedHash prefixes: 28% faster (310 → 230 ns/op)
     - Window size increased from 5 bits to 6 bits: fewer iterations (~43 vs ~52 windows)
     - Eliminated unnecessary copies in field/group operations
     - Fewer allocations: 78% reduction in verify (9 → 2 allocs/op), 47% reduction in sign (17 → 9 allocs/op)
     - Sign: 54% faster (63,421 → 29,237 ns/op)
     - Verify: 8% faster (149,511 → 138,127 ns/op), 89% less memory (576 → 64 B/op)
     - Pubkey Derivation: 6% faster (58,383 → 55,091 ns/op)
     - ECDH: 5% faster (109,068 → 103,345 ns/op)

  3. CGO implementations (NextP256K) still hold an advantage for verification (3.1x faster), but P256K1Signer is now faster for signing
  4. Pure Go implementations are highly competitive, with P256K1Signer leading in 3 of 4 operations (pubkey derivation, signing, ECDH)
  5. Memory efficiency improved significantly, and P256K1Signer keeps memory usage very low:

     - Verify: 64 B/op (89% reduction)
     - Sign: 576 B/op (50% reduction)
     - Pubkey Derivation: 256 B/op
     - ECDH: 241 B/op

The choice between implementations depends on your specific requirements, mainly which operations dominate your workload and whether a CGO dependency is acceptable.

Running the Benchmarks

To reproduce these benchmarks:

```bash
# Run all benchmarks
CGO_ENABLED=1 go test -tags=cgo ./bench -bench=. -benchmem

# Run specific operation
CGO_ENABLED=1 go test -tags=cgo ./bench -bench=BenchmarkSign -benchmem

# Run specific implementation (quote the regex so the shell does not glob it)
CGO_ENABLED=1 go test -tags=cgo ./bench -bench='Benchmark.*_P256K1' -benchmem
```

Note: All benchmarks require CGO to be enabled (CGO_ENABLED=1) and the cgo build tag.