
Benchmark Comparison Report

Signer Implementation Comparison

This report compares signer implementations for secp256k1 operations (BtcecSigner has since been removed, so results cover the other two):

  1. P256K1Signer - This repository's new pure Go port of Bitcoin Core's secp256k1 library
  2. ~~BtcecSigner - Pure Go wrapper around btcec/v2~~ (removed)
  3. LibSecp256k1 - Native C library via purego (no CGO required)

- Generated: 2025-11-29 (updated after GLV endomorphism optimization)
- Platform: linux/amd64
- CPU: AMD Ryzen 5 PRO 4650G with Radeon Graphics
- Go Version: go1.25.3

Key Optimizations:

- GLV scalar splitting reduces one 256-bit scalar multiplication to two ~128-bit ones
- Strauss algorithm with wNAF (windowed Non-Adjacent Form) representation
- Precomputed tables for generator G and λ·G (32 entries each)
- EcmultGenGLV: 2.7x faster than the reference (122 → 45 µs)
- Scalar multiplication (arbitrary point): 17% faster with GLV + Strauss (122 → 101 µs)

- Precomputed TaggedHash prefixes for common BIP-340 tags
- Eliminated unnecessary copies in field element operations
- Pre-allocated group elements to reduce allocations
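BIP-340 defines a tagged hash as hash_tag(x) = SHA256(SHA256(tag) || SHA256(tag) || x). The doubled tag digest is exactly one 64-byte SHA-256 block, so the hash state after absorbing it depends only on the tag. That state can be computed once and reused. The sketch below shows one way to precompute it, by caching the marshaled SHA-256 midstate. Whether this repository caches the midstate or only the 64-byte prefix is an assumption here. The names `taggedHasher`, `newTaggedHasher` and `naiveTaggedHash` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding"
	"fmt"
)

// taggedHasher caches the SHA-256 state after absorbing SHA256(tag) || SHA256(tag).
// Hypothetical helper for illustration; not this repository's API.
type taggedHasher struct {
	state []byte // marshaled SHA-256 midstate after the 64-byte tag prefix
}

func newTaggedHasher(tag string) *taggedHasher {
	tagDigest := sha256.Sum256([]byte(tag))
	h := sha256.New()
	h.Write(tagDigest[:])
	h.Write(tagDigest[:])
	state, err := h.(encoding.BinaryMarshaler).MarshalBinary()
	if err != nil {
		panic(err)
	}
	return &taggedHasher{state: state}
}

// Sum returns hash_tag(msgs[0] || msgs[1] || ...) without re-hashing the tag prefix.
func (t *taggedHasher) Sum(msgs ...[]byte) [32]byte {
	h := sha256.New()
	if err := h.(encoding.BinaryUnmarshaler).UnmarshalBinary(t.state); err != nil {
		panic(err)
	}
	for _, m := range msgs {
		h.Write(m)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// naiveTaggedHash follows the BIP-340 definition directly, for comparison.
func naiveTaggedHash(tag string, msg []byte) [32]byte {
	tagDigest := sha256.Sum256([]byte(tag))
	buf := append(append(tagDigest[:], tagDigest[:]...), msg...)
	return sha256.Sum256(buf)
}

func main() {
	challenge := newTaggedHasher("BIP0340/challenge")
	msg := []byte("example message")
	fast := challenge.Sum(msg)
	slow := naiveTaggedHash("BIP0340/challenge", msg)
	fmt.Printf("%x\nmatch: %v\n", fast[:], fast == slow)
}
```

Caching only the 64-byte prefix avoids recomputing the tag digest on every call. Caching the midstate additionally skips compressing that first block.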

Summary Results

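| Operation | P256K1Signer (Pure Go) | LibSecp256k1 (C) | Winner |
|---|---|---|---|
| Pubkey Derivation | 56 µs | 22 µs | LibSecp256k1 (2.5x faster) |
| Sign | 58 µs | 41 µs | LibSecp256k1 (1.4x faster) |
| Verify | 182 µs | 47 µs | LibSecp256k1 (3.9x faster) |
| ECDH | 119 µs | N/A | P256K1Signer (no LibSecp256k1 result) |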

Internal Scalar Multiplication Benchmarks

| Operation | Time per op | Description |
|---|---|---|
| EcmultGenGLV | 45 µs | GLV-optimized generator multiplication |
| EcmultGenSimple | 68 µs | Precomputed table (no GLV) |
| EcmultGenConstRef | 122 µs | Reference implementation |
| EcmultStraussWNAFGLV | 101 µs | GLV + Strauss for arbitrary point |
| EcmultConst | 122 µs | Constant-time binary method |

GLV Endomorphism Optimization Details

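The GLV (Gallant-Lambert-Vanstone) method exploits a special structure of secp256k1: the curve has an efficiently computable endomorphism φ(x, y) = (β·x, y) that satisfies φ(P) = λ·P for every point P. Here β is a nontrivial cube root of unity modulo the field prime p, and λ is a nontrivial cube root of unity modulo the group order n. Multiplying a point by λ therefore costs a single field multiplication.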

Implementation Components

  1. Scalar Splitting: Decompose 256-bit scalar k into two ~128-bit scalars k1, k2 such that k ≡ k1 + k2·λ (mod n)
  2. wNAF Representation: Convert scalars to windowed Non-Adjacent Form (window size 6)
  3. Precomputed Tables: 32 entries each for G and λ·G (odd multiples)
  4. Strauss Algorithm: Process both scalars simultaneously with interleaved doubling/adding

Performance Gains

| Metric | Before GLV | After GLV | Improvement |
|---|---|---|---|
| Generator mult (EcmultGen) | 122 µs | 45 µs | 2.7x faster |
| Arbitrary point mult | 122 µs | 101 µs | 17% faster |
| Scalar split overhead | N/A | 0.2 µs | Negligible |

Detailed Results

Public Key Derivation

Deriving the public key from a private key (32-byte secret key → 32-byte x-only public key).

| Implementation | Time per op | Notes |
|---|---|---|
| P256K1Signer | 56 µs | Pure Go with GLV optimization |
| LibSecp256k1 | 22 µs | Native C library via purego |

Signing (Schnorr)

Creating BIP-340 Schnorr signatures (32-byte message → 64-byte signature).

| Implementation | Time per op | Notes |
|---|---|---|
| P256K1Signer | 58 µs | Pure Go with GLV |
| LibSecp256k1 | 41 µs | Native C library |

Verification (Schnorr)

Verifying BIP-340 Schnorr signatures (32-byte message + 64-byte signature).

| Implementation | Time per op | Notes |
|---|---|---|
| P256K1Signer | 182 µs | Pure Go with GLV |
| LibSecp256k1 | 47 µs | Native C library (3.9x faster) |

ECDH (Shared Secret Generation)

Generating a shared secret using Elliptic Curve Diffie-Hellman (ECDH).

| Implementation | Time per op | Notes |
|---|---|---|
| P256K1Signer | 119 µs | Pure Go with GLV |

Performance Analysis

Pure Go vs Native C

The native libsecp256k1 library maintains significant advantages due to:

However, the pure Go implementation with GLV is now competitive for many use cases.

GLV Optimization Impact

The GLV endomorphism provides the most benefit for generator multiplication (used in signing):

Recommendations

Use LibSecp256k1 when:

Use P256K1Signer when:

Conclusion

The GLV endomorphism optimization significantly improves secp256k1 performance in pure Go:

  1. Generator multiplication: 2.7x faster (122 → 45 µs)
  2. Arbitrary point multiplication: 17% faster (122 → 101 µs)
  3. Scalar splitting: negligible overhead (0.2 µs)

While the native C library remains faster (especially for verification), the pure Go implementation is now much more competitive for signing operations where generator multiplication dominates.

Running the Benchmarks

To reproduce these benchmarks:

```bash
# Run all benchmarks
go test ./... -bench=. -benchmem -benchtime=2s

# Run specific scalar multiplication benchmarks
go test -bench='BenchmarkEcmultGen|BenchmarkEcmultStraussWNAFGLV' -benchtime=2s

# Run comparison benchmarks
go test ./bench -bench=. -benchtime=2s
```