This document compares four secp256k1 implementations: btcec/v2, P256K1 in pure Go, P256K1 with AVX2/BMI2 assembly, and libsecp256k1 (C).

- Generated: 2025-11-29
- Platform: linux/amd64
- CPU: AMD Ryzen 5 PRO 4650G with Radeon Graphics (AVX2/BMI2 supported)
- Go Version: go1.25.3

| Operation | btcec/v2 | P256K1 Pure Go | P256K1 ASM | libsecp256k1 (C) |
|---|---|---|---|---|
| Pubkey Derivation | ~50 µs | 56 µs | 56 µs* | 22 µs |
| Sign | ~60 µs | 58 µs | 58 µs* | 41 µs |
| Verify | ~100 µs | 182 µs | 182 µs* | 47 µs |
| ECDH | ~120 µs | 119 µs | 119 µs* | N/A |
*Note: AVX2/BMI2 assembly optimizations are currently implemented for field operations but require additional integration work to show speedups at the high-level API. The assembly code is available in field_amd64_bmi2.s.
The btcec library is the widely-used pure Go implementation from the btcd project:
| Operation | Time per op |
|---|---|
| Pubkey Derivation | ~50 µs |
| Schnorr Sign | ~60 µs |
| Schnorr Verify | ~100 µs |
| ECDH | ~120 µs |
P256K1 in pure Go mode, with SetAVX2Enabled(false):
| Operation | Time per op |
|---|---|
| Pubkey Derivation | 56 µs |
| Schnorr Sign | 58 µs |
| Schnorr Verify | 182 µs |
| ECDH | 119 µs |
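P256K1 with SetAVX2Enabled(true). The high-level timings currently match the pure Go path because the assembly is not yet integrated into the high-level API: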
| Operation | Time per op | Notes |
|---|---|---|
| Pubkey Derivation | 56 µs | Uses GLV optimization |
| Schnorr Sign | 58 µs | Uses GLV for k*G |
| Schnorr Verify | 182 µs | Signature verification |
| ECDH | 119 µs | Uses GLV for scalar mult |
Field Operation Speedups (Low-level):
The BMI2-based field multiplication is available in field_amd64_bmi2.s and provides faster 256-bit modular arithmetic using the MULX instruction.
libsecp256k1, the Bitcoin Core C library, is the fastest option:
| Operation | Time per op |
|---|---|
| Pubkey Derivation | 22 µs |
| Schnorr Sign | 41 µs |
| Schnorr Verify | 47 µs |
| ECDH | N/A |
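The GLV (Gallant-Lambert-Vanstone) endomorphism exploits secp256k1's special curve structure: the map φ(x, y) = (β·x, y) equals multiplication by a scalar λ, where β and λ are nontrivial cube roots of unity modulo the field prime p and the group order n, respectively.
A scalar k can therefore be split as k = k1 + k2·λ (mod n) with k1 and k2 of roughly 128 bits each. This reduces one 256-bit scalar multiplication to two 128-bit multiplications: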
| Operation | Without GLV | With GLV | Speedup |
|---|---|---|---|
| Generator mult (k*G) | 122 µs | 45 µs | 2.7x |
| Arbitrary point mult | 122 µs | 101 µs | 1.2x |
The field_amd64_bmi2.s file contains optimized assembly for field operations, built on BMI2 instructions such as MULX.
From fastest to slowest for typical cryptographic operations:

1. libsecp256k1 (C)
   - Roughly 1.4x to 3.9x faster than the pure Go implementations, depending on the operation
   - Uses purego (no CGO required)
2. btcec/v2
   - Mature, well-tested codebase
   - About 1.8x faster verification than P256K1 (~100 µs vs 182 µs)
3. P256K1
   - Competitive signing performance
   - 2.7x faster generator multiplication with GLV
   - Ongoing BMI2 assembly integration

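
The relative speeds in this ranking follow directly from the summary table. As a small worked example, the Go sketch below recomputes them; the `timing` type and `speedup` helper are hypothetical names introduced for illustration, and the numbers are copied from the table (ECDH is omitted because no libsecp256k1 figure is reported).

```go
package main

import "fmt"

// timing holds per-operation latencies in microseconds,
// copied from the summary table in this document.
type timing struct {
	name                   string
	btcec, p256k1, libsecp float64
}

var timings = []timing{
	{"Pubkey Derivation", 50, 56, 22},
	{"Sign", 60, 58, 41},
	{"Verify", 100, 182, 47},
}

// speedup returns how many times faster the `fast` latency is than `slow`.
func speedup(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	for _, t := range timings {
		fmt.Printf("%-18s libsecp256k1 vs btcec/v2: %.1fx, vs P256K1: %.1fx\n",
			t.name, speedup(t.btcec, t.libsecp), speedup(t.p256k1, t.libsecp))
	}
}
```

The smallest ratio comes from signing and the largest from verification, where P256K1 currently has the most room to improve.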
Use libsecp256k1 when:
Use btcec/v2 when:
Use P256K1 when:
```bash
# Run all SIMD comparison benchmarks
go test ./bench -bench='BenchmarkBtcec|BenchmarkP256K1PureGo|BenchmarkP256K1ASM|BenchmarkLibSecp256k1' -benchtime=1s -run='^$'

# Run specific benchmark category
go test ./bench -bench=BenchmarkBtcec -benchtime=1s -run='^$'
go test ./bench -bench=BenchmarkP256K1PureGo -benchtime=1s -run='^$'
go test ./bench -bench=BenchmarkP256K1ASM -benchtime=1s -run='^$'
go test ./bench -bench=BenchmarkLibSecp256k1 -benchtime=1s -run='^$'

# Run internal scalar multiplication benchmarks
go test -bench='BenchmarkEcmultGen|BenchmarkEcmultStraussWNAFGLV' -benchtime=1s
```

The P256K1 implementation automatically detects CPU features:
```go
package main

import "p256k1.mleku.dev"

func main() {
	// Check if AVX2/BMI2 is available
	if p256k1.HasAVX2CPU() {
		// Use optimized path
	}

	// Manually control AVX2 usage
	p256k1.SetAVX2Enabled(false) // Force pure Go
	p256k1.SetAVX2Enabled(true)  // Enable AVX2/BMI2 (if available)
}
```