This document summarizes the memory-optimization work performed on p256k1, a pure Go secp256k1 implementation.
| Metric | Before | After | Improvement |
|---|---|---|---|
| ECDSA Verify allocations | 8 allocs, 5,632 B | 0 allocs, 0 B | 100% reduction |
| ECDSA Sign allocations | 39 allocs, 2,386 B | 11 allocs, 546 B | 72% fewer allocs, 77% less memory |
| Schnorr Verify allocations | 11 allocs, 5,730 B | 3 allocs, 98 B | 73% fewer allocs, 98% less memory |
| Schnorr Sign allocations | ~40 allocs | 6 allocs, 320 B | 85% fewer allocs |
| BatchNormalize memory | 80 MB cumulative | 0 B | 100% reduction |
| Operation | Before | After | Change |
|---|---|---|---|
| ECDSA Verify | | | |
| - Time | ~180 μs | 193 μs | - |
| - Allocations | 8 | 0 | -100% |
| - Bytes/op | 5,632 B | 0 B | -100% |
| ECDSA Sign | | | |
| - Time | ~77 μs | 77 μs | - |
| - Allocations | 39 | 11 | -72% |
| - Bytes/op | 2,386 B | 546 B | -77% |
| ECDSA PubkeyDerivation | | | |
| - Time | ~58 μs | 58 μs | - |
| - Allocations | 0 | 0 | - |
| - Bytes/op | 0 B | 0 B | - |
| Operation | Before | After | Change |
|---|---|---|---|
| Schnorr Verify | | | |
| - Time | ~185 μs | 181 μs | -2% |
| - Allocations | 11 | 3 | -73% |
| - Bytes/op | 5,730 B | 98 B | -98% |
| Schnorr Sign | | | |
| - Time | ~138 μs | 110 μs | -20% |
| - Allocations | ~40 | 6 | -85% |
| - Bytes/op | ~2,500 B | 320 B | -87% |
| Schnorr PubkeyDerivation | | | |
| - Time | ~61 μs | 58 μs | -5% |
| - Allocations | 2 | 2 | - |
| - Bytes/op | 128 B | 128 B | - |
| Operation | p256k1 | btcec | p256k1 Advantage |
|---|---|---|---|
| Schnorr Sign (time) | 110 μs | 235 μs | 2.1x faster |
| Schnorr Sign (allocs) | 6 | 37 | 84% fewer |
| Schnorr Verify (allocs) | 3 | 16 | 81% fewer |
| ECDSA Verify (allocs) | 0 | 23 | 100% fewer |
| ECDSA Sign (allocs) | 11 | 28 | 61% fewer |
| Operation | p256k1 (Pure Go) | libsecp256k1 (C) | Notes |
|---|---|---|---|
| ECDSA Verify | 193 μs, 0 allocs | 47 μs, 12 allocs | Pure Go is 4x slower but zero-alloc |
| ECDSA Sign | 77 μs, 11 allocs | 35 μs, 13 allocs | Pure Go is 2.2x slower, similar allocs |
| Schnorr Verify | 181 μs, 3 allocs | 51 μs, 8 allocs | Pure Go is 3.5x slower, fewer allocs |
| Schnorr Sign | 110 μs, 6 allocs | 39 μs, 8 allocs | Pure Go is 2.8x slower, fewer allocs |
Problem: BatchNormalize and batchInverse allocated fresh slices on every call, accounting for over 80 MB of cumulative allocations in profiles.
Solution: Created batchNormalize16 and batchInverse16 with fixed [16]FieldElement arrays, since glvTableSize=16 is a compile-time constant.
```go
// Before: slice allocation escapes to heap
func BatchNormalize(out []GroupElementAffine, points []GroupElementJacobian)

// After: fixed-size array stays on stack
func batchNormalize16(out *[16]GroupElementAffine, points *[16]GroupElementJacobian)
```
Files: field.go, field_32bit.go, group.go, ecdh.go
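The batch inversion behind batchNormalize16 is Montgomery's trick: one modular inversion plus three multiplications per extra element, over a fixed-size array so nothing escapes to the heap. The sketch below illustrates the idea with plain `uint64` arithmetic modulo a small prime standing in for the library's FieldElement type; the names and the modulus are illustrative, not the library's actual code.

```go
package main

import "fmt"

const glvTableSize = 16
const p = 1000003 // a small prime, standing in for the secp256k1 field modulus

// inv computes a^(p-2) mod p (Fermat inversion) for the toy field.
func inv(a uint64) uint64 {
	r, e, b := uint64(1), uint64(p-2), a%p
	for e > 0 {
		if e&1 == 1 {
			r = r * b % p
		}
		b = b * b % p
		e >>= 1
	}
	return r
}

// batchInverse16 inverts all 16 elements with a single field inversion:
// accumulate prefix products, invert the total once, then unwind backwards.
// Fixed-size array parameters let the compiler keep everything on the stack.
func batchInverse16(out, in *[glvTableSize]uint64) {
	var acc [glvTableSize]uint64 // acc[i] = in[0]*...*in[i-1]
	running := uint64(1)
	for i := 0; i < glvTableSize; i++ {
		acc[i] = running
		running = running * in[i] % p
	}
	running = inv(running) // the only inversion
	for i := glvTableSize - 1; i >= 0; i-- {
		out[i] = running * acc[i] % p // = inv(in[i])
		running = running * in[i] % p // drop in[i] from the running inverse
	}
}

func main() {
	var in, out [glvTableSize]uint64
	for i := range in {
		in[i] = uint64(i + 2)
	}
	batchInverse16(&out, &in)
	for i := range in {
		if in[i]*out[i]%p != 1 {
			panic("bad inverse")
		}
	}
	fmt.Println("all inverses correct")
}
```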
Problem: Each HMAC operation allocated two fresh SHA256 contexts, and RFC 6979 nonce generation created five or more HMAC contexts per signature.
Solution: Added sync.Pool for SHA256, HMAC, and RFC6979 contexts.
```go
var sha256Pool = sync.Pool{
	New: func() interface{} { return sha256.New() },
}

var hmacPool = sync.Pool{
	New: func() interface{} { return &HMACSHA256{} },
}

var rfc6979Pool = sync.Pool{
	New: func() interface{} { return &RFC6979HMACSHA256{} },
}
```
File: hash.go
Problem: []byte{0x00} and []byte{0x01} in hot paths allocated new slices each call.
Solution: Pre-allocated package-level variables.
```go
var (
	byte0x00 = []byte{0x00}
	byte0x01 = []byte{0x01}
)
```
File: hash.go
Problem: Dynamic slice allocations in signing functions.
Solution: Changed to fixed-size arrays.
```go
// Before: escapes to heap
nonceKey := make([]byte, 64)

// After: stays on stack
var nonceKey [64]byte
```
Files: ecdsa.go, schnorr.go, schnorr_wasm.go
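A fixed-size array can still be passed to slice-taking helpers via `nonceKey[:]`; as long as escape analysis can prove the callee does not retain the slice, the backing array stays on the caller's stack (verifiable with `go build -gcflags=-m`). A minimal sketch of the pattern, with hypothetical names:

```go
package main

import "fmt"

// fill writes into a caller-provided slice. Because it does not retain the
// slice, escape analysis can keep the caller's backing array on the stack.
func fill(buf []byte, b byte) {
	for i := range buf {
		buf[i] = b
	}
}

func main() {
	// Before: nonceKey := make([]byte, 64) heap-allocates whenever the
	// compiler cannot prove the slice does not escape.
	// After: a fixed-size array, sliced at the call site, stays on the stack.
	var nonceKey [64]byte
	fill(nonceKey[:], 0x01)
	fmt.Println(len(nonceKey), nonceKey[0])
}
```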
Problem: BIP-340 tag hashes (SHA256("BIP0340/challenge"), etc.) were recomputed on every call.
Solution: Compute them once at init and cache for reuse.
```go
var (
	bip340AuxTagHash       [32]byte
	bip340NonceTagHash     [32]byte
	bip340ChallengeTagHash [32]byte
)

func getTaggedHashPrefix(tag []byte) [32]byte {
	// returns the precomputed hash for known tags
}
```
File: hash.go
Problem: hash.Sum(nil) allocates a new slice for the result.
Solution: Use hash.Sum(buf[:0]) with pre-sized buffer.
```go
// Before: allocates a new []byte
copy(temp[:], h.inner.Sum(nil))

// After: appends into the existing buffer
h.inner.Sum(temp[:0])
```
File: hash.go
Before optimization (cumulative allocation profile):

- BatchNormalize: 80 MB (53%)
- batchInverse: 32 MB (21%)
- SHA256/HMAC: 21 MB (14%)
- Other: 18 MB (12%)

After optimization:

- sync.Pool overhead: ~500 B (amortized)
- Remaining allocations: minimal, mostly pooled
| Metric | Before | After |
|---|---|---|
| Allocations per verify | 8-11 | 0-3 |
| GC pressure | High | Low |
| Memory churn | ~6 KB/op | <100 B/op |
The reduced allocations significantly decrease garbage collection pressure, improving latency consistency in high-throughput scenarios.
All optimizations are compatible with WASM/js builds:
sync.Pool is supported in WASM.

```sh
# Run Pure Go benchmarks
go test -bench="BenchmarkPureGo" -benchmem ./bench/

# Compare with btcec
go test -bench="BenchmarkBtcec" -benchmem ./bench/

# Compare with libsecp256k1 (requires libsecp256k1.so)
go test -bench="BenchmarkLibSecp" -benchmem ./bench/

# Run memory profile
go test -bench="BenchmarkPureGo_ECDSA_Sign" -memprofile=mem.prof ./bench/
go tool pprof -alloc_objects mem.prof
```
The memory optimization work achieved: