IMPLEMENTATION_PLAN_GLV_WNAF.md raw

Implementation Plan: wNAF + GLV Endomorphism Optimization

Overview

This plan details implementing the GLV (Gallant-Lambert-Vanstone) endomorphism optimization combined with wNAF (windowed Non-Adjacent Form) for secp256k1 scalar multiplication, based on:

Expected Performance Gain

Phase 1: Constants and Basic Infrastructure

Step 1.1: Add GLV Constants to scalar.go

Add the following constants that are already defined in the C implementation:

// Lambda: cube root of unity mod n (group order)
// λ^3 ≡ 1 (mod n), and λ^2 + λ + 1 ≡ 0 (mod n)
var scalarLambda = Scalar{
    d: [4]uint64{
        0xDF02967C1B23BD72, // limb 0
        0x122E22EA20816678, // limb 1
        0xA5261C028812645A, // limb 2
        0x5363AD4CC05C30E0, // limb 3
    },
}

// Constants for scalar splitting (from libsecp256k1 scalar_impl.h lines 142-157)
var scalarMinusB1 = Scalar{
    d: [4]uint64{0x6F547FA90ABFE4C3, 0xE4437ED6010E8828, 0, 0},
}

var scalarMinusB2 = Scalar{
    d: [4]uint64{0xD765CDA83DB1562C, 0x8A280AC50774346D, 0xFFFFFFFFFFFFFFFE, 0xFFFFFFFFFFFFFFFF},
}

var scalarG1 = Scalar{
    d: [4]uint64{0xE893209A45DBB031, 0x3DAA8A1471E8CA7F, 0xE86C90E49284EB15, 0x3086D221A7D46BCD},
}

var scalarG2 = Scalar{
    d: [4]uint64{0x1571B4AE8AC47F71, 0x221208AC9DF506C6, 0x6F547FA90ABFE4C4, 0xE4437ED6010E8828},
}

Files to modify: scalar.go Tests: Add unit tests comparing with known C test vectors


Step 1.2: Add Beta Constant to field.go

Add the field element β (cube root of unity mod p):

// Beta: cube root of unity mod p (field order)
// β^3 ≡ 1 (mod p), and β^2 + β + 1 ≡ 0 (mod p)
// This enables: λ·(x,y) = (β·x, y) on secp256k1
var fieldBeta = FieldElement{
    // In 5×52-bit representation
    n: [5]uint64{...}, // Derived from: 0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee
}

Files to modify: field.go Tests: Verify β^3 ≡ 1 (mod p)


Phase 2: Scalar Splitting

Step 2.1: Implement mulshiftvar

This function computes (a * b) >> shift for scalar splitting:

// mulShiftVar computes (a * b) >> shift, returning the result
// This is used in GLV scalar splitting where shift is always 384
func (r *Scalar) mulShiftVar(a, b *Scalar, shift uint) {
    // Compute full 512-bit product
    // Extract bits [shift, shift+256) as the result
}

Reference: libsecp256k1 scalar_4x64_impl.h:secp256k1_scalar_mul_shift_var Files to modify: scalar.go Tests: Test with known inputs and compare with C implementation


Step 2.2: Implement splitLambda

The core GLV scalar splitting function:

// splitLambda decomposes scalar k into r1, r2 such that:
//   r1 + λ·r2 ≡ k (mod n)
// where r1 and r2 are approximately 128 bits each
func splitLambda(r1, r2, k *Scalar) {
    // c1 = round(k * g1 / 2^384)
    // c2 = round(k * g2 / 2^384)
    var c1, c2 Scalar
    c1.mulShiftVar(k, &scalarG1, 384)
    c2.mulShiftVar(k, &scalarG2, 384)

    // r2 = c1*(-b1) + c2*(-b2)
    c1.mul(&c1, &scalarMinusB1)
    c2.mul(&c2, &scalarMinusB2)
    r2.add(&c1, &c2)

    // r1 = k - r2*λ
    r1.mul(r2, &scalarLambda)
    r1.negate(r1)
    r1.add(r1, k)
}

Reference: libsecp256k1 scalar_impl.h:secp256k1_scalar_split_lambda (lines 140-178) Files to modify: scalar.go Tests:


Phase 3: Point Operations with Endomorphism

Step 3.1: Implement mulLambda for Points

Apply the endomorphism to a point:

// mulLambda applies the GLV endomorphism: λ·(x,y) = (β·x, y)
func (r *GroupElementAffine) mulLambda(a *GroupElementAffine) {
    r.x.mul(&a.x, &fieldBeta)
    r.y = a.y
    r.infinity = a.infinity
}

Reference: libsecp256k1 group_impl.h:secp256k1_ge_mul_lambda (lines 915-922) Files to modify: group.go Tests: Verify λ·G equals expected point


Step 3.2: Implement isHigh for Scalars

Check if a scalar is in the upper half of the group order:

// isHigh returns true if s > n/2
func (s *Scalar) isHigh() bool {
    // Compare with n/2
    // n = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
    // n/2 = 7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5D576E7357A4501DDFE92F46681B20A0
}

Files to modify: scalar.go Tests: Test boundary cases around n/2


Phase 4: Strauss Algorithm with GLV

Step 4.1: Implement Odd Multiples Table with Z-Ratios

The C implementation uses an efficient method to build odd multiples while tracking Z-coordinate ratios:

// buildOddMultiplesTable builds a table of odd multiples [1*a, 3*a, 5*a, ...]
// and tracks Z-coordinate ratios for efficient normalization
func buildOddMultiplesTable(
    n int,
    preA []GroupElementAffine,
    zRatios []FieldElement,
    z *FieldElement,
    a *GroupElementJacobian,
) {
    // Uses isomorphic curve trick for efficient Jacobian+Affine addition
    // See ecmult_impl.h lines 73-115
}

Reference: libsecp256k1 ecmult_impl.h:secp256k1_ecmult_odd_multiples_table Files to modify: ecdh.go or new file ecmult.go Tests: Verify table correctness


Step 4.2: Implement Table Lookup Functions

// tableGetGE retrieves point from table, handling sign
func tableGetGE(r *GroupElementAffine, pre []GroupElementAffine, n, w int) {
    // n is the wNAF digit (can be negative)
    // Returns pre[(|n|-1)/2], negated if n < 0
}

// tableGetGELambda retrieves λ-transformed point from table
func tableGetGELambda(r *GroupElementAffine, pre []GroupElementAffine, betaX []FieldElement, n, w int) {
    // Same as tableGetGE but uses precomputed β*x values
}

Reference: libsecp256k1 ecmult_impl.h lines 125-143 Files to modify: ecmult.go


Step 4.3: Implement Full Strauss-GLV Algorithm

This is the main multiplication function:

// ecmultStraussWNAF computes r = na*a + ng*G using Strauss algorithm with GLV
func ecmultStraussWNAF(r *GroupElementJacobian, a *GroupElementJacobian, na *Scalar, ng *Scalar) {
    // 1. Split scalars using GLV endomorphism
    //    na = na1 + λ*na2 (where na1, na2 are ~128 bits)

    // 2. Build odd multiples table for a
    //    Also precompute β*x for λ-transformed lookups

    // 3. Convert both half-scalars to wNAF representation
    //    wNAF size is 129 bits (128 + 1 for potential overflow)

    // 4. For generator G: split scalar and use precomputed tables
    //    ng = ng1 + 2^128*ng2 (simple bit split, not GLV)

    // 5. Main loop (from MSB to LSB):
    //    - Double result
    //    - Add contributions from wNAF digits for na1, na2, ng1, ng2
}

Reference: libsecp256k1 ecmult_impl.h:secp256k1_ecmult_strauss_wnaf (lines 237-347) Files to modify: ecmult.go Tests: Compare results with existing implementation


Phase 5: Generator Precomputation

Step 5.1: Precompute Generator Tables

For maximum performance, precompute tables for G and 2^128*G:

// preG contains precomputed odd multiples of G for window size WINDOW_G
// preG[i] = (2*i+1)*G for i = 0 to (1 << (WINDOW_G-2)) - 1
var preG [1 << (WINDOW_G - 2)]GroupElementStorage

// preG128 contains precomputed odd multiples of 2^128*G
var preG128 [1 << (WINDOW_G - 2)]GroupElementStorage

Options:

  1. Generate at init() time (slower startup, no code bloat)
  2. Generate with go:generate and embed (faster startup, larger binary)

Files to modify: New file ecmult_gen_table.go or precomputed.go


Step 5.2: Optimize Generator Multiplication

// ecmultGen computes r = ng*G using precomputed tables
func ecmultGen(r *GroupElementJacobian, ng *Scalar) {
    // Split ng = ng1 + 2^128*ng2
    // Use preG for ng1 lookups
    // Use preG128 for ng2 lookups
    // Combine using Strauss algorithm
}

Phase 6: Integration and Testing

Step 6.1: Update Public APIs

Update the main multiplication functions to use the new implementation:

// Ecmult computes r = na*a + ng*G
func Ecmult(r *GroupElementJacobian, a *GroupElementJacobian, na, ng *Scalar) {
    ecmultStraussWNAF(r, a, na, ng)
}

// EcmultGen computes r = ng*G (generator multiplication only)
func EcmultGen(r *GroupElementJacobian, ng *Scalar) {
    ecmultGen(r, ng)
}

Step 6.2: Comprehensive Testing

  1. Correctness tests:

- Compare with existing slow implementation - Test edge cases (zero scalar, infinity point, scalar = n-1) - Test with random scalars

  1. Property tests:

- Verify r1 + λ·r2 ≡ k (mod n) for splitLambda - Verify λ·(x,y) = (β·x, y) for mulLambda - Verify β^3 ≡ 1 (mod p) - Verify λ^3 ≡ 1 (mod n)

  1. Cross-validation:

- Compare with btcec or other Go implementations - Test vectors from libsecp256k1

Step 6.3: Benchmarking

Add comprehensive benchmarks:

func BenchmarkEcmultStraussGLV(b *testing.B) {
    // Benchmark new GLV implementation
}

func BenchmarkEcmultOld(b *testing.B) {
    // Benchmark old implementation for comparison
}

func BenchmarkScalarSplitLambda(b *testing.B) {
    // Benchmark scalar splitting
}

Implementation Order

The recommended order minimizes dependencies:

StepDescriptionDependenciesEstimated Complexity
1.1Add GLV scalar constantsNoneLow
1.2Add Beta field constantNoneLow
2.1Implement mulShiftVarNoneMedium
2.2Implement splitLambda1.1, 2.1Medium
3.1Implement mulLambda for points1.2Low
3.2Implement isHighNoneLow
4.1Build odd multiples tableNoneMedium
4.2Table lookup functions4.1Low
4.3Full Strauss-GLV algorithm2.2, 3.1, 3.2, 4.1, 4.2High
5.1Generator precomputation4.1Medium
5.2Optimized generator mult5.1Medium
6.xTesting and integrationAll aboveMedium

Key Differences from Current Implementation

The current Go implementation in ecdh.go has:

The new implementation adds:

References

  1. libsecp256k1 source:

- src/scalar_impl.h - GLV constants and splitLambda - src/ecmult_impl.h - Strauss algorithm with wNAF - src/field.h - Beta constant - src/group_impl.h - Point lambda multiplication

  1. Papers:

- "Faster Point Multiplication on Elliptic Curves with Efficient Endomorphisms" (GLV, 2001) - "Guide to Elliptic Curve Cryptography" (Hankerson, Menezes, Vanstone) - Algorithm 3.74

  1. IACR ePrint 2021/1151:

- SIMD acceleration techniques - Window size optimization analysis