---
name: go-memory-optimization
description: This skill should be used when optimizing Go code for memory efficiency, reducing GC pressure, implementing object pooling, analyzing escape behavior, choosing between fixed-size arrays and slices, designing worker pools, or profiling memory allocations. Provides comprehensive knowledge of Go's memory model, stack vs heap allocation, sync.Pool patterns, goroutine reuse, and GC tuning.
---
This skill provides guidance on optimizing Go programs for memory efficiency and reduced garbage collection overhead. Topics include stack allocation semantics, fixed-size types, escape analysis, object pooling, goroutine management, and GC tuning.
Prefer allocations in this order (fastest to slowest): stack allocation, reuse from a `sync.Pool`, then fresh heap allocation.
Focus memory optimization efforts on hot paths: code that runs per request, per event, or per loop iteration, where allocation rates are highest.
Avoid premature optimization. Profile first with `go tool pprof` to identify actual bottlenecks.
Arrays with known compile-time size can be stack-allocated, avoiding heap entirely:
```go
// HEAP: slice header + backing array escape to the heap
func processSlice() []byte {
	data := make([]byte, 32)
	// ... use data
	return data // escapes
}

// STACK: a fixed-size array stays on the stack if it doesn't escape
func processArray() {
	var data [32]byte // stack-allocated
	// ... use data
} // automatically cleaned up when the frame returns
```
Define types with explicit sizes for protocol fields, cryptographic values, and identifiers:
```go
import "encoding/hex"

// Binary types enforce length and enable stack allocation
type EventID [32]byte   // SHA-256 hash
type Pubkey [32]byte    // Schnorr public key
type Signature [64]byte // Schnorr signature

// Methods operate on value receivers when size permits
func (id EventID) Hex() string {
	return hex.EncodeToString(id[:])
}

func (id EventID) IsZero() bool {
	return id == EventID{} // efficient zero-value comparison
}
```
| Size | Recommendation |
|---|---|
| ≤64 bytes | Pass by value, stack-friendly |
| 65-128 bytes | Consider context; value for read-only, pointer for mutation |
| >128 bytes | Pass by pointer to avoid copy overhead |
Convert fixed arrays to slices only at API boundaries:
```go
type Hash [32]byte

func (h Hash) Bytes() []byte {
	return h[:] // creates a slice header; the array stays on the stack if h does
}

// Prefer methods that accept arrays directly
func VerifySignature(pubkey Pubkey, msg []byte, sig Signature) bool {
	// pubkey and sig are stack-allocated in the caller
	// ... verification logic elided
	return false
}
```
Variables "escape" to the heap when the compiler cannot prove their lifetime is bounded by the stack frame. Check escape behavior with:
```shell
go build -gcflags="-m -m" ./...
```
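The compiler prints one diagnostic per decision; typical output resembles the following (file paths and line numbers are illustrative):

```
./main.go:6:2: moved to heap: x
./main.go:12:9: x escapes to heap
./main.go:18:13: make([]byte, n) escapes to heap
```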
Common causes of escapes:

```go
// 1. Returning pointers to local variables
func escapesReturn() *int {
	x := 42
	return &x // x escapes
}

// 2. Storing in interface{}
func escapesInterface(x int) interface{} {
	return x // x escapes (boxed)
}

// 3. Closures capturing by reference
func escapesClosure() func() int {
	x := 42
	return func() int { return x } // x escapes
}

// 4. Slice/map with capacity unknown at compile time
func escapesDynamic(n int) []byte {
	return make([]byte, n) // escapes (size unknown at compile time)
}

// 5. Sending pointers to channels
func escapesChannel(ch chan *int) {
	x := 42
	ch <- &x // x escapes
}
```
Patterns that avoid escapes:

```go
// 1. Accept pointers, don't return them
func fillHash(result *[32]byte) {
	// the caller owns the memory; the function fills it
	copy(result[:], computeHash()) // computeHash defined elsewhere
}

// 2. Use fixed-size arrays
func useFixedArray() {
	var buf [1024]byte // known size, stack-allocated
	process(buf[:])
}

// 3. Preallocate with known capacity
func usePrealloc() {
	buf := make([]byte, 0, 1024) // may stay on the stack
	// ... append up to 1024 bytes
}

// 4. Avoid interface{} on hot paths
func noBoxing(x int) int {
	return x * 2 // no boxing
}
```
```go
var bufferPool = sync.Pool{
	New: func() interface{} {
		return make([]byte, 0, 4096)
	},
}

func processRequest(data []byte) {
	buf := bufferPool.Get().([]byte)
	buf = buf[:0] // reset length, keep capacity
	// Note: defer evaluates its argument here, so Put receives this
	// slice header; if later appends grow buf past its capacity, the
	// grown buffer is lost. Storing a slice in a Pool also allocates
	// on the interface conversion; pooling *[]byte (below) avoids both.
	defer bufferPool.Put(buf)
	// use buf...
}
```
```go
type BufferPool struct {
	pool sync.Pool
	size int
}

func NewBufferPool(size int) *BufferPool {
	return &BufferPool{
		pool: sync.Pool{
			New: func() interface{} {
				b := make([]byte, size)
				return &b
			},
		},
		size: size,
	}
}

func (p *BufferPool) Get() *[]byte {
	return p.pool.Get().(*[]byte)
}

func (p *BufferPool) Put(b *[]byte) {
	if b == nil || cap(*b) < p.size {
		return // don't pool undersized buffers
	}
	*b = (*b)[:p.size] // reset to full size
	p.pool.Put(b)
}
```
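A quick usage sketch for the sized pool (the type is repeated from above so the snippet runs standalone):

```go
package main

import (
	"fmt"
	"sync"
)

type BufferPool struct {
	pool sync.Pool
	size int
}

func NewBufferPool(size int) *BufferPool {
	p := &BufferPool{size: size}
	p.pool.New = func() interface{} {
		b := make([]byte, size)
		return &b
	}
	return p
}

func (p *BufferPool) Get() *[]byte { return p.pool.Get().(*[]byte) }

func (p *BufferPool) Put(b *[]byte) {
	if b == nil || cap(*b) < p.size {
		return // don't pool undersized buffers
	}
	*b = (*b)[:p.size] // reset to full size
	p.pool.Put(b)
}

func main() {
	pool := NewBufferPool(4096)
	b := pool.Get() // *[]byte with len == cap == 4096
	n := copy(*b, "payload")
	fmt.Println(len(*b), n) // 4096 7
	pool.Put(b) // reset and returned for reuse
}
```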
```go
// BAD: pool of pointers to small values (overhead exceeds benefit)
var intPool = sync.Pool{New: func() interface{} { return new(int) }}

// BAD: not resetting state before Put
bufPool.Put(buf) // may still contain sensitive or stale data

// BAD: pooling objects with goroutine-local state
var connPool = sync.Pool{...} // connections are stateful

// BAD: assuming pooled objects persist (the GC may clear pools at any time)
pool.Put(obj)
// ... a GC cycle runs ...
obj2 := pool.Get() // may be a brand-new object, not the one put in
```
| Use Case | Pool? | Reason |
|---|---|---|
| Buffers in HTTP handlers | Yes | High allocation rate, short lifetime |
| Encoder/decoder state | Yes | Expensive to initialize |
| Small values (<64 bytes) | No | Pointer overhead exceeds benefit |
| Long-lived objects | No | Pools are for short-lived reuse |
| Objects with cleanup needs | No | Pool provides no finalization |
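As a concrete instance of the "yes" rows above, a `bytes.Buffer` pool with reset-before-Put (a sketch, not part of the skill's API):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	// New returns *bytes.Buffer: pooling a pointer avoids an extra
	// allocation when the value is converted to interface{}.
	New: func() interface{} { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // always reset before reuse; stale data may be sensitive
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the buffer is safe to pool
}

func main() {
	fmt.Println(render("world")) // hello, world
}
```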
```go
type WorkerPool struct {
	jobs    chan func()
	workers int
	wg      sync.WaitGroup
}

func NewWorkerPool(workers, queueSize int) *WorkerPool {
	p := &WorkerPool{
		jobs:    make(chan func(), queueSize),
		workers: workers,
	}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go p.worker()
	}
	return p
}

func (p *WorkerPool) worker() {
	defer p.wg.Done()
	for job := range p.jobs {
		job()
	}
}

func (p *WorkerPool) Submit(job func()) {
	p.jobs <- job
}

func (p *WorkerPool) Shutdown() {
	close(p.jobs)
	p.wg.Wait()
}
```
```go
type Semaphore struct {
	sem chan struct{}
}

func NewSemaphore(n int) *Semaphore {
	return &Semaphore{sem: make(chan struct{}, n)}
}

func (s *Semaphore) Acquire() { s.sem <- struct{}{} }
func (s *Semaphore) Release() { <-s.sem }

// Usage
sem := NewSemaphore(runtime.GOMAXPROCS(0))
for _, item := range items {
	sem.Acquire()
	go func(it Item) {
		defer sem.Release()
		process(it)
	}(item)
}
```
| Metric | Spawn per request | Worker pool |
|---|---|---|
| Goroutine creation | O(n) | O(workers) |
| Stack allocation | 2KB × n | 2KB × workers |
| Scheduler overhead | Higher | Lower |
| GC pressure | Higher | Lower |
```go
// 1. Reuse buffers across iterations
buf := make([]byte, 0, 4096)
for _, item := range items {
	buf = buf[:0] // reset length without reallocating
	buf = processItem(buf, item)
}

// 2. Preallocate slices with known capacity
result := make([]Item, 0, len(input)) // avoids append reallocations
for _, in := range input {
	result = append(result, transform(in))
}

// 3. Inline value fields instead of pointer fields
type Event struct {
	ID        [32]byte // value field, not *[32]byte
	Pubkey    [32]byte // single allocation for the entire struct
	Signature [64]byte
	Content   string // only the string data lives on the heap
}

// 4. String interning for repeated values
var kindStrings = map[int]string{
	0: "set_metadata",
	1: "text_note",
	// ...
}
```
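A lookup helper over that table might look like this (the `kindString` name and its fallback path are assumptions, repeated with the map so the sketch runs standalone):

```go
package main

import (
	"fmt"
	"strconv"
)

var kindStrings = map[int]string{
	0: "set_metadata",
	1: "text_note",
}

// kindString returns the interned name for known kinds, allocating
// a fresh string only when the kind is unknown.
func kindString(kind int) string {
	if s, ok := kindStrings[kind]; ok {
		return s // interned: zero allocations
	}
	return "kind_" + strconv.Itoa(kind) // allocates only for unknown kinds
}

func main() {
	fmt.Println(kindString(1))  // text_note
	fmt.Println(kindString(42)) // kind_42
}
```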
```go
import "runtime/debug"

func init() {
	// GOGC: target heap growth percentage (default 100).
	// Lower = more frequent GC, less memory.
	// Higher = less frequent GC, more memory.
	debug.SetGCPercent(50) // collect when the heap grows 50%

	// GOMEMLIMIT: soft memory limit (Go 1.19+).
	// The GC becomes more aggressive as the limit approaches.
	debug.SetMemoryLimit(512 << 20) // 512 MB limit
}
```
Environment variables:

```shell
GOGC=50            # more aggressive GC
GOMEMLIMIT=512MiB  # soft memory limit
GODEBUG=gctrace=1  # GC trace output
```
Note: arenas are experimental (Go 1.20, behind `GOEXPERIMENT=arenas`) and the experiment is on hold; the API may change or be removed.

```go
//go:build goexperiment.arenas

import "arena"

func processLargeDataset(data []byte) Result {
	a := arena.NewArena()
	defer a.Free() // bulk-free every allocation from this arena

	// All allocations from the arena are freed together
	items := arena.MakeSlice[Item](a, 0, 1000)
	// ... process data into items

	// Copy the result out of arena memory before Free runs
	return copyResult(items)
}
```
```go
import (
	"os"
	"runtime"
	"runtime/pprof"
)

func captureHeapProfile() {
	f, _ := os.Create("heap.prof")
	defer f.Close()
	runtime.GC() // run a collection first for an accurate picture
	pprof.WriteHeapProfile(f)
}
```
```shell
go tool pprof -http=:8080 heap.prof
go tool pprof -alloc_space heap.prof   # total allocations
go tool pprof -inuse_space heap.prof   # current usage
```
```go
func BenchmarkAllocation(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		result := processData(input)
		_ = result
	}
}
```
Output interpretation:

```
BenchmarkAllocation-8   1000000   1234 ns/op   256 B/op   3 allocs/op
                                               ↑          ↑
                                               bytes/op   allocations/op
```
```go
import (
	"fmt"
	"runtime"
	"time"
)

func printMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc:      %d MB\n", m.Alloc/1024/1024)
	fmt.Printf("TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024)
	fmt.Printf("Sys:        %d MB\n", m.Sys/1024/1024)
	fmt.Printf("NumGC:      %d\n", m.NumGC)
	// PauseNs is a circular buffer; the most recent pause is at (NumGC+255)%256
	fmt.Printf("GCPause:    %v\n", time.Duration(m.PauseNs[(m.NumGC+255)%256]))
}
```
For detailed code examples and patterns, see references/patterns.md, which covers:
- Profiling with `go tool pprof`
- Escape analysis (`-gcflags="-m"`)
- Benchmarking with the `-benchmem` flag