
# Import Memory Optimization Plan

## Goal

Constrain import memory utilization to ≤1.5 GB, leaving enough headroom for the system disk cache to flush writes to disk before the import continues.

## Test Results (Baseline)

### Memory Timeline (Baseline)

| Time | Memory (RSS) | Events | Notes |
|------|--------------|--------|-------|
| Start | 95 MB | 0 | Initial state |
| +10 min | 2.7 GB | 283k | Warming up |
| +20 min | 4.1 GB | 475k | Memory growing |
| +30 min | 5.2 GB | 720k | Peak approaching |
| +35 min | 5.9 GB | 485k | Near peak |
| +40 min | 5.6 GB | 1.3M | GC recovered memory |
| +48 min | 6.4 GB | 2.1M | Final (42% of RAM) |

## Root Causes of Memory Growth

1. Badger internal caches (block cache and index cache, configured in `database.go`)

2. Badger write buffers (8 × 16 MB memtables by default)

3. No backpressure in the import loop: events are written as fast as they are parsed

4. Transaction overhead: one transaction per event

## Proposed Mitigations

### Phase 1: Reduce Badger Cache Configuration for Import

Add import-specific configuration options in `app/config/config.go`:

```go
ImportBlockCacheMB  int `env:"ORLY_IMPORT_BLOCK_CACHE_MB" default:"256"`
ImportIndexCacheMB  int `env:"ORLY_IMPORT_INDEX_CACHE_MB" default:"128"`
ImportMemTableSize  int `env:"ORLY_IMPORT_MEMTABLE_SIZE_MB" default:"8"`
```

For a 1.5 GB target:

| Component | Size | Notes |
|-----------|------|-------|
| Block cache | 256 MB | Reduced from 1024 MB |
| Index cache | 128 MB | Reduced from 512 MB |
| Memtables | 4 × 8 MB = 32 MB | Reduced from 8 × 16 MB |
| Serial cache | ~20 MB | Unchanged |
| Working memory | ~200 MB | Buffer for processing |
| Total | ~636 MB | Leaves headroom under the 1.5 GB target |
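
Wiring these values into Badger might look roughly like the sketch below. This is an assumption, not existing code: the option names are from the badger v4 builder API, and `Config` field names follow the proposed additions above.

```go
// Sketch: build import-tuned Badger options from the proposed config fields.
// importBadgerOptions and cfg are hypothetical names for illustration.
func importBadgerOptions(path string, cfg *Config) badger.Options {
    mb := int64(1) << 20
    return badger.DefaultOptions(path).
        WithBlockCacheSize(int64(cfg.ImportBlockCacheMB) * mb). // 256 MB instead of 1024 MB
        WithIndexCacheSize(int64(cfg.ImportIndexCacheMB) * mb). // 128 MB instead of 512 MB
        WithMemTableSize(int64(cfg.ImportMemTableSize) * mb).   // 8 MB memtables
        WithNumMemtables(4)                                     // 4 × 8 MB = 32 MB total
}
```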

### Phase 2: Add Batching with Sync to Import Loop

Modify `import_utils.go` to batch writes and periodically force a sync:

```go
const (
    importBatchSize      = 500  // Events per batch
    importSyncInterval   = 2000 // Events before forcing sync
    importMemCheckEvents = 1000 // Events between memory checks
    importMaxMemoryMB    = 1400 // Target max memory (MB)
)

// In processJSONLEventsWithPolicy (needs runtime, runtime/debug, time):
var batchCount int
for scan.Scan() {
    // ... existing event processing ...

    batchCount++
    count++

    // Force sync periodically to flush writes to disk
    if batchCount >= importSyncInterval {
        _ = d.DB.Sync() // Best-effort flush; an error here is non-fatal
        batchCount = 0
    }

    // Memory pressure check
    if count%importMemCheckEvents == 0 {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        heapMB := m.HeapAlloc / 1024 / 1024

        if heapMB > importMaxMemoryMB {
            // Apply backpressure: flush, collect, return memory to the OS
            _ = d.DB.Sync()
            runtime.GC()
            debug.FreeOSMemory()

            // Wait for compaction to catch up
            time.Sleep(100 * time.Millisecond)
        }
    }
}
```

### Phase 3: Use Batch Transactions

Instead of one transaction per event, batch multiple events into a single transaction:

```go
// Accumulate events for batch write
const txnBatchSize = 100

type pendingWrite struct {
    idxs       [][]byte
    compactKey []byte
    compactVal []byte
    graphKeys  [][]byte
}

var pendingWrites []pendingWrite

// In the event processing loop:
pendingWrites = append(pendingWrites, pw)

if len(pendingWrites) >= txnBatchSize {
    err = d.Update(func(txn *badger.Txn) error {
        for _, pw := range pendingWrites {
            for _, key := range pw.idxs {
                if err := txn.Set(key, nil); err != nil {
                    return err
                }
            }
            if err := txn.Set(pw.compactKey, pw.compactVal); err != nil {
                return err
            }
            for _, gk := range pw.graphKeys {
                if err := txn.Set(gk, nil); err != nil {
                    return err
                }
            }
        }
        return nil
    })
    if err != nil {
        return err
    }
    pendingWrites = pendingWrites[:0]
}

// After the loop: flush any remaining pendingWrites with one final Update.
```
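
The accumulate-and-flush pattern above can be factored into a small helper that is easy to unit-test independently of Badger. The names below are hypothetical, not part of the existing codebase; with `size: 100` and a flush function that wraps `d.Update`, it reproduces Phase 3 while guaranteeing the trailing partial batch is written.

```go
// batcher collects items and invokes flush whenever size items have
// accumulated. Call Flush once more after the loop to drain the remainder.
type batcher[T any] struct {
    size  int
    items []T
    flush func([]T) error
}

// Add appends an item, flushing automatically when the batch is full.
func (b *batcher[T]) Add(item T) error {
    b.items = append(b.items, item)
    if len(b.items) >= b.size {
        return b.Flush()
    }
    return nil
}

// Flush writes out any buffered items and resets the buffer.
func (b *batcher[T]) Flush() error {
    if len(b.items) == 0 {
        return nil
    }
    err := b.flush(b.items)
    b.items = b.items[:0]
    return err
}
```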

### Phase 4: Implement Adaptive Rate Limiting

```go
type importRateLimiter struct {
    targetMemMB   uint64
    checkInterval int
    baseDelay     time.Duration
    maxDelay      time.Duration
}

func (r *importRateLimiter) maybeThrottle(eventCount int) {
    if eventCount%r.checkInterval != 0 {
        return
    }

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    heapMB := m.HeapAlloc / 1024 / 1024

    if heapMB > r.targetMemMB {
        // Calculate delay proportional to overage
        overage := float64(heapMB-r.targetMemMB) / float64(r.targetMemMB)
        delay := time.Duration(float64(r.baseDelay) * (1 + overage*10))
        if delay > r.maxDelay {
            delay = r.maxDelay
        }

        // Force GC and wait
        runtime.GC()
        debug.FreeOSMemory()
        time.Sleep(delay)
    }
}
```

## Implementation Order

  1. Quick Win: Add a `d.DB.Sync()` call every N events in the import loop
  2. Configuration: Add environment variables for import-specific cache sizes
  3. Batching: Implement batch transactions to reduce per-event overhead
  4. Adaptive: Add memory-aware rate limiting

## Expected Results

| Approach | Memory Target | Throughput Impact |
|----------|---------------|-------------------|
| Current | ~6 GB peak | 736 events/sec |
| Phase 1 (cache reduction) | ~2 GB | ~700 events/sec |
| Phase 2 (sync + GC) | ~1.5 GB | ~500 events/sec |
| Phase 3 (batching) | ~1.5 GB | ~600 events/sec |
| Phase 4 (adaptive) | ~1.4 GB | Variable |

## Files to Modify

  1. `app/config/config.go` - Add import-specific config options
  2. `pkg/database/database.go` - Add import mode with reduced caches
  3. `pkg/database/import_utils.go` - Add batching, sync, and rate limiting
  4. `pkg/database/save-event.go` - Add batch save method (optional, for Phase 3)

## Environment Variables (Proposed)

```bash
# Import-specific cache settings (only apply during import operations)
ORLY_IMPORT_BLOCK_CACHE_MB=256      # Block cache size during import
ORLY_IMPORT_INDEX_CACHE_MB=128      # Index cache size during import
ORLY_IMPORT_MEMTABLE_SIZE_MB=8      # Memtable size during import

# Import rate limiting
ORLY_IMPORT_SYNC_INTERVAL=2000      # Events between forced syncs
ORLY_IMPORT_MAX_MEMORY_MB=1400      # Target max memory during import
ORLY_IMPORT_BATCH_SIZE=100          # Events per transaction batch
```

## Notes

### Test Command

To re-run the import test with memory monitoring:

```bash
# Start relay with import-optimized settings
export ORLY_DATA_DIR=/tmp/orly-import-test
export ORLY_ACL_MODE=none
export ORLY_PORT=10548
export ORLY_LOG_LEVEL=info
./orly &

# Upload test file
curl -X POST \
  -F "file=@/path/to/wot_reference.jsonl" \
  http://localhost:10548/api/import

# Monitor memory
watch -n 5 'ps -p $(pgrep orly) -o pid,rss,pmem --no-headers'
```