iskra

lattice language model - semantic walks on bethe lattice (cayley tree). bounded only by memory access speed, low compute requirements.

git clone https://git.smesh.lol/iskra.git

iskra

iskra

Code lattice engine. Maps code constructs across five compiler stages (SRC→AST→IR→ASM→BIN) onto a multi-branch B-tree, enabling forward and reverse tracing, structural isomorphism detection, and lattice-based compilation.

Built in Moxie. Storage: iskradb.

Architecture

Three semantic branches

Iskra uses three of iskradb's eight branches, one per semantic axis:

Branchiskradb aliasContents
BnounBgrammaticalTypes, structs, interfaces, variables, imports, constants
BverbBmorphologyFunctions, methods
BmodifierBpragmaticExpressions, statements, control flow, fields

KindToBranch(kind NodeKind) maps each NodeKind flag to the correct branch.

Key format

lattice.Key = [2]uint64

Key[0] = (stage << 56) | (FNV-1a-56(name) & 0x00FFFFFFFFFFFFFF)
Key[1] = 0

Stage occupies the high 8 bits; a 56-bit FNV-1a hash of the construct's name fills the rest. Cross-stage adjacency is key-implicit: the same name at a different stage produces the same 56-bit hash, different stage prefix — no explicit link stored.

MakeKey(stage uint8, hash uint64) lattice.Key NameKey(stage uint8, name string) lattice.Key SegmentKey(stage uint8, content []byte) lattice.Key

Stages

StageSRC  0x01   source text
StageAST  0x02   abstract syntax tree
StageIR   0x03   intermediate representation (LLVM IR)
StageASM  0x04   assembly
StageBIN  0x05   binary / bytecode

Node kinds (NodeKind bitfield)

KindPkg      1<<0    KindImport  1<<1
KindConst    1<<2    KindVar     1<<3
KindType     1<<4    KindFunc    1<<5
KindMethod   1<<6    KindField   1<<7
KindStmt     1<<8    KindExpr    1<<9
KindControl  1<<10   KindLiteral 1<<11

TritPath (iskra-local)

Encodes a position in the three-branch semantic space. Up to 8 trits packed into uint32 (2 bits each, 0b00 = sentinel end).

Level 1 - declaration kind:

BranchType = 1    BranchFunc = 2    BranchData = 3

Level 2 - subdivision:

Type:  SubField=1  SubMethod=2  SubEmbed=3
Func:  SubStatement=1  SubExpression=2  SubControl=3
Data:  SubConst=1  SubVar=2  SubImport=3

KindToTritPath(kind NodeKind) TritPath — maps a NodeKind to its lattice position. TritPathDistance(a, b TritPath) int — hop count via lowest common ancestor.


Data structures

Tree

type Tree struct {
    db         *lattice.Tree    // iskradb B-tree (C=8, 200-byte nodes, 3 active branches)
    RecMeta    []MetaEntry      // per-record metadata, indexed by record index
    StringPool []byte           // overflow for names > 23 bytes
    TokenPool  []uint32         // tokenized content (dict-compressed)
    Dict       *Dict            // token vocabulary
    CostMap    map[uint32]CostEntry  // recIdx → benchmark cost data
}

RecMeta is grown atomically with each db.InsertRec call. Record index is the single unified index for both the B-tree record and metadata — there are no separate MetaIdx/LexIdx pairs.

MetaEntry (32 bytes)

Count    uint32      reference count / occurrence count
Kind     NodeKind    construct classification
StageTag uint8       stage this entry was ingested from
Extra    [16]byte    packed auxiliary data (see below)

Extra layout:

[0:4]   TritPath (uint32 LE)
[4]     low nibble: Rotation (branch transition encoding)
[5:8]   24-bit signature hash (fast rejection in isomorphism lookup)
[8:12]  ContentOffset into TokenPool (uint32 LE)
[12:16] ContentLen in tokens (uint32 LE)

Record (iskradb, 48 bytes)

DataFile uint32    overflow flag (1) or 0
DataOff  uint32    offset into StringPool
DataLen  uint32    name length
Link     [2]uint32 cross-branch record indices (FinalizeLinks populates these)
Branch   uint8     branch index (0-7)
Inline   [24]byte  name inline storage: bytes 0-22 = data, byte 23 = length

SetFormOnRecord / FormFromRecord handle the inline/overflow split at 23 bytes.

Dict

Vocabulary for content compression. Tokens are deduplicated into DictEntry{Text []byte, Class uint8} and referenced by index in TokenPool. SortByFrequency produces a remapping for optimal compression.

Cross-stage and cross-branch adjacency

Cross-stage (key-implicit)

Same name at different stages → same 56-bit hash, different stage prefix. StageLinksFor(recIdx) probes all five stages in the same branch using MakeKey(stage, hash). No explicit adjacency list stored.

Cross-branch (explicit Link)

FinalizeLinks() wires Record.Link[0] and Link[1] to the noun/verb/modifier counterparts of the same key at the same stage. Called once after bulk ingest.

CrossWalk (iskradb) follows Link pointers across branches. StageLinksFor follows the key-implicit stage links.

Corpus format

Manifest: tab-separated CSV produced by mxcorpus.

id  kind  name  ast_file  ast_dump  ir_file  asm_file  bin_file  lineinfo

LoadManifest(path) ([]CorpusSegment, error) parses the manifest. LoadCorpus(dir string, t *Tree) error loads all stages into the tree.

Binary mesh format

iskra save serializes a loaded corpus to a .mesh file for fast reloading. The mesh is the concat of:

  1. iskradb .iskr flat binary (nodes + records + RecKey map)
  2. .meta companion file (MetaEntry slice, StringPool, TokenPool, Dict, CostMap)

iskra load-mesh restores both.

Distances

All distances are exact integers. No floating point, no embeddings, no thresholds.

TritPath distance — hops via LCA on the 3-branch ternary tree:

shared = length of common trit prefix
distance = (depth_a - shared) + (depth_b - shared)

Tree distance — hops via LCA in the B-tree:

pathA = node indices from root to keyA
pathB = node indices from root to keyB
distance = (len(pathA) - 1 - lca_depth) + (len(pathB) - 1 - lca_depth)

Measures storage locality. Keys in the same leaf: distance 0. Sibling leaves: distance 2.

Geometric distance (binding signatures) — L1 distance over sorted binding tuples:

for each paired binding:
    +1 if declaration depths differ
    +|refDepths_a[i] - refDepths_b[i]| per reference pair
    +|len(refs_a) - len(refs_b)| for unmatched references
+|num_locals_a - num_locals_b| for unmatched bindings
(same for externals)

Two structurally identical functions: distance 0. One extra local variable: distance 1.


Symbol isomorphism

BindingSignature: the geometric structure of where names are declared and referenced, at what scope depths, with names erased. Two functions with identical binding signatures are alpha-equivalent (same structure, different names).

SignatureHash24(sig) uint32 — 24-bit fast-rejection hash stored in MetaEntry.Extra[5:8].

find-sig looks up functions by signature match across the lattice.

audit verifies that every function in a corpus is self-isomorphic (distance 0 to itself) — 100% pass rate confirms correct signature generation.

Rotation system

Three-branch transitions encode the semantic direction of code flow:

Type → Func → Data → Type    (clockwise / right)
Type → Data → Func → Type    (counterclockwise / left)
RotationTokenMeaning
RightClockwiseType decl followed by method
LeftCounter-clockwiseVariable followed by function call
StaySame branchTwo consecutive expressions
BlockOpen {Enter scope
BlockClose }Leave scope
Separator ;Statement boundary
List ,Parameter/argument list

Encoded in MetaEntry.Extra[4] low nibble. Used for sequence modeling and bigram transitions.


Lattice-based compilation

iskra compile -mesh <f.mesh> -o <out.ll> <src.mx...> uses the lattice as a compilation substrate: looks up each source construct by key, retrieves its stored IR content, and assembles the output. Enables name substitution (subst) and structural transformation without reparsing.

Building

MOXIEROOT=../moxie moxie build ./cmd/iskra

CLI

iskra load <corpus-dir>                       load mxcorpus output, print stats
iskra save <corpus-dir> <out.mesh>            load corpus, save binary mesh
iskra load-multi <dir1> [dir2...] -o <f>      merge multiple corpora, save mesh
iskra load-mesh <file.mesh>                   load mesh, print stats
iskra lookup <mesh-or-corpus> <name>          lookup segment, show cross-stage links
iskra forward <mesh-or-corpus> <name>         find SRC entry, follow to IR
iskra reverse <mesh-or-corpus> <name>         walk BIN→ASM→IR→AST→SRC
iskra content <mesh-or-corpus> <name>         show stored content for all stages
iskra quality <corpus-dir>                    measure reverse reconstruction coverage
iskra stats <file.mesh>                       lattice statistics
iskra audit <corpus-dirs...>                  isomorphism audit of all corpora
iskra find-sig <mesh-or-corpus> <name>        find isomorphic functions by signature
iskra subst <mesh-or-corpus> <src> <tgt>      substitute names in IR
iskra astgen <source.mx>                      generate AST dumps from source file
iskra compile -mesh <f.mesh> [-pkg p] -o <out.ll> <src.mx...>   lattice-based compile
iskra pipeline <corpus-dirs...>               evaluate SRC→AST→IR transformations
iskra verify <corpus-dir>                     verify lattice integrity
iskra vocab <mesh-or-corpus>                  print dictionary statistics
iskra classify-pair <mesh> <name1> <name2>    classify structural relationship
iskra bench-gen -d <dir> [-c <corpus>]        generate benchmark .mx files
iskra bench-run -d <dir> -m <moxieroot>       run benchmarks, produce TSV
iskra bench-ingest <mesh-or-corpus> <tsv>     ingest benchmark costs into lattice
iskra test                                    built-in smoke test

License

Licensed under AGPL-3.0-or-later.

files