lattice language model - semantic walks on bethe lattice (cayley tree). bounded only by memory access speed, low compute requirements.
git clone https://git.smesh.lol/iskra.git

Code lattice engine. Maps code constructs across five compiler stages (SRC→AST→IR→ASM→BIN) onto a multi-branch B-tree, enabling forward and reverse tracing, structural isomorphism detection, and lattice-based compilation.
Built in Moxie. Storage: iskradb.
Iskra uses three of iskradb's eight branches, one per semantic axis:
| Branch | iskradb alias | Contents |
|---|---|---|
Bnoun | Bgrammatical | Types, structs, interfaces, variables, imports, constants |
Bverb | Bmorphology | Functions, methods |
Bmodifier | Bpragmatic | Expressions, statements, control flow, fields |
KindToBranch(kind NodeKind) maps each NodeKind flag to the correct branch.
lattice.Key = [2]uint64
Key[0] = (stage << 56) | (FNV-1a-56(name) & 0x00FFFFFFFFFFFFFF)
Key[1] = 0
Stage occupies the high 8 bits; a 56-bit FNV-1a hash of the construct's name fills the rest. Cross-stage adjacency is key-implicit: the same name at a different stage produces the same 56-bit hash, different stage prefix — no explicit link stored.
MakeKey(stage uint8, hash uint64) lattice.Key
NameKey(stage uint8, name string) lattice.Key
SegmentKey(stage uint8, content []byte) lattice.Key
StageSRC 0x01 source text
StageAST 0x02 abstract syntax tree
StageIR 0x03 intermediate representation (LLVM IR)
StageASM 0x04 assembly
StageBIN 0x05 binary / bytecode
KindPkg 1<<0 KindImport 1<<1
KindConst 1<<2 KindVar 1<<3
KindType 1<<4 KindFunc 1<<5
KindMethod 1<<6 KindField 1<<7
KindStmt 1<<8 KindExpr 1<<9
KindControl 1<<10 KindLiteral 1<<11
Encodes a position in the three-branch semantic space. Up to 8 trits packed into uint32 (2 bits each, 0b00 = sentinel end).
Level 1 - declaration kind:
BranchType = 1 BranchFunc = 2 BranchData = 3
Level 2 - subdivision:
Type: SubField=1 SubMethod=2 SubEmbed=3
Func: SubStatement=1 SubExpression=2 SubControl=3
Data: SubConst=1 SubVar=2 SubImport=3
KindToTritPath(kind NodeKind) TritPath — maps a NodeKind to its lattice position.
TritPathDistance(a, b TritPath) int — hop count via lowest common ancestor.
type Tree struct {
db *lattice.Tree // iskradb B-tree (C=8, 200-byte nodes, 3 active branches)
RecMeta []MetaEntry // per-record metadata, indexed by record index
StringPool []byte // overflow for names > 23 bytes
TokenPool []uint32 // tokenized content (dict-compressed)
Dict *Dict // token vocabulary
CostMap map[uint32]CostEntry // recIdx → benchmark cost data
}
RecMeta is grown atomically with each db.InsertRec call. Record index is the single unified index for both the B-tree record and metadata — there are no separate MetaIdx/LexIdx pairs.
Count uint32 reference count / occurrence count
Kind NodeKind construct classification
StageTag uint8 stage this entry was ingested from
Extra [16]byte packed auxiliary data (see below)
Extra layout:
[0:4] TritPath (uint32 LE)
[4] low nibble: Rotation (branch transition encoding)
[5:8] 24-bit signature hash (fast rejection in isomorphism lookup)
[8:12] ContentOffset into TokenPool (uint32 LE)
[12:16] ContentLen in tokens (uint32 LE)
DataFile uint32 overflow flag (1) or 0
DataOff uint32 offset into StringPool
DataLen uint32 name length
Link [2]uint32 cross-branch record indices (FinalizeLinks populates these)
Branch uint8 branch index (0-7)
Inline [24]byte name inline storage: bytes 0-22 = data, byte 23 = length
SetFormOnRecord / FormFromRecord handle the inline/overflow split at 23 bytes.
Vocabulary for content compression. Tokens are deduplicated into DictEntry{Text []byte, Class uint8} and referenced by index in TokenPool. SortByFrequency produces a remapping for optimal compression.
Same name at different stages → same 56-bit hash, different stage prefix. StageLinksFor(recIdx) probes all five stages in the same branch using MakeKey(stage, hash). No explicit adjacency list stored.
FinalizeLinks() wires Record.Link[0] and Link[1] to the noun/verb/modifier counterparts of the same key at the same stage. Called once after bulk ingest.
CrossWalk (iskradb) follows Link pointers across branches. StageLinksFor follows the key-implicit stage links.
Manifest: tab-separated CSV produced by mxcorpus.
id kind name ast_file ast_dump ir_file asm_file bin_file lineinfo
LoadManifest(path) ([]CorpusSegment, error) parses the manifest.
LoadCorpus(dir string, t *Tree) error loads all stages into the tree.
iskra save serializes a loaded corpus to a .mesh file for fast reloading. The mesh is the concat of:
.iskr flat binary (nodes + records + RecKey map).meta companion file (MetaEntry slice, StringPool, TokenPool, Dict, CostMap)iskra load-mesh restores both.
All distances are exact integers. No floating point, no embeddings, no thresholds.
TritPath distance — hops via LCA on the 3-branch ternary tree:
shared = length of common trit prefix
distance = (depth_a - shared) + (depth_b - shared)
Tree distance — hops via LCA in the B-tree:
pathA = node indices from root to keyA
pathB = node indices from root to keyB
distance = (len(pathA) - 1 - lca_depth) + (len(pathB) - 1 - lca_depth)
Measures storage locality. Keys in the same leaf: distance 0. Sibling leaves: distance 2.
Geometric distance (binding signatures) — L1 distance over sorted binding tuples:
for each paired binding:
+1 if declaration depths differ
+|refDepths_a[i] - refDepths_b[i]| per reference pair
+|len(refs_a) - len(refs_b)| for unmatched references
+|num_locals_a - num_locals_b| for unmatched bindings
(same for externals)
Two structurally identical functions: distance 0. One extra local variable: distance 1.
BindingSignature: the geometric structure of where names are declared and referenced, at what scope depths, with names erased. Two functions with identical binding signatures are alpha-equivalent (same structure, different names).
SignatureHash24(sig) uint32 — 24-bit fast-rejection hash stored in MetaEntry.Extra[5:8].
find-sig looks up functions by signature match across the lattice.
audit verifies that every function in a corpus is self-isomorphic (distance 0 to itself) — 100% pass rate confirms correct signature generation.
Three-branch transitions encode the semantic direction of code flow:
Type → Func → Data → Type (clockwise / right)
Type → Data → Func → Type (counterclockwise / left)
| Rotation | Token | Meaning |
|---|---|---|
| Right | Clockwise | Type decl followed by method |
| Left | Counter-clockwise | Variable followed by function call |
| Stay | Same branch | Two consecutive expressions |
BlockOpen { | Enter scope | |
BlockClose } | Leave scope | |
Separator ; | Statement boundary | |
List , | Parameter/argument list |
Encoded in MetaEntry.Extra[4] low nibble. Used for sequence modeling and bigram transitions.
iskra compile -mesh <f.mesh> -o <out.ll> <src.mx...> uses the lattice as a compilation substrate: looks up each source construct by key, retrieves its stored IR content, and assembles the output. Enables name substitution (subst) and structural transformation without reparsing.
MOXIEROOT=../moxie moxie build ./cmd/iskra
iskra load <corpus-dir> load mxcorpus output, print stats
iskra save <corpus-dir> <out.mesh> load corpus, save binary mesh
iskra load-multi <dir1> [dir2...] -o <f> merge multiple corpora, save mesh
iskra load-mesh <file.mesh> load mesh, print stats
iskra lookup <mesh-or-corpus> <name> lookup segment, show cross-stage links
iskra forward <mesh-or-corpus> <name> find SRC entry, follow to IR
iskra reverse <mesh-or-corpus> <name> walk BIN→ASM→IR→AST→SRC
iskra content <mesh-or-corpus> <name> show stored content for all stages
iskra quality <corpus-dir> measure reverse reconstruction coverage
iskra stats <file.mesh> lattice statistics
iskra audit <corpus-dirs...> isomorphism audit of all corpora
iskra find-sig <mesh-or-corpus> <name> find isomorphic functions by signature
iskra subst <mesh-or-corpus> <src> <tgt> substitute names in IR
iskra astgen <source.mx> generate AST dumps from source file
iskra compile -mesh <f.mesh> [-pkg p] -o <out.ll> <src.mx...> lattice-based compile
iskra pipeline <corpus-dirs...> evaluate SRC→AST→IR transformations
iskra verify <corpus-dir> verify lattice integrity
iskra vocab <mesh-or-corpus> print dictionary statistics
iskra classify-pair <mesh> <name1> <name2> classify structural relationship
iskra bench-gen -d <dir> [-c <corpus>] generate benchmark .mx files
iskra bench-run -d <dir> -m <moxieroot> run benchmarks, produce TSV
iskra bench-ingest <mesh-or-corpus> <tsv> ingest benchmark costs into lattice
iskra test built-in smoke test
Licensed under AGPL-3.0-or-later.