# iskra ![iskra](iskra.jpg) Code lattice engine. Maps code constructs across five compiler stages (SRC→AST→IR→ASM→BIN) onto a multi-branch B-tree, enabling forward and reverse tracing, structural isomorphism detection, and lattice-based compilation. Built in [Moxie](https://git.mleku.dev/moxie). Storage: [iskradb](https://git.smesh.lol/iskradb). --- ## Architecture ### Three semantic branches Iskra uses three of iskradb's eight branches, one per semantic axis: | Branch | iskradb alias | Contents | |--------|--------------|----------| | `Bnoun` | `Bgrammatical` | Types, structs, interfaces, variables, imports, constants | | `Bverb` | `Bmorphology` | Functions, methods | | `Bmodifier` | `Bpragmatic` | Expressions, statements, control flow, fields | `KindToBranch(kind NodeKind)` maps each `NodeKind` flag to the correct branch. ### Key format ``` lattice.Key = [2]uint64 Key[0] = (stage << 56) | (FNV-1a-56(name) & 0x00FFFFFFFFFFFFFF) Key[1] = 0 ``` Stage occupies the high 8 bits; a 56-bit FNV-1a hash of the construct's name fills the rest. Cross-stage adjacency is **key-implicit**: the same name at a different stage produces the same 56-bit hash, different stage prefix — no explicit link stored. `MakeKey(stage uint8, hash uint64) lattice.Key` `NameKey(stage uint8, name string) lattice.Key` `SegmentKey(stage uint8, content []byte) lattice.Key` ### Stages ``` StageSRC 0x01 source text StageAST 0x02 abstract syntax tree StageIR 0x03 intermediate representation (LLVM IR) StageASM 0x04 assembly StageBIN 0x05 binary / bytecode ``` ### Node kinds (NodeKind bitfield) ``` KindPkg 1<<0 KindImport 1<<1 KindConst 1<<2 KindVar 1<<3 KindType 1<<4 KindFunc 1<<5 KindMethod 1<<6 KindField 1<<7 KindStmt 1<<8 KindExpr 1<<9 KindControl 1<<10 KindLiteral 1<<11 ``` ### TritPath (iskra-local) Encodes a position in the three-branch semantic space. Up to 8 trits packed into `uint32` (2 bits each, `0b00` = sentinel end). **Level 1** - declaration kind: ``` BranchType = 1 BranchFunc = 2 BranchData = 3 ``` **Level 2** - subdivision: ``` Type: SubField=1 SubMethod=2 SubEmbed=3 Func: SubStatement=1 SubExpression=2 SubControl=3 Data: SubConst=1 SubVar=2 SubImport=3 ``` `KindToTritPath(kind NodeKind) TritPath` — maps a `NodeKind` to its lattice position. `TritPathDistance(a, b TritPath) int` — hop count via lowest common ancestor. --- ## Data structures ### Tree ```moxie type Tree struct { db *lattice.Tree // iskradb B-tree (C=8, 200-byte nodes, 3 active branches) RecMeta []MetaEntry // per-record metadata, indexed by record index StringPool []byte // overflow for names > 23 bytes TokenPool []uint32 // tokenized content (dict-compressed) Dict *Dict // token vocabulary CostMap map[uint32]CostEntry // recIdx → benchmark cost data } ``` `RecMeta` is grown atomically with each `db.InsertRec` call. Record index is the single unified index for both the B-tree record and metadata — there are no separate MetaIdx/LexIdx pairs. ### MetaEntry (32 bytes) ``` Count uint32 reference count / occurrence count Kind NodeKind construct classification StageTag uint8 stage this entry was ingested from Extra [16]byte packed auxiliary data (see below) ``` **Extra layout:** ``` [0:4] TritPath (uint32 LE) [4] low nibble: Rotation (branch transition encoding) [5:8] 24-bit signature hash (fast rejection in isomorphism lookup) [8:12] ContentOffset into TokenPool (uint32 LE) [12:16] ContentLen in tokens (uint32 LE) ``` ### Record (iskradb, 48 bytes) ``` DataFile uint32 overflow flag (1) or 0 DataOff uint32 offset into StringPool DataLen uint32 name length Link [2]uint32 cross-branch record indices (FinalizeLinks populates these) Branch uint8 branch index (0-7) Inline [24]byte name inline storage: bytes 0-22 = data, byte 23 = length ``` `SetFormOnRecord` / `FormFromRecord` handle the inline/overflow split at 23 bytes. ### Dict Vocabulary for content compression. Tokens are deduplicated into `DictEntry{Text []byte, Class uint8}` and referenced by index in `TokenPool`. `SortByFrequency` produces a remapping for optimal compression. --- ## Cross-stage and cross-branch adjacency ### Cross-stage (key-implicit) Same name at different stages → same 56-bit hash, different stage prefix. `StageLinksFor(recIdx)` probes all five stages in the same branch using `MakeKey(stage, hash)`. No explicit adjacency list stored. ### Cross-branch (explicit Link) `FinalizeLinks()` wires `Record.Link[0]` and `Link[1]` to the noun/verb/modifier counterparts of the same key at the same stage. Called once after bulk ingest. `CrossWalk` (iskradb) follows Link pointers across branches. `StageLinksFor` follows the key-implicit stage links. --- ## Corpus format Manifest: tab-separated CSV produced by `mxcorpus`. ``` id kind name ast_file ast_dump ir_file asm_file bin_file lineinfo ``` `LoadManifest(path) ([]CorpusSegment, error)` parses the manifest. `LoadCorpus(dir string, t *Tree) error` loads all stages into the tree. ### Binary mesh format `iskra save` serializes a loaded corpus to a `.mesh` file for fast reloading. The mesh is the concat of: 1. iskradb `.iskr` flat binary (nodes + records + RecKey map) 2. `.meta` companion file (MetaEntry slice, StringPool, TokenPool, Dict, CostMap) `iskra load-mesh` restores both. --- ## Distances All distances are exact integers. No floating point, no embeddings, no thresholds. **TritPath distance** — hops via LCA on the 3-branch ternary tree: ``` shared = length of common trit prefix distance = (depth_a - shared) + (depth_b - shared) ``` **Tree distance** — hops via LCA in the B-tree: ``` pathA = node indices from root to keyA pathB = node indices from root to keyB distance = (len(pathA) - 1 - lca_depth) + (len(pathB) - 1 - lca_depth) ``` Measures storage locality. Keys in the same leaf: distance 0. Sibling leaves: distance 2. **Geometric distance** (binding signatures) — L1 distance over sorted binding tuples: ``` for each paired binding: +1 if declaration depths differ +|refDepths_a[i] - refDepths_b[i]| per reference pair +|len(refs_a) - len(refs_b)| for unmatched references +|num_locals_a - num_locals_b| for unmatched bindings (same for externals) ``` Two structurally identical functions: distance 0. One extra local variable: distance 1. --- ## Symbol isomorphism `BindingSignature`: the geometric structure of where names are declared and referenced, at what scope depths, with names erased. Two functions with identical binding signatures are alpha-equivalent (same structure, different names). `SignatureHash24(sig) uint32` — 24-bit fast-rejection hash stored in `MetaEntry.Extra[5:8]`. `find-sig` looks up functions by signature match across the lattice. `audit` verifies that every function in a corpus is self-isomorphic (distance 0 to itself) — 100% pass rate confirms correct signature generation. --- ## Rotation system Three-branch transitions encode the semantic direction of code flow: ``` Type → Func → Data → Type (clockwise / right) Type → Data → Func → Type (counterclockwise / left) ``` | Rotation | Token | Meaning | |----------|-------|---------| | Right | Clockwise | Type decl followed by method | | Left | Counter-clockwise | Variable followed by function call | | Stay | Same branch | Two consecutive expressions | | BlockOpen `{` | Enter scope | | | BlockClose `}` | Leave scope | | | Separator `;` | Statement boundary | | | List `,` | Parameter/argument list | | Encoded in `MetaEntry.Extra[4]` low nibble. Used for sequence modeling and bigram transitions. --- ## Lattice-based compilation `iskra compile -mesh -o ` uses the lattice as a compilation substrate: looks up each source construct by key, retrieves its stored IR content, and assembles the output. Enables name substitution (`subst`) and structural transformation without reparsing. --- ## Building ```sh MOXIEROOT=../moxie moxie build ./cmd/iskra ``` --- ## CLI ``` iskra load load mxcorpus output, print stats iskra save load corpus, save binary mesh iskra load-multi [dir2...] -o merge multiple corpora, save mesh iskra load-mesh load mesh, print stats iskra lookup lookup segment, show cross-stage links iskra forward find SRC entry, follow to IR iskra reverse walk BIN→ASM→IR→AST→SRC iskra content show stored content for all stages iskra quality measure reverse reconstruction coverage iskra stats lattice statistics iskra audit isomorphism audit of all corpora iskra find-sig find isomorphic functions by signature iskra subst substitute names in IR iskra astgen generate AST dumps from source file iskra compile -mesh [-pkg p] -o lattice-based compile iskra pipeline evaluate SRC→AST→IR transformations iskra verify verify lattice integrity iskra vocab print dictionary statistics iskra classify-pair classify structural relationship iskra bench-gen -d [-c ] generate benchmark .mx files iskra bench-run -d -m run benchmarks, produce TSV iskra bench-ingest ingest benchmark costs into lattice iskra test built-in smoke test ``` --- ## License Licensed under [AGPL-3.0-or-later](LICENSE).