# iskradb Multi-branch B-tree lattice. Eight independent ordered trees share a single record pool and cross-link via stable record indices. Built in Moxie. Module: `git.smesh.lol/iskradb`. --- ## Architecture ### Eight branches The lattice has C=8 branches, one per semantic axis: | Constant | Index | Axis | Alias | |----------|-------|------|-------| | `Bsemantic` | 0 | ontological category (animacy, abstractness) | - | | `Bgrammatical` | 1 | syntactic role (nominal/verbal/function) | `Bnoun` | | `Bcooccur` | 2 | PVN co-occurrence context | - | | `Bmorphology` | 3 | morphological state (tense/polarity/aspect) | `Bverb` | | `Bpragmatic` | 4 | domain/register context | `Bmodifier` | | `Bphonology` | 5 | phonological (dormant) | - | | `Bvalency` | 6 | argument count | - | | `Bregister` | 7 | social register | - | `Bnoun`, `Bverb`, `Bmodifier` are aliases retained for backward compatibility. Each branch is an independent B-tree with coordination number C=8 (each node holds up to 8 keys). ### Node layout (200 bytes) ``` Keys [8]Key 128 bytes 8 × 128-bit SipHash keys RecPtrs [8]uint32 32 bytes record index per key slot Children [9]uint32 36 bytes child node indices Mult uint8 1 byte expansion multiplier (compact) Flags uint8 1 byte bits 4-7: key count, bits 1-3: branch, bit 0: deleted _pad [2]byte ``` ### Record layout (48 bytes) ``` DataFile uint32 4 bytes overflow file selector / flag bits DataOff uint32 4 bytes offset into string pool (overflow path) DataLen uint32 4 bytes data length Link [2]uint32 8 bytes cross-branch record indices Branch uint8 1 byte packed POS + register + domain + special _rpad [3]byte Inline [24]byte 24 bytes inline data store (up to 23 bytes + 1 length byte) ``` Records are stable across B-tree splits. Nodes move during splits and compaction; records do not. Cross-branch links (`Link[0]`, `Link[1]`) use record indices and therefore never go stale. ### Key ``` type Key [2]uint64 // 128-bit SipHash ``` Keys are opaque hashes. The default key derivation is: ```moxie lattice.HashKey(data []byte) Key // SipHash(defaultSeed, data) ``` Callers that need domain separation prepend a domain byte before hashing (e.g. transdb prepends `lang || coord_8bytes_LE` before the word bytes). --- ## Storage modes ### In-memory ```moxie t := lattice.NewTree(cap int) *Tree ``` Allocates node and record slices in memory. No flush/close needed. Used for build pipelines, tests, and consolidation passes. ### Flat binary (transdb format) Not part of the iskradb API directly - transdb's `ingest.SaveFlat`/`LoadFlat` serialize to a flat binary: ``` [4B] nodeCount [4B] recCount [C×4B] roots [4B] poolLen [nodeCount × 200B] nodes [recCount × 48B] records [poolLen B] string pool [recCount × 20B] RecKey entries: recIdx(4) + key(16) ``` ### Disk-backed (page cache) ```moxie t, err := lattice.Create(path string) (*Tree, error) t, err := lattice.Open(path string) (*Tree, error) err := t.Flush() error err := t.Close() error ``` Uses a `PageCache` with dirty-page tracking. B-tree operations are identical; the cache layer intercepts node and record access. --- ## Core operations ### Insert ```moxie t.Insert(branch Branch, key Key, rec Record) uint32 // returns node index t.InsertRec(branch Branch, key Key, rec Record) uint32 // returns record index ``` `InsertRec` resets `Link[0]` and `Link[1]` to `NullRec` after copy. Wire links after insertion. ### Lookup ```moxie t.Lookup(branch Branch, key Key) *Record // returns nil if not found t.LookupRecIdx(branch Branch, key Key) uint32 // returns NullRec if not found t.GetRecord(idx uint32) *Record t.GetNode(idx uint32) *Node ``` `LookupRecIdx` is the primary lookup path. Get the record index, then `GetRecord` to access fields including `Link` pointers. ### Delete ```moxie t.Delete(branch Branch, key Key) ``` Marks the node deleted; does not reclaim memory immediately. Call `Compact` to reclaim. ### Walk ```moxie t.Walk(branch Branch, key Key) (nodeIdx uint32, pos int, found bool) t.WalkPath(branch Branch, key Key) []uint32 // node indices from root to target ``` --- ## Cross-branch links Records carry two link slots (`Link[0]`, `Link[1]`) pointing to record indices in other branches. Because record indices are stable across splits, links remain valid after any structural change to the B-tree. ### CrossWalk ```moxie t.CrossWalk(startRecIdx uint32, pattern WalkPattern, visit func(uint32, *Record) bool) ``` Follows links according to a `WalkPattern`. Built-in patterns: ```moxie PatNounVerbMod // noun → verb → modifier → stop PatNounMod // noun → modifier → stop PatVerbNoun // verb → noun → stop PatFull // noun → verb → modifier → noun (cycle, max 3 hops) ``` `branchToLinkIndex` determines which `Link` slot corresponds to a given transition: - noun → verb: `Link[0]`; noun → modifier: `Link[1]` - verb → noun: `Link[0]`; verb → modifier: `Link[1]` - modifier → noun: `Link[0]`; modifier → verb: `Link[1]` ### Triple insert ```moxie t.InsertTriple(nounKey, verbKey, modKey Key, nRec, vRec, mRec Record) ``` Inserts three records (one per branch) and wires their cross-links. --- ## TritPath ```moxie type TritPath uint32 // up to 16 trits, 2 bits each ``` Encodes a path through the lattice coordinate space. Trit 0 = branch selector (1=noun, 2=verb, 3=modifier). Trits 1-15 encode the path within the branch. ```moxie MakeTritPath(trits ...uint8) TritPath BranchTritPath(b Branch) TritPath p.Depth() int p.Branch() uint8 p.Trit(i int) uint8 Parent(p TritPath) TritPath Child(p TritPath, dir uint8) TritPath LCA(a, b TritPath) TritPath // longest common prefix Distance(a, b TritPath) int // hop count via LCA Resolve(p TritPath, mult uint8) TritPath // expand path with multiplier ``` Used for coordinate-space navigation and subtree addressing. Not required for basic insert/lookup. --- ## Transfer operations ### SubLattice A `SubLattice` is a self-contained sub-tree extracted from a parent, with remapped record indices and preserved cross-links. ```moxie type SubLattice struct { Tree *Tree OriginKey Key SrcBranch Branch } ``` ### Extract ```moxie sl := t.ExtractBranch(branch Branch) *SubLattice sl := t.ExtractSubtree(rootNodeIdx uint32, branch Branch, filter ExportFilter) *SubLattice sl := t.ExtractLinked(startRecIdx uint32) *SubLattice ``` `ExtractLinked` follows `Link` pointers from `startRecIdx` across branches and collects all reachable records. Cross-links are remapped to new indices in the extracted tree. `ExportFilter` is `func(nodeIdx uint32, n *Node, recIdx uint32, r *Record) bool` — return false to exclude a record from the extract. ### Save/Load ```moxie err := sl.SaveFile(path string) error sl, err := lattice.LoadSubLattice(path string) (*SubLattice, error) ``` ### Graft ```moxie inserted := t.Graft(sl *SubLattice) int // returns count of inserted records ``` Merges a `SubLattice` into an existing tree. Deduplicates on key: if a key already exists in the target branch, the record is skipped. Cross-links are remapped to target indices after all records are inserted. --- ## Compaction ```moxie t.NeedsCompact() bool reclaimed := t.Compact() int ``` Reclaims free nodes after deletions by moving live nodes into the gaps and updating all child pointers. Also resolves any pending expansion entries. Safe to call at any time; the tree remains fully operational before and after. --- ## RecKey map ```moxie t.RecKey map[uint32]Key // record index → key ``` Maintained automatically on every `Insert`/`InsertRec`. Enables reverse lookup (record index → key) without B-tree traversal. Used by consolidation passes that need to re-insert records into a fresh tree while preserving keys. --- ## Usage in transdb transdb uses iskradb as its core storage engine. Key conventions: **Key derivation**: `SipHash(defaultSeed, [lang(1) || coord(8 LE) || word(...)])` where coord is a 64-bit packed bitfield encoding semantic category, morphological state, grammatical role, and context axes. **Active branches**: Bnoun (nouns/adjectives), Bverb (verbs, conjugated forms at morph coord), Bmodifier (particles, function words). Bcooccur is repurposed as metadata storage for lang descriptors, particle role maps, and verb class registrations. **Record.Branch byte**: repurposed from the lattice's POS-branch encoding to carry additional metadata (`PackBranch(pos, reg, dom, spec)` packs POS in bits 0-2, register in bits 3-4, domain in bits 5-6, honorific in bit 7). For Bcooccur metadata records, the Branch byte carries role codes, class codes, or descriptor bits directly. **Record.DataFile bits**: transdb packs morph state (bits 1-5) and semantic flags (bits 6-21) into the DataFile field of lattice records, using the field as dense auxiliary storage rather than a file pointer (since transdb uses a flat binary format, not the disk-backed page cache). **Cross-links**: `rec.Link[0]` on a JA verb record points to the primary EN translation record. `rec.Link[1]` points to a secondary translation. The TranslateWithClusters pipeline traverses these links during cluster translation.