iskra becomes the algorithm layer between raw storage (iskradb) and domain-specific implementations (transdb for NL, iskra/code for programming languages). Each language, natural or programming, is a sub-lattice that shares the same iskradb store and composition machinery.
```
iskradb      raw B-tree storage, no domain knowledge
   ↑
iskra        coord system, sub-lattice model, abstract pipeline
   ↑
transdb      JA/EN: JMdict ingest, JA morphology, particle tables
iskra/code   Moxie/SRC: mxcorpus ingest, AST analysis (currently here)
(future)     HR/RU: Slavic morphology tables
```
The insight: translation between any two languages is a sub-lattice path problem, whether those languages are EN→JA, SRC→IR, or Moxie→LLVM. The same coord, relax, and cluster machinery applies.
Move from transdb and generalize:
```
PackCoord(semantic, grammatical, cooccur, morph, pragmatic, valency, register uint64) uint64
RelaxCoord(coord uint64) []uint64
CoordSemanticShift, CoordMorphShift, CoordGrammaticalShift, CoordCooccurShift, ...
SemanticHumanSubj, SemanticHumanObj, SemanticAnimSubj, ... (all 16 flags)
CoordSemantic(coord) uint64
CoordMorph(coord) uint8
CoordCooccur(prev, next uint8) uint64
```
MakeKey(domain uint8, coord uint64, word string) lattice.Key - "lang" becomes "domain". Domain 0 = reserved, domains 1-255 assigned per language. LangEN=0x01, LangJA=0x02 stay in transdb; future DomainMoxieSRC=0x10 etc. go in iskra/code.
RelaxCoord is already language-agnostic. The relaxation order (semantic → pragmatic → register → valency → grammatical → cooccur → morph) applies to any domain that uses the coord.
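A minimal sketch of how PackCoord and RelaxCoord could fit together. The axis widths and shift values below are illustrative assumptions (the real iskra constants are not given in this note); only the packing protocol and the documented relaxation order are taken from the text.

```go
package main

import "fmt"

// Hypothetical axis layout: widths and shifts are illustrative,
// not the real iskra constants.
const (
	CoordSemanticShift    = 0  // 16 bits of semantic flags
	CoordGrammaticalShift = 16 // 8 bits
	CoordCooccurShift     = 24 // 16 bits (prev|next class pair)
	CoordMorphShift       = 40 // 8 bits (the 5-bit morph state fits here)
	CoordPragmaticShift   = 48 // 4 bits
	CoordValencyShift     = 52 // 4 bits
	CoordRegisterShift    = 56 // 8 bits
)

// PackCoord packs the seven axes into one uint64 coordinate.
func PackCoord(sem, gram, cooccur, morph, prag, val, reg uint64) uint64 {
	return sem<<CoordSemanticShift |
		gram<<CoordGrammaticalShift |
		cooccur<<CoordCooccurShift |
		morph<<CoordMorphShift |
		prag<<CoordPragmaticShift |
		val<<CoordValencyShift |
		reg<<CoordRegisterShift
}

// RelaxCoord yields progressively weaker coords by zeroing one axis at
// a time in the documented order: semantic, pragmatic, register,
// valency, grammatical, cooccur, morph.
func RelaxCoord(coord uint64) []uint64 {
	masks := []uint64{
		uint64(0xFFFF) << CoordSemanticShift,
		uint64(0xF) << CoordPragmaticShift,
		uint64(0xFF) << CoordRegisterShift,
		uint64(0xF) << CoordValencyShift,
		uint64(0xFF) << CoordGrammaticalShift,
		uint64(0xFFFF) << CoordCooccurShift,
		uint64(0xFF) << CoordMorphShift,
	}
	out := make([]uint64, 0, len(masks))
	c := coord
	for _, m := range masks {
		c &^= m // clear this axis, keep the rest
		out = append(out, c)
	}
	return out
}

func main() {
	c := PackCoord(0x0001, 0x02, 0x0303, 0x04, 0x5, 0x6, 0x07)
	for _, r := range RelaxCoord(c) {
		fmt.Printf("%#016x\n", r)
	}
}
```

The last relaxation is the fully wildcarded coord (all axes zero), which is what makes the same loop usable for any domain.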
POSForWord, ActiveBranches, branchOrderJA stay in transdb — they are JA-specific branch heuristics.
Move the encoding protocol, not the language-specific states:
```
SetMorphState(rec *lattice.Record, state uint8)
GetMorphState(rec *lattice.Record) uint8
SetSemanticInDataFile(rec *lattice.Record, flags uint64)
GetSemanticFromDataFile(rec *lattice.Record) uint64
PackBranch(pos, reg, dom, spec uint8) uint8
POSFromBranch(b uint8) uint8
RegFromBranch, DomFromBranch, SpecFromBranch
BranchWeirdness, MatchesFilter
```
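The branch-byte protocol is simple bit packing. A sketch, assuming a hypothetical field layout (pos in the low 3 bits, reg and dom 2 bits each, spec in the top bit); the real iskra widths may differ, but the pack/unpack shape is the same:

```go
package main

import "fmt"

// Hypothetical bit layout for the branch byte: pos in the low 3 bits,
// reg in the next 2, dom in the next 2, spec in the top bit. The real
// iskra layout may differ; this shows only the pack/unpack protocol.
func PackBranch(pos, reg, dom, spec uint8) uint8 {
	return pos&0x07 | reg&0x03<<3 | dom&0x03<<5 | spec&0x01<<7
}

func POSFromBranch(b uint8) uint8  { return b & 0x07 }
func RegFromBranch(b uint8) uint8  { return b >> 3 & 0x03 }
func DomFromBranch(b uint8) uint8  { return b >> 5 & 0x03 }
func SpecFromBranch(b uint8) uint8 { return b >> 7 }

func main() {
	b := PackBranch(5, 2, 1, 1)
	fmt.Println(POSFromBranch(b), RegFromBranch(b), DomFromBranch(b), SpecFromBranch(b)) // 5 2 1 1
}
```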
The MorphState constants (MorphPresAffPlain, MorphPastProgNeg, ...) are the 5-bit wu xing encoding — they describe any language's morphological state space, not JA specifically. Move constants to iskra/morph.mx.
JA-specific verb forms (kuruStateForms, suruFormSuffixes, VerbPatterns, BuildVerbForms, addVerbForms) stay in transdb — they are JA morphology tables.
EN-specific forms (enIrregs, regularPast, regularProg) stay in transdb.
Register/domain/honorific constants (RegNeutral, RegFormal, DomGeneral, ...) move to iskra — they are universal lexical metadata.
The descriptor framework is language-agnostic:
```
LangDesc{Order, HeadFinal, Particle, PreNomRC, ZeroCopula, Markers}
OrderSVO, OrderSOV, OrderVSO, ...
MarkerPrepositional, MarkerPostpositional, MarkerCase
RoleNone, RoleNPSubjTopic, RoleNPSubjGram, RoleNPObjDirect, RolePPLocative, ...
RegisterLangDesc(tree, pool, domain, desc)
GetLangDesc(tree, domain) (LangDesc, bool)
RegisterParticleRole(tree, pool, domain, semCoord, particle, role)
LookupParticleRole(tree, domain, particle, npFlags) uint8
```
LookupTargetMarker(dstDomain, role) string -- table stays here, entries generic
JA-specific particle strings ("は", "が", ...) and jaDefaultRole map stay in transdb.
EN-specific preposition strings stay in transdb (or move to LookupTargetMarker table in iskra).
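To make the marker side concrete, a toy sketch of how a LangDesc and a LookupTargetMarker-style table could drive marker placement. The struct fields used, the constant values, and the table contents are all illustrative assumptions, not the real iskra definitions:

```go
package main

import "fmt"

// Illustrative subset of the descriptor framework; constant values
// and table entries are assumptions.
const (
	OrderSVO = iota
	OrderSOV
)
const (
	MarkerPrepositional = iota
	MarkerPostpositional
)
const (
	RoleNPSubjTopic uint8 = iota + 1
	RoleNPObjDirect
)

type LangDesc struct {
	Order   uint8
	Markers uint8 // prepositional vs postpositional
}

// Hypothetical per-domain marker tables, keyed by role.
var targetMarkers = map[uint8]map[uint8]string{
	0x01: {RoleNPSubjTopic: "", RoleNPObjDirect: ""}, // EN: roles carried by word order
	0x02: {RoleNPSubjTopic: "は", RoleNPObjDirect: "を"}, // JA: postpositional particles
}

func LookupTargetMarker(dstDomain, role uint8) string {
	return targetMarkers[dstDomain][role]
}

// MarkNP attaches the role marker on the side the descriptor dictates.
func MarkNP(np string, role uint8, domain uint8, desc LangDesc) string {
	m := LookupTargetMarker(domain, role)
	if m == "" {
		return np
	}
	if desc.Markers == MarkerPrepositional {
		return m + " " + np
	}
	return np + m
}

func main() {
	ja := LangDesc{Order: OrderSOV, Markers: MarkerPostpositional}
	fmt.Println(MarkNP("猫", RoleNPSubjTopic, 0x02, ja)) // 猫は
}
```

The JA particle strings above would live in transdb's tables in the real split; only MarkNP's placement logic belongs in iskra.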
The CoordVerbClass sentinel constant moves to iskra/langdesc.mx (it is part of the registration protocol, not JA-specific).
Registration protocol and class-code scheme move to iskra:
VerbClassUnknown, VerbClassV1, VerbClassV5K, ... (0-15) -- stay as JA codes
RegisterVerbClass(tree, pool, domain, dictForm, code)
GetVerbClass(tree, domain, dictForm) (string, bool)
VerbClassCode(s string) uint8 and VerbClassStr(code uint8) string stay in transdb because the class name strings ("v1", "v5k", ...) are JMdict-specific.
InflectJA(dictForm, class, state) string and InflectJAFromTree stay in transdb.
For future Slavic: InflectHR(stem, class, case, number) string goes in a new hr package that imports iskra for the registration protocol.
The general interface to define:
```
// iskra/inflect.mx
type InflectFunc func(dictForm string, classCode uint8, state uint8) string

// Each language registers its inflect function at init time.
func RegisterInflectFunc(domain uint8, fn InflectFunc)

func InflectFromTree(tree *lattice.Tree, domain uint8, dictForm string, state uint8) string
```
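A sketch of the registry pattern this implies. The tree parameter is omitted here (dispatch on the domain byte is the point), and the toy EN inflector is an assumption, not transdb's real rules:

```go
package main

import (
	"fmt"
	"strings"
)

// InflectFunc matches the proposed iskra hook signature.
type InflectFunc func(dictForm string, classCode uint8, state uint8) string

var inflectFuncs [256]InflectFunc

func RegisterInflectFunc(domain uint8, fn InflectFunc) {
	inflectFuncs[domain] = fn
}

// Inflect dispatches to the domain's registered inflector; with no
// registration the dictionary form passes through unchanged.
func Inflect(domain uint8, dictForm string, classCode, state uint8) string {
	if fn := inflectFuncs[domain]; fn != nil {
		return fn(dictForm, classCode, state)
	}
	return dictForm
}

// Toy EN inflector: state 1 = past tense, regular verbs only.
func inflectEN(dictForm string, _ uint8, state uint8) string {
	if state != 1 {
		return dictForm
	}
	if strings.HasSuffix(dictForm, "e") {
		return dictForm + "d"
	}
	return dictForm + "ed"
}

func main() {
	const LangEN = 0x01
	RegisterInflectFunc(LangEN, inflectEN)
	fmt.Println(Inflect(LangEN, "walk", 0, 1)) // walked
	fmt.Println(Inflect(0x02, "歩く", 0, 1))   // unregistered domain: unchanged
}
```

transdb would register its JA inflector the same way at init time; the future hr package plugs in identically.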
Add to LangDesc and the domain registration protocol:
```
// iskra/langdesc.mx
type KeyNormalizer func(word string) string

func RegisterKeyNormalizer(domain uint8, fn KeyNormalizer)
func NormalizeKey(domain uint8, word string) string
```
NormalizeKey is called by MakeKey callers before hashing. Default (nil) = identity. Each domain registers its own:
LangJA: identity — JA surface forms are already canonical.
LangEN: lowercase — "Cat" and "cat" are the same lookup.
DomainMoxieSRC: qualified name resolver — Method → package.Type.Method when a package context is known, or strip qualifier to bare name for fuzzy lookup (two-pass: try qualified first, relax to bare).
DomainMoxieIR: LLVM mangled name → canonical (strip @, strip calling convention prefix).
The Translate signature in sublattice.mx becomes:
```
func Translate(src SubLattice, dst SubLattice, word string, coord uint64) string
```
where word is already normalized for the source domain by the time it reaches Translate. Callers use NormalizeKey(src.Domain, rawWord) before calling Translate. This keeps the composition layer clean — Translate never sees raw unnormalized input.
The two-pass pattern for code lookup (qualified → bare) is implemented in the domain's normalizer as a closure over the package context, not in Translate itself. Translate calls MakeKey once with whatever the normalizer produces.
This also fixes an existing ambiguity in transdb: LookupWordCtx currently lowercases nothing and treats "Cat" and "cat" as different keys. Moving normalization to a registered hook makes the behavior explicit per domain rather than implicit per callsite.
The pipeline is abstract; particle detection is language-specific input:
```
// iskra/cluster.mx
type ClusterType uint8 // NPSubj, NPObj, VP, PP, Mod
type Cluster struct { Kind, Tokens, Flags, Role, Nested, Trans, Copular }

// Abstract pipeline stages — caller provides language hooks
type ParticleDetector func(tok string) bool
type RoleLookup func(tok string, npFlags uint64) uint8

ParseClusters(tokens []string, isParticle ParticleDetector, lookupRole RoleLookup,
    hasVerb func([]string) bool) []*Cluster
TranslateCluster(c *Cluster, lookup HeadLookup, srcDomain, dstDomain uint8)
ReorderClusters(clusters []*Cluster, srcOrder, dstOrder uint8) []*Cluster
InsertMarkers(clusters []*Cluster, dstDesc LangDesc, dstDomain uint8) string
```
jaParticleSet, jaDefaultRole, jaFunctionWord, isPureHiragana, accumFlags, hasVerb, filterContent stay in transdb as JA-specific implementations of the hooks.
clusterHeadLookup (coord-relaxation lookup) moves to iskra — it uses MakeKey and RelaxCoord, not JA-specific logic. The verbStems fallback stays in transdb.
joinWords, clusterKindFromRole, clusterFlags move to iskra.
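A minimal sketch of the hook-driven clustering stage. The Cluster fields, the splitting heuristic, and the toy JA hooks are simplified assumptions; the real pipeline also tracks flags, nesting, and the hasVerb hook:

```go
package main

import "fmt"

// Simplified Cluster: the real struct also carries Kind, Flags,
// Nested, Trans, and Copular.
type Cluster struct {
	Tokens []string
	Role   uint8
}

type ParticleDetector func(tok string) bool
type RoleLookup func(tok string, npFlags uint64) uint8

// ParseClusters closes each cluster at a particle token, assigning the
// role that particle marks to the phrase before it. The language hooks
// come from the caller; the loop itself knows nothing about JA.
func ParseClusters(tokens []string, isParticle ParticleDetector, lookupRole RoleLookup) []*Cluster {
	var out []*Cluster
	cur := &Cluster{}
	for _, tok := range tokens {
		if isParticle(tok) {
			cur.Role = lookupRole(tok, 0)
			out = append(out, cur)
			cur = &Cluster{}
			continue
		}
		cur.Tokens = append(cur.Tokens, tok)
	}
	if len(cur.Tokens) > 0 {
		out = append(out, cur) // trailing VP or bare phrase, role 0
	}
	return out
}

func main() {
	// Toy JA hooks supplied by the caller (transdb's role in the split).
	jaParticles := map[string]uint8{"は": 1, "を": 2}
	isParticle := func(tok string) bool { _, ok := jaParticles[tok]; return ok }
	lookupRole := func(tok string, _ uint64) uint8 { return jaParticles[tok] }

	for _, c := range ParseClusters([]string{"猫", "は", "魚", "を", "食べる"}, isParticle, lookupRole) {
		fmt.Println(c.Tokens, c.Role)
	}
}
```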
Map the 5 unused iskradb branches to code-specific axes using the coord system now in iskra:
| Branch | Axis | Code mapping |
|---|---|---|
| Bsemantic (0) | ontological | purity: pure=0, IO=1, mutating=2, unsafe=3 |
| Bcooccur (2) | co-occurrence | call graph: callee prev/next type |
| Bvalency (6) | argument count | arity (0=niladic, 1=unary, 2=binary, 3=variadic) |
| Bregister (7) | register | scope level / allocation class |
| Bphonology (5) | phonological | reserved / package-level grouping |
Replace FNV-56 key derivation with SipHash128(coord + name) using the coord system. The stage tag migrates to the domain byte: DomainMoxieSRC=0x10, DomainMoxieAST=0x11, DomainMoxieIR=0x12, DomainMoxieASM=0x13, DomainMoxieBIN=0x14.
Cross-stage adjacency (currently key-implicit via same FNV-56 + different stage prefix) becomes: same SipHash of name, different domain byte, RelaxCoord for "find this construct at a different stage".
Replace linear BindingSignature scan with coord-based lookup: encode arity in the valency axis, purity in the semantic axis. find-sig becomes LookupWordCtx(tree, pool, name, DomainMoxieSRC, coord) with RelaxCoord doing the structural distance search.
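A sketch of the key shape this implies: domain byte, coord, and name hash kept as separate fields, so the domain can be retargeted (cross-stage adjacency) and the coord relaxed independently of the name. FNV-64 stands in for SipHash-128 here because Go's standard library has no SipHash; the real scheme would keep 128 keyed bits via an external package:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Stage domains from the draft assignment in the text.
const (
	DomainMoxieSRC = 0x10
	DomainMoxieIR  = 0x12
)

// Key sketch: the name hash is computed once; domain and coord stay
// separate so they can vary without rehashing.
type Key struct {
	Domain uint8
	Coord  uint64
	Name   uint64
}

func MakeKey(domain uint8, coord uint64, name string) Key {
	h := fnv.New64a() // stand-in for SipHash-128
	h.Write([]byte(name))
	return Key{Domain: domain, Coord: coord, Name: h.Sum64()}
}

// AtStage retargets a key to another compilation stage: same name
// hash, different domain byte. Cross-stage adjacency is exactly this.
func (k Key) AtStage(domain uint8) Key {
	k.Domain = domain
	return k
}

func main() {
	src := MakeKey(DomainMoxieSRC, 0x42, "lattice.Tree.Insert")
	ir := src.AtStage(DomainMoxieIR)
	fmt.Println(src.Name == ir.Name, src.Domain, ir.Domain)
}
```

Finding "this construct at the IR stage" is then AtStage plus a RelaxCoord walk in the new domain, with no second hash computation.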
New file. A sub-lattice is a slice of the iskradb tree where all keys share the same domain byte. Sub-lattices can be composed (translated between) by finding coord-equivalent records across domains.
```
// iskra/sublattice.mx

// SubLattice is a domain-scoped view of an iskradb tree.
type SubLattice struct {
    Tree   *lattice.Tree
    Pool   []byte
    Domain uint8
}

// Translate finds the best match in dstDomain for a record in srcDomain.
// Uses the record's coord (from DataFile morph+semantic bits) as the
// lookup coord in the destination sub-lattice.
func Translate(src SubLattice, dst SubLattice, word string, coord uint64) string

// Compose merges two sub-lattices into the same tree.
// Keys don't collide because domain bytes differ.
func Compose(tree *lattice.Tree, a, b SubLattice)
```
Translation between any two languages becomes: Translate(jaSubLattice, enSubLattice, word, coord) — the same call regardless of whether word is JA/EN, Moxie/LLVM, or HR/RU.
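A toy end-to-end sketch of the idea, with an in-memory map standing in for the iskradb tree. The store layout and the exact-coord-only lookup are simplifications (real Translate would walk RelaxCoord on a miss), but it shows both claims: coord equivalence is the translation, and domain bytes keep composed sub-lattices from colliding:

```go
package main

import "fmt"

// Toy stand-in for the iskradb tree: keys are (domain, coord),
// values are surface forms.
type store map[[2]uint64]string

type SubLattice struct {
	S      store
	Domain uint8
}

func (s SubLattice) put(coord uint64, word string) {
	s.S[[2]uint64{uint64(s.Domain), coord}] = word
}

// Translate looks up the record at the same coord in the destination
// domain. Real Translate would fall back to RelaxCoord on a miss; here
// an unmatched word passes through unchanged.
func Translate(src, dst SubLattice, word string, coord uint64) string {
	if out, ok := dst.S[[2]uint64{uint64(dst.Domain), coord}]; ok {
		return out
	}
	return word
}

func main() {
	s := store{} // one shared store: Compose is "same tree, different domain bytes"
	ja := SubLattice{S: s, Domain: 0x02}
	en := SubLattice{S: s, Domain: 0x01}

	const coordCatSubj = 0x0101 // hypothetical coord for "cat, subject NP"
	ja.put(coordCatSubj, "猫")
	en.put(coordCatSubj, "cat")

	fmt.Println(Translate(ja, en, "猫", coordCatSubj))  // cat
	fmt.Println(Translate(en, ja, "cat", coordCatSubj)) // 猫
}
```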
All steps are non-breaking (transdb imports iskra, no API surface changes visible to callers).
import "git.mleku.dev/iskra". ~1 hour.
InflectFunc hook. transdb registers its JA inflect function. ~2 hours.
Total: ~2-3 days of work, none of which blocks current transdb/iskradb development.
Stays in transdb permanently:
Stays in iskradb permanently:
```
iskradb      (storage; no imports from iskra or transdb)
   ↑
iskra        (algorithms; imports iskradb)
   ↑
transdb      (JA/EN data; imports iskra + iskradb)
iskra/code   (code lattice; imports iskra + iskradb, no NL dependency)
```
The code lattice and NL lattice share iskra's algorithm layer and coexist in the same iskradb store with non-overlapping domain bytes.