Symbolic Training & Axiomatic Fine-tuning

Moving from "Massive Messy Data" to building a Symbolic Map in the embedding table.

1. The "Identity" Weight

1 × x = x

When you teach the model that 1 × x always equals x, it stops memorizing individual results and starts learning a Transformation: a rule that holds for every x.
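One way to operationalize this is with synthetic axiomatic data. Below is a minimal sketch, assuming a simple prompt/completion fine-tuning format; the make_identity_examples helper and the exact phrasings are illustrative, not a fixed API. Every pair encodes the same law, 1 × x = x, across varied surface forms so the model learns the transformation rather than specific numbers.

```python
# Minimal sketch: synthetic prompt/completion pairs for the identity law.
# The format and the helper name are illustrative assumptions.
import json
import random

def make_identity_examples(n: int, seed: int = 0) -> list[dict]:
    """Emit n pairs that all encode the same law: 1 * x = x."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = rng.randint(0, 10_000)
        # Vary the surface form so the model learns the rule, not the numbers.
        prompt = rng.choice([
            f"1 * {x} =",
            f"What is 1 times {x}?",
            f"one times {x} equals",
        ])
        examples.append({"prompt": prompt, "completion": f" {x}"})
    return examples

if __name__ == "__main__":
    for ex in make_identity_examples(3):
        print(json.dumps(ex))
```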

2. The "Alias" Mapping

"Hundred" ≡ 100

This is Vector Alignment. The 4,096-dimensional vector for "Hundred" is nudged until it is nearly identical to the vector for the digit tokens 1, 0, 0.
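A toy sketch of that nudging step, assuming a tiny stand-in vocabulary and a freshly initialized PyTorch embedding table rather than a real 4,096-dimensional model: a cosine-alignment loss pulls the vector for "hundred" toward the pooled vector of its digit form.

```python
# Toy sketch of alias alignment: nudge the "hundred" embedding toward the
# pooled embedding of the digit tokens 1, 0, 0. Vocab and sizes are illustrative.
import torch
import torch.nn.functional as F

vocab = {"hundred": 0, "1": 1, "0": 2}
emb = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)

word_id = torch.tensor([vocab["hundred"]])
digit_ids = torch.tensor([vocab["1"], vocab["0"], vocab["0"]])

for step in range(200):
    opt.zero_grad()
    word_vec = emb(word_id).squeeze(0)       # vector for "hundred"
    digit_vec = emb(digit_ids).mean(dim=0)   # pooled vector for the digits 1, 0, 0
    loss = 1.0 - F.cosine_similarity(word_vec, digit_vec, dim=0)
    loss.backward()
    opt.step()

print(f"final cosine similarity: {1.0 - loss.item():.3f}")  # approaches 1.0
```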

Synonym Logic: the tokens become mathematically interchangeable. When the model sees "Two hundred," its internal lookup pulls the weights for 2 and 100 simultaneously.
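At the symbolic level, that interchangeability looks like the sketch below. The ALIAS table and resolve helper are hypothetical, but they show the key property: the word form and the digit form index the same underlying values, so both are retrieved in one pass.

```python
# Minimal sketch of synonym logic: word tokens and digit tokens act as
# interchangeable keys for the same numeric values. The table is illustrative.
ALIAS = {"two": 2, "2": 2, "hundred": 100, "100": 100}

def resolve(phrase: str) -> list[int]:
    """Pull the numeric value behind each token of the phrase."""
    return [ALIAS[tok] for tok in phrase.lower().split()]

print(resolve("Two hundred"))  # [2, 100] -- both entries pulled together
print(resolve("2 hundred"))    # [2, 100] -- digit and word forms interchangeable
```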

3. Systematic Generalization

Given these "building blocks," the model performs Systematic Generalization: it combines learned rules to solve novel problems without recalculating anything from scratch.
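A minimal sketch of that composition, under the assumption that each rule is defined once and then reused: the alias rule, a place-value rule, and the identity law combine to evaluate phrases that never appeared as whole training examples. All names here are illustrative.

```python
# Minimal sketch of systematic generalization: compose individually learned
# rules (alias, place-value, identity) to handle novel combinations.
ALIAS = {"one": 1, "two": 2, "three": 3, "hundred": 100, "thousand": 1_000}

def value_of(phrase: str) -> int:
    """Alias rule + place-value rule: 'two hundred' -> 2 * 100."""
    result = 1
    for tok in phrase.lower().split():
        result *= ALIAS[tok]
    return result

def times(a: int, b: int) -> int:
    """Multiplication with the identity law 1 * x = x applied directly."""
    if a == 1:
        return b   # identity rule: no recalculation needed
    if b == 1:
        return a
    return a * b

# Novel combinations: the rules compose without any new training data.
print(times(value_of("one"), value_of("two hundred")))     # 200
print(times(value_of("three"), value_of("one thousand")))  # 3000
```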

4. Better vs. Brittle

| Feature     | Rule-Based Training              | Standard Massive Training        |
|-------------|----------------------------------|----------------------------------|
| Accuracy    | 100% on defined rules.           | ~95% (Approximation).            |
| Flexibility | Brittle: Fails on "one times x". | Robust: Understands slang/typos. |
| Logic       | Deductive / Hard-coded.          | Inductive / Statistical.         |
The Verdict: You are building a "Logic Gate." You are moving from Language Modeling to Logic Modeling, turning the Embedding Table into a Lookup Table of Laws. The model behaves less like a poet and more like a Compiler.
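One way to picture that "Lookup Table of Laws" is the sketch below; it is an analogy rather than an actual model component. Exact symbolic laws are checked first, compiler-style, and only when none applies does the system fall back to general computation. The LAWS list and multiply function are illustrative.

```python
# Illustrative sketch of a "lookup table of laws": deterministic rules are
# consulted before any fallback computation.
from typing import Callable, Optional

LAWS: list[Callable[[float, float], Optional[float]]] = [
    lambda a, b: b if a == 1 else None,          # identity: 1 * x = x
    lambda a, b: a if b == 1 else None,          # identity: x * 1 = x
    lambda a, b: 0.0 if 0 in (a, b) else None,   # annihilator: 0 * x = 0
]

def multiply(a: float, b: float) -> float:
    for law in LAWS:
        hit = law(a, b)
        if hit is not None:
            return hit   # deterministic "logic gate" path
    return a * b         # fallback: general computation

print(multiply(1, 37.5))  # 37.5, resolved by the identity law
print(multiply(6, 7))     # 42, resolved by the fallback
```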