Symbolic Training & Axiomatic Fine-tuning

Moving from "Massive Messy Data" to building a Symbolic Map in the embedding table.

1. The "Identity" Weight

1 × x = x

When you teach the model that 1 × x always equals x, it stops memorizing individual results and starts learning a Transformation: a rule that holds for every x.
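One way to operationalize this is with synthetic axiomatic data. Below is a minimal sketch, assuming a simple prompt/completion fine-tuning format; the make_identity_examples helper and the exact phrasings are illustrative, not a fixed API. Every pair encodes the same law, 1 × x = x, across varied surface forms so the model learns the transformation rather than specific numbers.

```python
# Minimal sketch: synthetic prompt/completion pairs for the identity law.
# The format and the helper name are illustrative assumptions.
import json
import random

def make_identity_examples(n: int, seed: int = 0) -> list[dict]:
    """Emit n pairs that all encode the same law: 1 * x = x."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        x = rng.randint(0, 10_000)
        # Vary the surface form so the model learns the rule, not the numbers.
        prompt = rng.choice([
            f"1 * {x} =",
            f"What is 1 times {x}?",
            f"one times {x} equals",
        ])
        examples.append({"prompt": prompt, "completion": f" {x}"})
    return examples

if __name__ == "__main__":
    for ex in make_identity_examples(3):
        print(json.dumps(ex))
```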

2. The "Alias" Mapping

"Hundred" ≡ 100

This is Vector Alignment. The 4,096-dimensional vector for "Hundred" is nudged until it is nearly identical to the vector for the digit tokens 1, 0, 0.
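A toy sketch of that nudging step, assuming a tiny stand-in vocabulary and a freshly initialized PyTorch embedding table rather than a real 4,096-dimensional model: a cosine-alignment loss pulls the vector for "hundred" toward the pooled vector of its digit form.

```python
# Toy sketch of alias alignment: nudge the "hundred" embedding toward the
# pooled embedding of the digit tokens 1, 0, 0. Vocab and sizes are illustrative.
import torch
import torch.nn.functional as F

vocab = {"hundred": 0, "1": 1, "0": 2}
emb = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)

word_id = torch.tensor([vocab["hundred"]])
digit_ids = torch.tensor([vocab["1"], vocab["0"], vocab["0"]])

for step in range(200):
    opt.zero_grad()
    word_vec = emb(word_id).squeeze(0)       # vector for "hundred"
    digit_vec = emb(digit_ids).mean(dim=0)   # pooled vector for the digits 1, 0, 0
    loss = 1.0 - F.cosine_similarity(word_vec, digit_vec, dim=0)
    loss.backward()
    opt.step()

print(f"final cosine similarity: {1.0 - loss.item():.3f}")  # approaches 1.0
```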

Synonym Logic: the tokens become mathematically interchangeable. When the model sees "Two hundred," its internal lookup pulls the weights for 2 and 100 simultaneously.
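At the symbolic level, that interchangeability looks like the sketch below. The ALIAS table and resolve helper are hypothetical, but they show the key property: the word form and the digit form index the same underlying values, so both are retrieved in one pass.

```python
# Minimal sketch of synonym logic: word tokens and digit tokens act as
# interchangeable keys for the same numeric values. The table is illustrative.
ALIAS = {"two": 2, "2": 2, "hundred": 100, "100": 100}

def resolve(phrase: str) -> list[int]:
    """Pull the numeric value behind each token of the phrase."""
    return [ALIAS[tok] for tok in phrase.lower().split()]

print(resolve("Two hundred"))  # [2, 100] -- both entries pulled together
print(resolve("2 hundred"))    # [2, 100] -- digit and word forms interchangeable
```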

3. Systematic Generalization

Given these "building blocks," the model performs Systematic Generalization: it combines learned rules to solve novel problems without recalculating anything from scratch.
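A minimal sketch of that composition, under the assumption that each rule is defined once and then reused: the alias rule, a place-value rule, and the identity law combine to evaluate phrases that never appeared as whole training examples. All names here are illustrative.

```python
# Minimal sketch of systematic generalization: compose individually learned
# rules (alias, place-value, identity) to handle novel combinations.
ALIAS = {"one": 1, "two": 2, "three": 3, "hundred": 100, "thousand": 1_000}

def value_of(phrase: str) -> int:
    """Alias rule + place-value rule: 'two hundred' -> 2 * 100."""
    result = 1
    for tok in phrase.lower().split():
        result *= ALIAS[tok]
    return result

def times(a: int, b: int) -> int:
    """Multiplication with the identity law 1 * x = x applied directly."""
    if a == 1:
        return b   # identity rule: no recalculation needed
    if b == 1:
        return a
    return a * b

# Novel combinations: the rules compose without any new training data.
print(times(value_of("one"), value_of("two hundred")))     # 200
print(times(value_of("three"), value_of("one thousand")))  # 3000
```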

4. Better vs. Brittle

| Feature     | Rule-Based Training              | Standard Massive Training        |
|-------------|----------------------------------|----------------------------------|
| Accuracy    | 100% on defined rules.           | ~95% (Approximation).            |
| Flexibility | Brittle: Fails on "one times x". | Robust: Understands slang/typos. |
| Logic       | Deductive / Hard-coded.          | Inductive / Statistical.         |
The Verdict: You are building a "Logic Gate." You are moving from Language Modeling to Logic Modeling, turning the Embedding Table into a Lookup Table of Laws. The model behaves less like a poet and more like a Compiler.
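One way to picture that "Lookup Table of Laws" is the sketch below; it is an analogy rather than an actual model component. Exact symbolic laws are checked first, compiler-style, and only when none applies does the system fall back to general computation. The LAWS list and multiply function are illustrative.

```python
# Illustrative sketch of a "lookup table of laws": deterministic rules are
# consulted before any fallback computation.
from typing import Callable, Optional

LAWS: list[Callable[[float, float], Optional[float]]] = [
    lambda a, b: b if a == 1 else None,          # identity: 1 * x = x
    lambda a, b: a if b == 1 else None,          # identity: x * 1 = x
    lambda a, b: 0.0 if 0 in (a, b) else None,   # annihilator: 0 * x = 0
]

def multiply(a: float, b: float) -> float:
    for law in LAWS:
        hit = law(a, b)
        if hit is not None:
            return hit   # deterministic "logic gate" path
    return a * b         # fallback: general computation

print(multiply(1, 37.5))  # 37.5, resolved by the identity law
print(multiply(6, 7))     # 42, resolved by the fallback
```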