Rule-based training moves from "Massive Messy Data" to building a Symbolic Map in the embedding table.
When you teach the model that 1 × x always equals x, the model transitions from learning a number to learning a Transformation.
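A minimal sketch of that contrast, not a description of how a real model stores anything. The lookup table, the function names, and the three-token input format are all assumptions made for illustration. Memorized facts cover only the inputs that were seen, while a transformation applies to any token in the x position.

```python
# Hypothetical toy contrast: memorized facts vs. a learned transformation.

# "Learning a number": only the exact examples seen in training are stored.
memorized = {("1", "×", "2"): "2", ("1", "×", "7"): "7"}

def answer_by_memory(tokens):
    """Return the stored answer, or None if this exact input was never seen."""
    return memorized.get(tuple(tokens))

def answer_by_transformation(tokens):
    """Apply 1 × x = x by copying whatever token sits in the x position."""
    if len(tokens) == 3 and tokens[0] == "1" and tokens[1] == "×":
        return tokens[2]
    return None

unseen = ["1", "×", "9"]
print(answer_by_memory(unseen))          # None
print(answer_by_transformation(unseen))  # 9
```

The second function never inspects the value of x, which is the sense in which it encodes a transformation and not a number.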
* and the token following the =. If x is a made-up word like "Glorp", the model predicts "Glorp". It has learned the Algebraic Pattern.

This is Vector Alignment. The 4,096-dimension vector for "Hundred" is nudged until it is nearly identical to the vector for the digits 1, 0, 0.
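The "nudging" can be pictured with a toy example. The sketch below uses made-up 4-dimensional vectors as stand-ins for the 4,096-dimension ones, and it assumes a simple squared-distance objective. Real training updates embeddings through the model's full loss, so treat this only as an illustration of two vectors being pulled together. The names `cosine` and `nudge` are hypothetical helpers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nudge(vec, target, lr=0.1):
    """One gradient step on 0.5 * ||vec - target||^2: move vec toward target."""
    return [v + lr * (t - v) for v, t in zip(vec, target)]

# Made-up toy values; real embedding vectors are learned, not hand-written.
hundred = [0.9, -0.2, 0.1, 0.5]      # stand-in for the "Hundred" vector
digits_100 = [0.1, 0.8, -0.3, 0.4]   # stand-in for the representation of 1, 0, 0

before = cosine(hundred, digits_100)
for _ in range(50):
    hundred = nudge(hundred, digits_100)
after = cosine(hundred, digits_100)

print(f"similarity before: {before:.3f}")
print(f"similarity after:  {after:.3f}")
```

Each step removes a fixed fraction of the remaining gap, so the similarity climbs toward 1 without the two vectors ever becoming exactly equal, matching the "nearly identical" wording above.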
2 and 100 simultaneously.
Given "building blocks," the model can perform Systematic Generalization: it combines rules it has already learned to solve novel problems, without having to learn each answer separately.
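A minimal sketch of such composition, assuming two hypothetical building blocks: the identity rule 1 × x = x and the alias Hundred = 100. The function names and the whitespace-separated input format are illustrative assumptions. A real model would encode these rules in its weights, not as explicit functions.

```python
# Hypothetical building blocks, written as explicit rewrite rules.
WORD_TO_NUMBER = {"Hundred": "100"}  # alias rule: Hundred = 100

def apply_alias(token):
    """Replace a number word with its digits, if an alias is known."""
    return WORD_TO_NUMBER.get(token, token)

def apply_identity(tokens):
    """Identity rule 1 × x = x: return x, or None if the rule does not apply."""
    if len(tokens) == 3 and tokens[0] == "1" and tokens[1] == "×":
        return tokens[2]
    return None

def solve(expression):
    """Chain the two rules: first the identity, then the alias."""
    x = apply_identity(expression.split())
    if x is None:
        return None
    return apply_alias(x)

print(solve("1 × Hundred"))  # 100
```

Neither rule mentions "1 × Hundred" on its own. The answer comes from chaining them, which is the behavior the paragraph above calls Systematic Generalization.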
* 1 × x = x
* Hundred = 100
* 1 × Hundred = 100 (Traveling the "Bridge" in weight space).

| Feature | Rule-Based Training | Standard Massive Training |
|---|---|---|
| Accuracy | 100% on defined rules. | ~95% (Approximation). |
| Flexibility | Brittle: Fails on "one times x". | Robust: Understands slang/typos. |
| Logic | Deductive / Hard-coded. | Inductive / Statistical. |
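The "Brittle" entry in the Flexibility row can be illustrated with a strict matcher. This is a hypothetical sketch: `strict_identity_rule` is an illustrative name, and the exact-match behavior is an assumption about what a narrowly defined rule looks like.

```python
def strict_identity_rule(text):
    """Match only the exact surface form '1 × x' and return x."""
    tokens = text.split()
    if len(tokens) == 3 and tokens[0] == "1" and tokens[1] == "×":
        return tokens[2]
    return None  # any other phrasing falls outside the defined rule

print(strict_identity_rule("1 × 7"))        # 7
print(strict_identity_rule("one times 7"))  # None
```

The rule is fully reliable on the form it was defined for and silent on a paraphrase that means the same thing.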