
How small can Small Language Models (SLMs) get before they stop being useful?
This is a critical question for edge AI. Currently, even small models spend much of their parameter budget memorising an extensive, if lossy, "offline Wikipedia" of facts. Is that embedded knowledge an asset, or just bloat?
I believe we must separate the "cognitive core" (reasoning) from the "knowledge base" (facts). This would free up parameters and allow for tiny, hyper-efficient models.
The framework rests on a Modular Architecture (the "what"): a lightweight core that routes tasks to external tools (APIs, databases). That core is trained via Tool-Augmented Generation (the "how") to be an expert operator of these tools, not a memoriser of answers.
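To make the routing idea concrete, here is a minimal sketch in Python, under the assumption that the core's only job is to pick a tool and reason over its output. The tool names, the keyword heuristic, and the placeholder tool bodies are all illustrative, not a reference implementation; in a real system the routing decision would come from the SLM itself (e.g. by emitting a tool-call token).

```python
# Sketch of a "cognitive core" that routes queries to external tools
# instead of answering factual questions from memory.
# Tools and routing heuristic are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def wiki_lookup(query: str) -> str:
    # Placeholder for a real knowledge-base or search API call.
    return f"<retrieved facts for: {query}>"


def calculator(expression: str) -> str:
    # Placeholder for a safe arithmetic evaluator.
    return f"<result of: {expression}>"


TOOLS = [
    Tool("lookup", "factual questions (who/what/when/where)", wiki_lookup),
    Tool("calc", "arithmetic and unit conversions", calculator),
]


def core_route(query: str) -> str:
    """Decide which tool to call, then compose an answer over its output."""
    # Toy stand-in for the model's routing policy.
    tool = TOOLS[1] if any(ch.isdigit() for ch in query) else TOOLS[0]
    evidence = tool.run(query)
    # The core's job is reasoning over the evidence, not recalling it.
    return f"[core] used {tool.name} -> {evidence}"


if __name__ == "__main__":
    print(core_route("When was the transistor invented?"))
    print(core_route("What is 17 * 23?"))
```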
But how do we build a pure reasoning core without it memorising the internet? Three key research areas are tackling this:
- Data-Centric: Training on knowledge-free synthetic data (logic, code) to learn the structure of thinking.
- Objective-Centric: Using Process Supervision to reward the model for calling a tool rather than for answering from memory (a reward sketch follows this list).
- Post-Training Pruning: "Unlearning" or surgically removing facts from a pre-trained model.
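For the objective-centric direction, here is a hedged sketch of a process-level reward: it scores how an answer was produced, not just whether it is correct. The trace format, reward values, and weights are assumptions for illustration only, not taken from any specific paper or library.

```python
# Process-supervision sketch: credit tool-grounded steps, penalise
# facts asserted from memory, even when the final answer is right.
# Trace schema and reward constants are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Step:
    kind: str       # "tool_call", "tool_result", or "free_text"
    content: str


def process_reward(steps: list[Step], answer_correct: bool) -> float:
    """Score a reasoning trace step by step."""
    reward = 1.0 if answer_correct else 0.0
    saw_tool_call = False
    for step in steps:
        if step.kind == "tool_call":
            saw_tool_call = True
            reward += 0.2            # credit for consulting a tool
        elif step.kind == "free_text" and not saw_tool_call:
            reward -= 0.3            # penalise unverified recall
    return reward


# Two traces with the same correct final answer get different rewards:
memorised = [Step("free_text", "Paris is the capital of France.")]
tool_used = [
    Step("tool_call", "lookup('capital of France')"),
    Step("tool_result", "Paris"),
    Step("free_text", "So the answer is Paris."),
]

print(process_reward(memorised, answer_correct=True))   # lower (0.7)
print(process_reward(tool_used, answer_correct=True))   # higher (1.2)
```

Under an objective like this, a policy trained with reinforcement learning would be pushed toward consulting its tools rather than relying on whatever facts survived pre-training.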
This shift from static "memory work" to dynamic "cognitive work" is gaining traction (as Andrej Karpathy recently discussed—link in comments).
This isn't just about speed; it's about sustainability. A core that no longer has to carry the facts can be far smaller, and smaller models mean a smaller energy footprint for every inference.