
How small can Small Language Models (SLMs) get before they stop being useful?
This is a critical question for edge AI. Currently, even small models spend much of their parameter budget memorising an extensive, if lossy, "offline Wikipedia" of facts. Is that embedded knowledge an asset, or just bloat?
I believe we must separate the "cognitive core" (reasoning) from the "knowledge base" (facts). This would free up parameters and allow for tiny, hyper-efficient models.
The framework rests on a Modular Architecture (the "what"): a lightweight core that routes tasks to external tools (APIs, databases). That core is trained via Tool-Augmented Generation (the "how") to be an expert operator of these tools, not a memoriser of answers.
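To make the routing idea concrete, here is a minimal sketch in Python, under the assumption that the core's only job is to pick a tool and reason over its output. The tool names, the keyword heuristic, and the placeholder tool bodies are all illustrative, not a reference implementation; in a real system the routing decision would come from the SLM itself (e.g. by emitting a tool-call token).

```python
# Sketch of a "cognitive core" that routes queries to external tools
# instead of answering factual questions from memory.
# Tools and routing heuristic are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def wiki_lookup(query: str) -> str:
    # Placeholder for a real knowledge-base or search API call.
    return f"<retrieved facts for: {query}>"


def calculator(expression: str) -> str:
    # Placeholder for a safe arithmetic evaluator.
    return f"<result of: {expression}>"


TOOLS = [
    Tool("lookup", "factual questions (who/what/when/where)", wiki_lookup),
    Tool("calc", "arithmetic and unit conversions", calculator),
]


def core_route(query: str) -> str:
    """Decide which tool to call, then compose an answer over its output."""
    # Toy stand-in for the model's routing policy.
    tool = TOOLS[1] if any(ch.isdigit() for ch in query) else TOOLS[0]
    evidence = tool.run(query)
    # The core's job is reasoning over the evidence, not recalling it.
    return f"[core] used {tool.name} -> {evidence}"


if __name__ == "__main__":
    print(core_route("When was the transistor invented?"))
    print(core_route("What is 17 * 23?"))
```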
But how do we build a pure reasoning core without it memorising the internet? Three key research areas are tackling this:
- Data-Centric: Training on knowledge-free synthetic data (logic, code) to learn the structure of thinking.
- Objective-Centric: Using Process Supervision to reward the model for calling a tool rather than for answering from memory (a reward sketch follows this list).
- Post-Training Pruning: "Unlearning" or surgically removing facts from a pre-trained model.
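For the objective-centric direction, here is a hedged sketch of a process-level reward: it scores how an answer was produced, not just whether it is correct. The trace format, reward values, and weights are assumptions for illustration only, not taken from any specific paper or library.

```python
# Process-supervision sketch: credit tool-grounded steps, penalise
# facts asserted from memory, even when the final answer is right.
# Trace schema and reward constants are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Step:
    kind: str       # "tool_call", "tool_result", or "free_text"
    content: str


def process_reward(steps: list[Step], answer_correct: bool) -> float:
    """Score a reasoning trace step by step."""
    reward = 1.0 if answer_correct else 0.0
    saw_tool_call = False
    for step in steps:
        if step.kind == "tool_call":
            saw_tool_call = True
            reward += 0.2            # credit for consulting a tool
        elif step.kind == "free_text" and not saw_tool_call:
            reward -= 0.3            # penalise unverified recall
    return reward


# Two traces with the same correct final answer get different rewards:
memorised = [Step("free_text", "Paris is the capital of France.")]
tool_used = [
    Step("tool_call", "lookup('capital of France')"),
    Step("tool_result", "Paris"),
    Step("free_text", "So the answer is Paris."),
]

print(process_reward(memorised, answer_correct=True))   # lower (0.7)
print(process_reward(tool_used, answer_correct=True))   # higher (1.2)
```

Under an objective like this, a policy trained with reinforcement learning would be pushed toward consulting its tools rather than relying on whatever facts survived pre-training.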
This shift from static "memory work" to dynamic "cognitive work" is gaining traction (as Andrej Karpathy recently discussed—link in comments).
This isn't just about speed; it's about sustainability. A core that no longer has to carry the facts can be far smaller, and smaller models mean a smaller energy footprint for every inference.