Yak Shaving for Token Speed: Chasing MTP and TurboQuant on Linux

Yak Shaving for Token Speed: Chasing MTP and TurboQuant on Linux

Inspired by the performance gains and resource efficiency of Multi-Token Prediction (MTP) and KV cache quantisation, I started what would become another yak-shaving endeavour.

I was an early adopter of Ollama to host LLMs on my MacBook and home Ubuntu server. Between the easy access to the latest models, reliable …

more ...