Yak Shaving for Token Speed: Chasing MTP and TurboQuant on Linux

Inspired by the performance gains and resource efficiency of Multi-Token Prediction (MTP) and KV cache quantisation, I started what would become another yak-shaving endeavour.
I was an early adopter of Ollama to host LLMs on my MacBook and home Ubuntu server. Between the easy access to the latest models, reliable …
more ...






