I wanted to know how fast a 26B mixture-of-experts model could run on a desktop CPU with no GPU. I got ~40 tok/s single-stream (lossless) and ~124 tok/s batched. The surprising part was the byte budget: for this model, the thing to compress is the output head (32% of per-token bytes), not the experts (16%).
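
A rough way to see why the larger share matters more is the sketch below. It assumes decode speed is limited purely by the bytes read per token, so shrinking one component gives an Amdahl-style speedup. The 2x compression ratio and the `speedup` helper are hypothetical, chosen only for illustration; the 32% and 16% shares are the figures quoted above.

```python
def speedup(fraction: float, ratio: float) -> float:
    """Overall speedup when `fraction` of per-token bytes shrinks by `ratio`.

    Assumes throughput is inversely proportional to bytes read per token
    (a purely bandwidth-bound model, which is a simplification).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    # Bytes left after compression, as a share of the original per-token bytes.
    remaining = (1.0 - fraction) + fraction / ratio
    return 1.0 / remaining


HEAD_SHARE = 0.32    # output head share of per-token bytes (from the post)
EXPERT_SHARE = 0.16  # expert share of per-token bytes (from the post)
RATIO = 2.0          # hypothetical compression ratio, for illustration only

head_gain = speedup(HEAD_SHARE, RATIO)
expert_gain = speedup(EXPERT_SHARE, RATIO)

print(f"compress output head {RATIO:.0f}x: {head_gain:.2f}x faster")
print(f"compress experts     {RATIO:.0f}x: {expert_gain:.2f}x faster")
```

Under these assumptions, halving the output head's bytes gives about a 1.19x speedup, while halving the experts' bytes gives about 1.09x. Real numbers depend on compute cost, cache behavior, and the compression ratio actually achievable for each component.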
Source: [Hacker News](https://apeg.dev/writing/running-gemma4-26b-on-a-cpu/)