The Prefill Wall: Why MTP's 2 Barely Moves Long-Context Latency (Qwen3.6-27B, RTX 3090)
My MTP post showed multi-token prediction roughly doubling Qwen3. 6-27B's generation on a 3090. A reader asked the question I'd skipped: what about prompt processing at long context ?