What flipped in b9437

Build b9437, published on May 30, 2026 at 20:56 UTC, ships two targeted default-value corrections to `llama-bench`. Flash attention (`-fa`) shifts from a hard-coded off to auto (`LLAMA_FLASH_ATTN_TYPE_AUTO`), and the GPU-layer count (`-ngl`) changes from the legacy sentine...
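Because both changes affect defaults, a benchmark run that omits `-fa` and `-ngl` may not be comparable across builds. One way to sidestep that is to pass both flags explicitly. The following is a minimal sketch, not part of the release: the helper name `bench_command`, the model path, and the values `1` and `99` are illustrative assumptions, not the old or new defaults.

```python
import shlex


def bench_command(model_path, flash_attn, n_gpu_layers, binary="llama-bench"):
    """Build a llama-bench argument list with -fa and -ngl set explicitly.

    Hypothetical helper: pinning both flags means the run does not depend
    on whichever defaults a given build ships with.
    """
    if flash_attn not in (0, 1):
        raise ValueError("this sketch only handles explicit 0 or 1 for -fa")
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be non-negative in this sketch")
    return [
        binary,
        "-m", model_path,
        "-fa", str(flash_attn),
        "-ngl", str(n_gpu_layers),
    ]


# Illustrative values only; the model path is a placeholder.
cmd = bench_command("models/example.gguf", flash_attn=1, n_gpu_layers=99)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where `llama-bench` is on the `PATH`. Recording the exact command next to the results makes it clear which settings a given number was measured with.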

Source: [Dev.to](https://dev.to/creeta/llama-bench-skipped-fa-on-capable-gpus-b9437-corrects-it-42ik)
