Wed, Jun 10 10:59 AM

🏷️ #metric

3 headlines

Technology · Hacker News • 3h ago

We gave our agent the exact metric definition. It still wrote the wrong SQL

1 point, 1 comment on Hacker News

Technology · Hacker News • 18h ago

Show HN: VQAScore – open eval metric/reward model, now for text-to-video

Two years ago we released VQAScore: ask a VLM "Does this image show {prompt}?" and use P(Yes) as the score. It became a go-to evaluation metric and reward model for image generation, replacing CLIPScore across the field (2M+ downloads on Hugging Face; used by groups at DeepMind, NVIDIA, ByteDan...
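The scoring rule in the blurb above can be sketched in a few lines. This is a minimal illustration, not the VQAScore implementation: it assumes you already have the VLM's next-token logits for the answer position as a plain dict (the model call itself is out of scope), and the helper names `build_question` and `vqascore_from_logits` are hypothetical. In practice the softmax would run over the model's full vocabulary; the toy inputs below use just two tokens.

```python
import math


def build_question(prompt: str) -> str:
    # Hypothetical template mirroring the question quoted in the blurb.
    return f"Does this image show {prompt}?"


def vqascore_from_logits(logits: dict[str, float], yes_token: str = "Yes") -> float:
    """Return P(yes_token) under a softmax over the supplied next-token logits.

    Assumption: `logits` maps answer tokens to the VLM's raw logits at the
    answer position, already computed for (image, build_question(prompt)).
    """
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    return exps[yes_token] / sum(exps.values())
```

With toy logits `{"Yes": 2.0, "No": 0.0}` the score is 1 / (1 + e^-2), about 0.88; equal logits give 0.5. Because the output is a probability, it can serve directly as a bounded reward signal, which is how the blurb describes it being used.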

Technology · Hacker News • 21h ago

Agentic surface area as an operating metric

1 point, 0 comments on Hacker News