Two years ago we released VQAScore: ask a VLM "does this image show {prompt}?" and use P(Yes) as the score. It became a go-to evaluation metric and reward model for image generation, replacing CLIPScore across the field (2M+ downloads on Hugging Face; used by groups at DeepMind, NVIDIA, ByteDan...
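
To make the "use P(Yes) as the score" idea concrete, here is a minimal sketch in plain Python. It is not the t2v_metrics implementation: `build_question`, `p_yes`, and the example logits are hypothetical names and made-up numbers, and the softmax runs over a tiny candidate set, whereas a real VLM would normalize over its full vocabulary.

```python
import math


def build_question(prompt: str) -> str:
    """Fill the question template from the post (hypothetical helper)."""
    return f"does this image show {prompt}?"


def p_yes(logits: dict[str, float], yes_token: str = "Yes") -> float:
    """Softmax the next-token logits and return the probability of `yes_token`.

    `logits` maps candidate answer tokens to raw scores. This is an assumed
    stand-in for what a VLM would emit after seeing the image and question.
    """
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    return exps[yes_token] / sum(exps.values())


# Made-up logits for illustration only; no model is actually run here.
question = build_question("a dog chasing a ball")
example_logits = {"Yes": 2.0, "No": 0.0, "Maybe": -1.0}
score = p_yes(example_logits)
print(question, f"-> score = {score:.3f}")
```

Because the score is a probability rather than a hard yes/no, it stays continuous, which is what lets it double as a ranking metric or a reward signal.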

Source: [Hacker News](https://github.com/linzhiqiu/t2v_metrics)
