This is my first attempt at running an eval of this nature, so I would love some feedback on the methodology. I can't guarantee that the sources weren't already in the models' inputs without getting novel translations from native speakers, but in my experience using the top models, their translations feel very accurate. ...
Source: [Hacker News](https://lector.dev/eval/)