GEMBA-SQM translation quality evaluation is easy to implement as a zero-shot LLM prompt … and totally useless

The hype ignores AI hallucination because the hype is caused by people hallucinating about AI.

Translation quality evaluation is all we need

“The unpredictable abilities emerging from large AI models: Large language models like ChatGPT are now big enough that they’ve started to display startling, unpredictable behaviors.”

Smaller models are still better than LLMs if trained well

We live in a world of hype. Hype is rarely completely harmless, but it is especially detrimental when it is accepted without scientific verification and used as the basis for long-term growth plans.