Metrics based on embeddings do not reflect a translation’s quality, and this is a far-reaching fact

In the previous article we mentioned the research by the Google Research team entitled “Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation.”

Looking inside AI when it’s all around us… if that’s true at all

In the middle of the past century, humankind discovered nuclear power. People rushed to build bombs and nuclear power plants despite lacking real knowledge and understanding…

Why the BLEU score is usually inflated

A standard practice during model training is to divide the dataset into 90% for training and 10% for testing: the model is trained on the 90% and evaluated on the held-out 10%.
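The 90/10 split described above can be sketched in plain Python (the function name and seed are illustrative, not from the article):

```python
import random

def split_dataset(data, test_ratio=0.1, seed=42):
    """Shuffle a dataset and split it into train/test portions.

    A minimal sketch of the standard hold-out split: the model would be
    trained on the first returned list and evaluated on the second.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # Last (1 - test_ratio) of the shuffle is training data, first part is test data
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = split_dataset(data)
print(len(train), len(test))  # → 90 10
```

In practice a library helper such as scikit-learn’s `train_test_split` does the same job; the point is simply that the test portion is held out and never seen during training.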

We have implemented the hLEPOR metric as a public Python library, for the first time ever

It has always been a mystery to us why BLEU is the most widespread metric, given that hLEPOR is a more advanced solution.