AI Lab

2021/06/06
Serge Gladkoff

Metrics based on embeddings do not reflect a translation’s quality, and this is a far-reaching fact

In the previous article we mentioned research by the Google Research team entitled “Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation,”

2021/05/13
Serge Gladkoff

Looking inside AI when it’s all around us… if that’s true at all

In the middle of the last century, humankind discovered nuclear power. People were quick to create a bomb and build nuclear power plants despite a lack of real knowledge and understanding…

2021/05/04
Serge Gladkoff

Why the BLEU score is usually inflated

During model training, a standard practice is to split the data set 90/10: the model is trained on 90% of the data and tested on the remaining 10%.
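
A minimal Python sketch of such a 90/10 split, for illustration only. The function name `split_dataset`, the fixed seed, and the toy corpus are assumptions made for this example and are not taken from the article.

```python
import random


def split_dataset(pairs, test_fraction=0.1, seed=0):
    """Shuffle sentence pairs and split them into (train, test) lists."""
    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy corpus of (source, reference) sentence pairs.
corpus = [(f"source sentence {i}", f"reference sentence {i}") for i in range(100)]

train, test = split_dataset(corpus)
print(len(train), len(test))
```

Both portions are drawn from the same shuffled pool, so the test sentences come from the same data set as the training sentences.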

2021/04/30
Serge Gladkoff

We have implemented the hLEPOR metric as a public Python library, for the first time ever

It was always a mystery to us why BLEU is the most widespread metric, given that hLEPOR is a more advanced solution.