Gleb Erofeev, Serge Gladkoff, Irina Sorokina (Logrus Global)
Automatic MT evaluation metrics are indispensable for MT research. Augmented metrics such as hLEPOR include broader evaluation factors (recall and a position difference penalty) in addition to the factors used in BLEU (sentence length and precision), and have demonstrated higher accuracy. However, two obstacles have prevented the wide use of hLEPOR: the lack of an easily portable Python package, and empirical weighting parameters that had to be tuned manually. This project addresses both issues by offering a Python implementation of hLEPOR and automatic tuning of its parameters. We use existing translation memories (TMs) as the reference set and distillation modeling with LaBSE (Language-Agnostic BERT Sentence Embedding) to calibrate the parameters of a custom hLEPOR (cushLEPOR). cushLEPOR maximizes the correlation between hLEPOR and the similarity score that the distilling model assigns to the MT output with respect to the reference. It can be used quickly and precisely to evaluate MT output from different engines, with no manual weight tuning needed. In this session you will learn how to tune hLEPOR to obtain an automatically custom-tuned cushLEPOR metric that is far more precise than BLEU. The method does not require costly human evaluations: an existing TM is taken as the reference translation set, and cushLEPOR is created to select the best MT engine for that reference dataset.