It has always been a mystery to us why BLEU remains the most widespread machine translation evaluation metric when hLEPOR is a more advanced alternative.
Indeed, it is well known that BLEU is a precision-based metric that disregards recall. Researchers have made many attempts to correct this, with METEOR appearing first, followed by hLEPOR and COMET. But all of these metrics are more difficult to implement than BLEU, and the fact that their source code was not readily available probably contributed to BLEU's continued dominance.
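To see why ignoring recall matters, consider a toy example: a candidate that copies only two words of a six-word reference achieves perfect unigram precision while covering only a third of the reference. The sketch below computes both numbers by hand; it illustrates the precision/recall gap, not BLEU itself, which also applies n-gram clipping and a brevity penalty:

```python
# Toy illustration of precision vs. recall on word overlap.
from collections import Counter

reference = "the cat sat on the mat".split()
candidate = "the cat".split()

# Clipped overlap: count each word at most as often as it appears in the reference.
overlap = sum((Counter(reference) & Counter(candidate)).values())

precision = overlap / len(candidate)  # 2 / 2 = 1.00 -> looks perfect
recall = overlap / len(reference)     # 2 / 6 ~ 0.33 -> most content is missing

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```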
Aaron Li-Feng Han introduced hLEPOR in his article “LEPOR: An Augmented Machine Translation Evaluation Metric” in 2014. But the code he published was, firstly, written in Perl, a once-popular but now largely outdated programming language, and secondly, it contained many bugs and lacked the final integration formula altogether, so the implementation was incomplete.
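For context, that missing integration step is the metric's final combination formula. As we read Han's papers, hLEPOR combines three factors, an enhanced length penalty $LP$, an n-gram position difference penalty $NPosPenal$, and a harmonic mean $HPR$ of precision and recall, via a weighted harmonic mean with tunable weights $w$:

$$hLEPOR = \frac{w_{LP} + w_{NPosPenal} + w_{HPR}}{\dfrac{w_{LP}}{LP} + \dfrac{w_{NPosPenal}}{NPosPenal} + \dfrac{w_{HPR}}{HPR}}$$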
Our R&D lab experts ported the hLEPOR code to Python, fixed the bugs, tested it, verified the results with the author of the metric, and published it as a free library on PyPI.org, as featured in our press release in Slator.
The metric is now available to any researcher: simply install our library and call it from your own Python code.
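Here is a minimal sketch of what that looks like. The package, import, and function names below are illustrative assumptions; check the library's page on PyPI.org for the actual API:

```python
# Hypothetical usage sketch; the real package and function names may differ,
# so consult the published library's documentation for the actual API.
from hlepor import hlepor_score  # assumed import

references = ["the cat sat on the mat"]
hypotheses = ["the cat is on the mat"]

# Score the hypotheses against their references; higher is better.
score = hlepor_score(references, hypotheses)
print(score)
```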
You’re welcome to use it! More on how to use hLEPOR will follow!