Faithful translation without excess or deficiency｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2025

Science of Communication and Computation

13 Faithful translation without excess or deficiency: Preference optimization for LLM-based translation

Abstract

Machine translation with large language models (LLMs) can produce errors such as “missing translation,” where parts of the source text are omitted, and “hallucination,” where the translation contains content that is not present in the source text. In this study, we first developed a method for training a highly accurate word alignment model from pairs of sentences in different languages that refer to the same entity in Wikipedia [1]. We then developed a method for training a translation model that produces fewer omissions and hallucinations by maximizing the number of word pairs with equivalent meanings between the source text and its translation [2]. In the future, we aim to improve machine translation technology in areas that require precise translation, including patents, law, and medicine. We will improve the fidelity of LLM-based translation, which excels at generating fluent and lengthy translations while maintaining consistency with the source text.
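The idea of preferring translations with more aligned word pairs can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a word aligner has already produced (source index, target index) pairs for each candidate translation, and it makes one possible scoring choice: the harmonic mean of source-side and target-side coverage. Low source coverage suggests omission, and low target coverage suggests hallucination. The names `Candidate`, `coverage`, `fidelity_score`, and `build_preference_pair` are hypothetical, and the example alignments are hand-written for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate translation plus its word alignment to the source."""
    text: str
    num_tokens: int          # number of tokens in the translation
    alignment: frozenset     # (source_index, target_index) pairs from an aligner


def coverage(num_source_tokens, candidate):
    """Return (source_coverage, target_coverage), each in [0, 1]."""
    aligned_src = {s for s, _ in candidate.alignment}
    aligned_tgt = {t for _, t in candidate.alignment}
    src_cov = len(aligned_src) / num_source_tokens if num_source_tokens else 0.0
    tgt_cov = len(aligned_tgt) / candidate.num_tokens if candidate.num_tokens else 0.0
    return src_cov, tgt_cov


def fidelity_score(num_source_tokens, candidate):
    """Harmonic mean of the two coverages (an assumption made for this sketch)."""
    src_cov, tgt_cov = coverage(num_source_tokens, candidate)
    if src_cov + tgt_cov == 0:
        return 0.0
    return 2 * src_cov * tgt_cov / (src_cov + tgt_cov)


def build_preference_pair(num_source_tokens, candidates):
    """Return (chosen, rejected), or None when the candidates cannot be separated."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: fidelity_score(num_source_tokens, c))
    worst = min(candidates, key=lambda c: fidelity_score(num_source_tokens, c))
    if fidelity_score(num_source_tokens, best) == fidelity_score(num_source_tokens, worst):
        return None
    return best, worst


if __name__ == "__main__":
    # Source: "The patent covers a new battery design" (7 tokens).
    full_alignment = frozenset({(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 5)})
    candidates = [
        # Omits "new" and "battery": source coverage drops.
        Candidate("Das Patent umfasst ein Design", 5,
                  frozenset({(0, 0), (1, 1), (2, 2), (3, 3), (6, 4)})),
        # Every source and target word is aligned.
        Candidate("Das Patent umfasst ein neues Batteriedesign", 6, full_alignment),
        # Adds "aus Japan", which has no source counterpart: target coverage drops.
        Candidate("Das Patent umfasst ein neues Batteriedesign aus Japan", 8, full_alignment),
    ]
    chosen, rejected = build_preference_pair(7, candidates)
    print("chosen:  ", chosen.text)
    print("rejected:", rejected.text)
```

Pairs of this kind could then serve as training data for a preference optimization method; which optimizer and which scoring function fit best is left open here.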

References

[1] Q. Wu, M. Nagata, Y. Tsuruoka, “WSPAlign: Word alignment pre-training via large-scale weakly supervised span prediction,” in Proc. The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), 2023.

[2] Q. Wu, M. Nagata, Z. Mao, Y. Tsuruoka, “Word alignment as preference for machine translation,” in Proc. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024.

Contact

Masaaki Nagata, Linguistic Intelligence Research Group, Innovative Communication Laboratory
