2022
Prediction Difference Regularization against Perturbation for Neural Machine Translation
Dengji Guo | Zhengrui Ma | Min Zhang | Yang Feng
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years. Despite their simplicity and effectiveness, we argue that these methods are limited by the under-fitting of training data. In this paper, we utilize prediction difference for ground-truth tokens to analyze the fitting of token-level samples and find that under-fitting is almost as common as over-fitting. We introduce prediction difference regularization (PD-R), a simple and effective method that can reduce over-fitting and under-fitting at the same time. For all token-level samples, PD-R minimizes the prediction difference between the original pass and the input-perturbed pass, making the model less sensitive to small input changes and thus more robust to both perturbations and under-fitted training data. Experiments on three widely used WMT translation tasks show that our approach can significantly improve over existing perturbation regularization methods. On the WMT16 En-De task, our model achieves a 1.80 SacreBLEU improvement over the vanilla Transformer.
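The core idea of the regularization term can be illustrated with a short sketch. The PyTorch-style snippet below is a hypothetical illustration, not the authors' released implementation: `model`, `perturb`, and `alpha` are placeholder names, and the symmetric KL term stands in for whatever divergence the paper actually uses to measure the prediction difference between the two passes.

```python
import torch.nn.functional as F

def pd_r_loss(model, src, tgt, labels, perturb, alpha=1.0):
    """Hedged sketch of prediction-difference regularization (PD-R).

    Runs the model on the original input and on a perturbed copy, then
    penalizes the divergence between the two output distributions in
    addition to the usual token-level cross-entropy loss.
    """
    logits_clean = model(src, tgt)           # original forward pass
    logits_pert = model(perturb(src), tgt)   # input-perturbed forward pass

    # Standard token-level cross-entropy on the clean pass.
    ce = F.cross_entropy(
        logits_clean.view(-1, logits_clean.size(-1)), labels.view(-1)
    )

    # Prediction-difference term: symmetric KL between the two passes
    # (an assumed choice of divergence for this sketch).
    log_p_clean = F.log_softmax(logits_clean, dim=-1)
    log_p_pert = F.log_softmax(logits_pert, dim=-1)
    pd = 0.5 * (
        F.kl_div(log_p_pert, log_p_clean, log_target=True, reduction="batchmean")
        + F.kl_div(log_p_clean, log_p_pert, log_target=True, reduction="batchmean")
    )

    return ce + alpha * pd
```

Here `alpha` weights the consistency term against the cross-entropy objective; the paper's actual formulation and hyperparameters may differ.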