Zhengdong Lu
2019
A Prism Module for Semantic Disentanglement in Name Entity Recognition
Kun Liu
|
Shen Li
|
Daqi Zheng
|
Zhengdong Lu
|
Sheng Gao
|
Si Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Natural Language Processing has been perplexed for many years by the problem that multiple semantics are mixed inside a word, even with the help of context. To solve this problem, we propose a prism module to disentangle the semantic aspects of words and reduce noise at the input layer of a model. In the prism module, some words are selectively replaced with task-related semantic aspects, then these denoised word representations can be fed into downstream tasks to make them easier. Besides, we also introduce a structure to train this module jointly with the downstream model without additional data. This module can be easily integrated into the downstream model and significantly improve the performance of baselines on named entity recognition (NER) task. The ablation analysis demonstrates the rationality of the method. As a side effect, the proposed method also provides a way to visualize the contribution of each word.
2017
Deep Neural Machine Translation with Linear Associative Unit
Mingxuan Wang
|
Zhengdong Lu
|
Jie Zhou
|
Qun Liu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art Neural Machine Translation (NMT) with its capability in modeling complex functions and capturing complex linguistic structures. However NMT with deep architecture in its encoder or decoder RNNs often suffer from severe gradient diffusion due to the non-linear recurrent activations, which often makes the optimization much more difficult. To address this problem we propose a novel linear associative units (LAU) to reduce the gradient propagation path inside the recurrent unit. Different from conventional approaches (LSTM unit and GRU), LAUs uses linear associative connections between input and output of the recurrent unit, which allows unimpeded information flow through both space and time The model is quite simple, but it is surprisingly effective. Our empirical study on Chinese-English translation shows that our model with proper configuration can improve by 11.7 BLEU upon Groundhog and the best reported on results in the same setting. On WMT14 English-German task and a larger WMT14 English-French task, our model achieves comparable results with the state-of-the-art.
Context Gates for Neural Machine Translation
Zhaopeng Tu
|
Yang Liu
|
Zhengdong Lu
|
Xiaohua Liu
|
Hang Li
Transactions of the Association for Computational Linguistics, Volume 5
In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.
Search
Co-authors
- Mingxuan Wang 1
- Jie Zhou 1
- Qun Liu 1
- Zhaopeng Tu 1
- Yang Liu 1
- show all...