Dong Yu


pdf bib
Importance-based Neuron Allocation for Multilingual Neural Machine Translation
Wanying Xie | Yang Feng | Shuhao Gu | Dong Yu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Multilingual neural machine translation with a single model has drawn much attention due to its capability to deal with multiple languages. However, the current multilingual translation paradigm often makes the model tend to preserve the general knowledge, but ignore the language-specific knowledge. Some previous works try to solve this problem by adding various kinds of language-specific modules to the model, but they suffer from the parameter explosion problem and require specialized manual design. To solve these problems, we propose to divide the model neurons into general and language-specific parts based on their importance across languages. The general part is responsible for preserving the general knowledge and participating in the translation of all the languages, while the language-specific part is responsible for preserving the language-specific knowledge and participating in the translation of some specific languages. Experimental results on several language pairs, covering IWSLT and Europarl corpus datasets, demonstrate the effectiveness and universality of the proposed method.

pdf bib
RAST : Domain-Robust Dialogue Rewriting as Sequence TaggingRAST: Domain-Robust Dialogue Rewriting as Sequence Tagging
Jie Hao | Linfeng Song | Liwei Wang | Kun Xu | Zhaopeng Tu | Dong Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, the existing models for this task suffer from the robustness issue, i.e., performances drop dramatically when testing on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As a common issue of most tagging models for text generation, the model’s outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems when transferring to another dataset.


pdf bib
ZPR2 : Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERTZPR2: Joint Zero Pronoun Recovery and Resolution using Multi-Task Learning and BERT
Linfeng Song | Kun Xu | Yue Zhang | Jianshu Chen | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Zero pronoun recovery and resolution aim at recovering the dropped pronoun and pointing out its anaphoric mentions, respectively. We propose to better explore their interaction by solving both tasks together, while the previous work treats them separately. For zero pronoun resolution, we study this task in a more realistic setting, where no parsing trees or only automatic trees are available, while most previous work assumes gold trees. Experiments on two benchmarks show that joint modeling significantly outperforms our baseline that already beats the previous state of the arts.

pdf bib
Structural Information Preserving for Graph-to-Text Generation
Linfeng Song | Ante Wang | Jinsong Su | Yue Zhang | Kun Xu | Yubin Ge | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs. As a crucial defect, the current state-of-the-art models may mess up or even drop the core structural information of input graphs when generating outputs. We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information. In particular, we introduce two types of autoencoding losses, each individually focusing on different aspects (a.k.a. views) of input graphs. The losses are then back-propagated to better calibrate our model via multi-task training. Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.


pdf bib
BLCU-NLP at COIN-Shared Task1 : Stagewise Fine-tuning BERT for Commonsense Inference in Everyday NarrationsBLCU-NLP at COIN-Shared Task1: Stagewise Fine-tuning BERT for Commonsense Inference in Everyday Narrations
Chunhua Liu | Dong Yu
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

This paper describes our system for COIN Shared Task 1 : Commonsense Inference in Everyday Narrations. To inject more external knowledge to better reason over the narrative passage, question and answer, the system adopts a stagewise fine-tuning method based on pre-trained BERT model. More specifically, the first stage is to fine-tune on addi- tional machine reading comprehension dataset to learn more commonsense knowledge. The second stage is to fine-tune on target-task (MCScript2.0) with MCScript (2018) dataset assisted. Experimental results show that our system achieves significant improvements over the baseline systems with 84.2 % accuracy on the official test dataset.

pdf bib
DREAM : A Challenge Data Set and Models for Dialogue-Based Reading ComprehensionDREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Yejin Choi | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 7

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems : 84 % of answers are non-extractive, 85 % of questions require reasoning beyond a single sentence, and 34 % of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at

pdf bib
BLCU_NLP at SemEval-2019 Task 7 : An Inference Chain-based GPT Model for Rumour EvaluationBLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation
Ruoyao Yang | Wanying Xie | Chunhua Liu | Dong Yu
Proceedings of the 13th International Workshop on Semantic Evaluation

Researchers have been paying increasing attention to rumour evaluation due to the rapid spread of unsubstantiated rumours on social media platforms, including SemEval 2019 task 7. However, labelled data for learning rumour veracity is scarce, and labels in rumour stance data are highly disproportionate, making it challenging for a model to perform supervised-learning adequately. We propose an inference chain-based system, which fully utilizes conversation structure-based knowledge in the limited data and expand the training data in minority categories to alleviate class imbalance. Our approach obtains 12.6 % improvement upon the baseline system for subtask A, ranks 1st among 21 systems in subtask A, and ranks 4th among 12 systems in subtask B.

pdf bib
Improving Machine Reading Comprehension with General Reading Strategies
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources-just a pre-trained model and a fixed number of training instances-we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8 % absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies.

pdf bib
Improving Pre-Trained Multilingual Model with Vocabulary Expansion
Hai Wang | Dian Yu | Kai Sun | Jianshu Chen | Dong Yu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, wherein in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on a pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that using mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.


pdf bib
XL-NBT : A Cross-lingual Neural Belief Tracking FrameworkXL-NBT: A Cross-lingual Neural Belief Tracking Framework
Wenhu Chen | Jianshu Chen | Yu Su | Xin Wang | Dong Yu | Xifeng Yan | William Yang Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Task-oriented dialog systems are becoming pervasive, and many companies heavily rely on them to complement human agents for customer service in call centers. With globalization, the need for providing cross-lingual customer support becomes more urgent than ever. However, cross-lingual support poses great challengesit requires a large amount of additional annotated data from native speakers. In order to bypass the expensive human annotation and achieve the first step towards the ultimate goal of building a universal dialog system, we set out to build a cross-lingual state tracking framework. Specifically, we assume that there exists a source language with dialog belief tracking annotations while the target languages have no annotated dialog data of any form. Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data. We then distill and transfer its own knowledge to the student state tracker in target languages. We specifically discuss two types of common parallel resources : bilingual corpus and bilingual dictionary, and design different transfer learning strategies accordingly. Experimentally, we successfully use English state tracker as the teacher to transfer its knowledge to both Italian and German trackers and achieve promising results.

pdf bib
BLCU_NLP at SemEval-2018 Task 12 : An Ensemble Model for Argument Reasoning Based on Hierarchical AttentionBLCU_NLP at SemEval-2018 Task 12: An Ensemble Model for Argument Reasoning Based on Hierarchical Attention
Meiqian Zhao | Chunhua Liu | Lu Liu | Yan Zhao | Dong Yu
Proceedings of The 12th International Workshop on Semantic Evaluation

To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind. In this paper, we propose a hierarchical attention model to identify the right warrant which explains why the reason stands for the claim. Our model focuses not only on the similar part between warrants and other information but also on the contradictory part between two opposing warrants. In addition, we use the ensemble method for different models. Our model achieves an accuracy of 61 %, ranking second in this task. Experimental results demonstrate that our model is effective to make correct choices.


pdf bib
Semantic Frame Labeling with Target-based Neural Model
Yukun Feng | Dong Yu | Jian Xu | Chunhua Liu
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper explores the automatic learning of distributed representations of the target’s context for semantic frame labeling with target-based neural model. We constrain the whole sentence as the model’s input without feature extraction from the sentence. This is different from many previous works in which local feature extraction of the targets is widely used. This constraint makes the task harder, especially with long sentences, but also makes our model easily applicable to a range of resources and other similar tasks. We evaluate our model on several resources and get the state-of-the-art result on subtask 2 of SemEval 2015 task 15. Finally, we extend the task to word-sense disambiguation task and we also achieve a strong result in comparison to state-of-the-art work.