Nianwen Xue


2021

pdf bib
A Joint Model for Dropped Pronoun Recovery and Conversational Discourse Parsing in Chinese Conversational SpeechChinese Conversational Speech
Jingxuan Yang | Kerui Xu | Jun Xu | Si Li | Sheng Gao | Jun Guo | Nianwen Xue | Ji-Rong Wen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper, we present a neural model for joint dropped pronoun recovery (DPR) and conversational discourse parsing (CDP) in Chinese conversational speech. We show that DPR and CDP are closely related, and a joint model benefits both tasks. We refer to our model as DiscProReco, and it first encodes the tokens in each utterance in a conversation with a directed Graph Convolutional Network (GCN). The token states for an utterance are then aggregated to produce a single state for each utterance. The utterance states are then fed into a biaffine classifier to construct a conversational discourse graph. A second (multi-relational) GCN is then applied to the utterance states to produce a discourse relation-augmented representation for the utterances, which are then fused together with token states in each utterance as input to a dropped pronoun recovery layer. The joint model is trained and evaluated on a new Structure Parsing-enhanced Dropped Pronoun Recovery (SPDPR) data set that we annotated with both two types of information. Experimental results on the SPDPR dataset and other benchmarks show that DiscProReco significantly outperforms the state-of-the-art baselines of both tasks.

pdf bib
UMR-Writer : A Web Application for Annotating Uniform Meaning RepresentationsUMR-Writer: A Web Application for Annotating Uniform Meaning Representations
Jin Zhao | Nianwen Xue | Jens Van Gysel | Jinho D. Choi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present UMR-Writer, a web-based application for annotating Uniform Meaning Representations (UMR), a graph-based, cross-linguistically applicable semantic representation developed recently to support the development of interpretable natural language applications that require deep semantic analysis of texts. We present the functionalities of UMR-Writer and discuss the challenges in developing such a tool and how they are addressed.

pdf bib
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop
Claire Bonial | Nianwen Xue
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

2020

pdf bib
Annotating Temporal Dependency Graphs via CrowdsourcingAnnotating Temporal Dependency Graphs via Crowdsourcing
Jiarui Yao | Haoling Qiu | Bonan Min | Nianwen Xue
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present the construction of a corpus of 500 Wikinews articles annotated with temporal dependency graphs (TDGs) that can be used to train systems to understand temporal relations in text. We argue that temporal dependency graphs, built on previous research on narrative times and temporal anaphora, provide a representation scheme that achieves a good trade-off between completeness and practicality in temporal annotation. We also provide a crowdsourcing strategy to annotate TDGs, and demonstrate the feasibility of this approach with an evaluation of the quality of the annotation, and the utility of the resulting data set by training a machine learning model on this data set. The data set is publicly available.

pdf bib
Proceedings of the Second International Workshop on Designing Meaning Representations
Nianwen Xue | Johan Bos | William Croft | Jan Hajič | Chu-Ren Huang | Stephan Oepen | Martha Palmer | James Pustejovsky
Proceedings of the Second International Workshop on Designing Meaning Representations

pdf bib
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing
Stephan Oepen | Omri Abend | Lasha Abzianidze | Johan Bos | Jan Hajič | Daniel Hershcovich | Bin Li | Tim O'Gorman | Nianwen Xue | Daniel Zeman
Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing

2019

pdf bib
Building a Chinese AMR Bank with Concept and Relation AlignmentsChinese AMR Bank with Concept and Relation Alignments
Bin Li | Yuan Wen | Li Song | Weiguang Qu | Nianwen Xue
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an on-going project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts / relations in the CAMR annotation to make it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-agreement as measured by the Smatch score between the two annotators is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. 46.71 % of the AMRs of the sentences are non-tree graphs. Moreover, the AMR of 88.95 % of the sentences has concepts inferred from the context of the sentence but do not correspond to a specific word.

pdf bib
Acquiring Structured Temporal Representation via Crowdsourcing : A Feasibility Study
Yuchen Zhang | Nianwen Xue
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Temporal Dependency Trees are a structured temporal representation that represents temporal relations among time expressions and events in a text as a dependency tree structure. Compared to traditional pair-wise temporal relation representations, temporal dependency trees facilitate efficient annotations, higher inter-annotator agreement, and efficient computations. However, annotations on temporal dependency trees so far have only been done by expert annotators, which is costly and time-consuming. In this paper, we introduce a method to crowdsource temporal dependency tree annotations, and show that this representation is intuitive and can be collected with high accuracy and agreement through crowdsourcing. We produce a corpus of temporal dependency trees, and present a baseline temporal dependency parser, trained and evaluated on this new corpus.

pdf bib
Proceedings of the First International Workshop on Designing Meaning Representations
Nianwen Xue | William Croft | Jan Hajic | Chu-Ren Huang | Stephan Oepen | Martha Palmer | James Pustejovksy
Proceedings of the First International Workshop on Designing Meaning Representations

pdf bib
Modeling Quantification and Scope in Abstract Meaning RepresentationsAbstract Meaning Representations
James Pustejovsky | Ken Lai | Nianwen Xue
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose an extension to Abstract Meaning Representations (AMRs) to encode scope information of quantifiers and negation, in a way that overcomes the semantic gaps of the schema while maintaining its cognitive simplicity. Specifically, we address three phenomena not previously part of the AMR specification : quantification, negation (generally), and modality. The resulting representation, which we call Uniform Meaning Representation (UMR), adopts the predicative core of AMR and embeds it under a scope graph when appropriate. UMR representations differ from other treatments of quantification and modal scope phenomena in two ways : (a) they are more transparent ; and (b) they specify default scope when possible. ‘

pdf bib
Recovering dropped pronouns in Chinese conversations via modeling their referentsChinese conversations via modeling their referents
Jingxuan Yang | Jianzhuo Tong | Si Li | Sheng Gao | Jun Guo | Nianwen Xue
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Pronouns are often dropped in Chinese sentences, and this happens more frequently in conversational genres as their referents can be easily understood from context. Recovering dropped pronouns is essential to applications such as Information Extraction where the referents of these dropped pronouns need to be resolved, or Machine Translation when Chinese is the source language. In this work, we present a novel end-to-end neural network model to recover dropped pronouns in conversational data. Our model is based on a structured attention mechanism that models the referents of dropped pronouns utilizing both sentence-level and word-level information. Results on three different conversational genres show that our approach achieves a significant improvement over the current state of the art.

pdf bib
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning
Stephan Oepen | Omri Abend | Jan Hajic | Daniel Hershcovich | Marco Kuhlmann | Tim O’Gorman | Nianwen Xue
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

pdf bib
MRP 2019 : Cross-Framework Meaning Representation ParsingMRP 2019: Cross-Framework Meaning Representation Parsing
Stephan Oepen | Omri Abend | Jan Hajic | Daniel Hershcovich | Marco Kuhlmann | Tim O’Gorman | Nianwen Xue | Jayeol Chun | Milan Straka | Zdenka Uresova
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks. Five distinct approaches to the representation of sentence meaning in the form of directed graph were represented in the training and evaluation data for the task, packaged in a uniform abstract graph representation and serialization. The task received submissions from eighteen teams, of which five do not participate in the official ranking because they arrived after the closing deadline, made use of additional training data, or involved one of the task co-organizers. All technical information regarding the task, including system submissions, official results, and links to supporting resources and software are available from the task web site at : http://mrp.nlpl.eu

2018

pdf bib
Neural Ranking Models for Temporal Dependency Structure Parsing
Yuchen Zhang | Nianwen Xue
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We design and build the first neural temporal dependency parser. It utilizes a neural ranking model with minimal feature engineering, and parses time expressions and events in a text into a temporal dependency tree structure. We evaluate our parser on two domains : news reports and narrative stories. In a parsing-only evaluation setup where gold time expressions and events are provided, our parser reaches 0.81 and 0.70 f-score on unlabeled and labeled parsing respectively, a result that is very competitive against alternative approaches. In an end-to-end evaluation setup where time expressions and events are automatically recognized, our parser beats two strong baselines on both data domains. Our experimental results and discussions shed light on the nature of temporal dependency structures in different domains and provide insights that we believe will be valuable to future research in this area.

pdf bib
Transition-Based Chinese AMR ParsingChinese AMR Parsing
Chuan Wang | Bin Li | Nianwen Xue
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

This paper presents the first AMR parser built on the Chinese AMR bank. By applying a transition-based AMR parsing framework to Chinese, we first investigate how well the transitions first designed for English AMR parsing generalize to Chinese and provide a comparative analysis between the transitions for English and Chinese. We then perform a detailed error analysis to identify the major challenges in Chinese AMR parsing that we hope will inform future research in this area.

2017

pdf bib
Proceedings of the IJCNLP 2017, Shared Tasks
Chao-Hong Liu | Preslav Nakov | Nianwen Xue
Proceedings of the IJCNLP 2017, Shared Tasks

pdf bib
Proceedings of the 11th Linguistic Annotation Workshop
Nathan Schneider | Nianwen Xue
Proceedings of the 11th Linguistic Annotation Workshop

pdf bib
Translation Divergences in ChineseEnglish Machine Translation : An Empirical InvestigationChinese–English Machine Translation: An Empirical Investigation
Dun Deng | Nianwen Xue
Computational Linguistics, Volume 43, Issue 3 - September 2017

In this article, we conduct an empirical investigation of translation divergences between Chinese and English relying on a parallel treebank. To do this, we first devise a hierarchical alignment scheme where Chinese and English parse trees are aligned in a way that eliminates conflicts and redundancies between word alignments and syntactic parses to prevent the generation of spurious translation divergences. Using this Hierarchically Aligned ChineseEnglish Parallel Treebank (HACEPT), we are able to semi-automatically identify and categorize the translation divergences between the two languages and quantify each type of translation divergence. Our results show that the translation divergences are much broader than described in previous studies that are largely based on anecdotal evidence and linguistic knowledge. The distribution of the translation divergences also shows that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We also show that HACEPT allows the extraction of syntax-based translation rules, most of which are expressive enough to capture the translation divergences, and point out that the syntactic annotation in existing treebanks is not optimal for extracting such translation rules. We also discuss the implications of our study for attempts to bridge translation divergences by devising shared semantic representations across languages. Our quantitative results lend further support to the observation that although it is possible to bridge some translation divergences with semantic representations, other translation divergences are open-ended, thus building a semantic representation that captures all possible translation divergences may be impractical.

pdf bib
Getting the Most out of AMR ParsingAMR Parsing
Chuan Wang | Nianwen Xue
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper proposes to tackle the AMR parsing bottleneck by improving two components of an AMR parser : concept identification and alignment. We first build a Bidirectional LSTM based concept identifier that is able to incorporate richer contextual information to learn sparse AMR concept labels. We then extend an HMM-based word-to-concept alignment model with graph distance distortion and a rescoring method during decoding to incorporate the structural information in the AMR graph. We show integrating the two components into an existing AMR parser results in consistently better performance over the state of the art on various datasets.

pdf bib
A Systematic Study of Neural Discourse Models for Implicit Discourse Relation
Attapol Rutherford | Vera Demberg | Nianwen Xue
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Many neural network models have been proposed to tackle this problem. However, the comparison for this task is not unified, so we could hardly draw clear conclusions about the effectiveness of various architectures. Here, we propose neural network models that are based on feedforward and long-short term memory architecture and systematically study the effects of varying structures. To our surprise, the best-configured feedforward architecture outperforms LSTM-based model in most cases despite thorough tuning. Further, we compare our best feedforward system with competitive convolutional and recurrent networks and find that feedforward can actually be more effective. For the first time for this task, we compile and publish outputs from previous neural and non-neural systems to establish the standard for further comparison.

pdf bib
Addressing the Data Sparsity Issue in Neural AMR ParsingAMR Parsing
Xiaochang Peng | Chuan Wang | Daniel Gildea | Nianwen Xue
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Neural attention models have achieved great success in different NLP tasks. However, they have not fulfilled their promise on the AMR parsing task due to the data sparsity issue. In this paper, we describe a sequence-to-sequence model for AMR parsing and present different ways to tackle the data sparsity problem. We show that our methods achieve significant improvement over a baseline neural attention model and our results are also competitive against state-of-the-art systems that do not use extra linguistic resources.