Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Chloé Braud, Christian Hardmeier, Junyi Jessy Li, Annie Louis, Michael Strube, Amir Zeldes (Editors)


Anthology ID:
2021.codi-main
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic and Online
Venues:
CODI | CRAC | EMNLP
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.codi-main

Developing Conversational Data and Detection of Conversational Humor in Telugu
Vaishnavi Pamulapati | Radhika Mamidi

In the field of humor research, there has been a recent surge of interest in the sub-domain of Conversational Humor (CH). This study has two main objectives: (a) to develop a conversational (humorous and non-humorous) dataset in Telugu, and (b) to detect CH in the compiled dataset. In this paper, the challenges faced while collecting the data and the experiments carried out are elucidated. Transfer learning and non-transfer learning techniques are implemented using pre-trained models such as FastText word embeddings and BERT language models, as well as Text GCN, which learns word and document embeddings of the given corpus simultaneously. State-of-the-art results are observed, with 99.3% accuracy and a 98.5% F1 score achieved by BERT.
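
The abstract names BERT fine-tuning as the strongest approach but gives no implementation details. The following is a minimal sketch of such a setup using the Hugging Face transformers library; the multilingual checkpoint, toy stand-in data, and label scheme are assumptions, not the authors' configuration.

```python
# Minimal BERT fine-tuning sketch for binary humor classification.
# Assumptions (not from the paper): a multilingual BERT checkpoint and
# toy stand-in utterances; the authors' data and hyperparameters differ.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2  # humorous vs. non-humorous
)

texts = ["stand-in for a humorous Telugu utterance",
         "stand-in for a non-humorous Telugu utterance"]
labels = torch.tensor([1, 0])  # 1 = humorous, 0 = non-humorous

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()  # one gradient step; a real run wraps this in an optimizer loop
```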

Comparison of methods for explicit discourse connective identification across various domains
Merel Scholman | Tianai Dong | Frances Yung | Vera Demberg

Existing parse methods use varying approaches to identify explicit discourse connectives, but their performance has not been consistently evaluated in comparison to each other, nor have they been evaluated consistently on text other than newspaper articles. Here, we assess the performance on explicit connective identification of three parse methods (PDTB e2e, Lin et al., 2014; the winner of CoNLL 2015, Wang et al., 2015; and DisSent, Nie et al., 2019), along with a simple heuristic. We also examine how well these systems generalize to different datasets, namely written newspaper text (PDTB), written scientific text (BioDRB), prepared spoken text (TED-MDB), and spontaneous spoken text (Disco-SPICE). The results show that the e2e parser outperforms the other parse methods on all datasets. However, performance drops significantly from the PDTB to all other datasets. We provide a more fine-grained analysis of domain differences and of connectives that prove difficult to parse, in order to highlight the areas where gains can be made.
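
The "simple heuristic" is not specified in the abstract; a common baseline of this kind is case-insensitive matching against a lexicon of PDTB explicit connectives, sketched below with a tiny illustrative subset. Such a matcher deliberately over-predicts, since many matches are non-connective uses, which is exactly what makes the comparison to trained parsers informative.

```python
# Hypothetical lexicon-matching baseline for explicit connective
# identification; the connective list is a tiny illustrative subset,
# not the full PDTB lexicon.
import re

CONNECTIVES = ["because", "however", "for example", "as a result", "but"]
# Try longer connectives first so "for example" wins over shorter overlaps.
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, CONNECTIVES),
                             key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def find_connective_candidates(sentence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, surface form) for every lexicon match."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(sentence)]

# "however" here is a degree adverb, not a connective: a false positive.
print(find_connective_candidates("He left early because, however odd, it rained."))
```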

Revisiting Shallow Discourse Parsing in the PDTB-3: Handling Intra-sentential Implicits
Zheng Zhao | Bonnie Webber

In the PDTB-3, several thousand implicit discourse relations were newly annotated within individual sentences, adding to the over 15,000 implicit relations annotated across adjacent sentences in the PDTB-2. Given that the position of the arguments to these intra-sentential implicits is no longer as well-defined as with inter-sentential implicits, a discourse parser must identify both their location and their sense. That is the focus of the current work. We provide a comprehensive analysis of our results, showcasing model performance under different scenarios, pointing out limitations, and noting future directions.
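
The abstract frames intra-sentential implicit parsing as two coupled sub-tasks: locating the two arguments inside a single sentence and labeling the sense that holds between them. One plausible decomposition, sketched below, scores candidate token boundaries as the Arg1/Arg2 split and then classifies the resulting pair; the function names and interfaces are illustrative, not the authors' architecture.

```python
# Illustrative two-step decomposition: pick an Arg1/Arg2 split point,
# then label the sense. `score_split` and `classify_sense` stand in for
# learned models (e.g. transformer-based scorers) and are hypothetical.
from dataclasses import dataclass

@dataclass
class ImplicitRelation:
    arg1: str
    arg2: str
    sense: str

def parse_intra_sentential(tokens, score_split, classify_sense):
    """Choose the best token boundary, then label the argument pair."""
    best = max(range(1, len(tokens)), key=score_split)  # candidate boundaries
    arg1, arg2 = " ".join(tokens[:best]), " ".join(tokens[best:])
    return ImplicitRelation(arg1, arg2, classify_sense(arg1, arg2))
```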

discopy: A Neural System for Shallow Discourse Parsing
René Knaebel

This paper demonstrates discopy, a novel framework that makes it easy to design components for end-to-end shallow discourse parsing. For the purpose of demonstration, we implement recent neural approaches and integrate contextualized word embeddings to predict explicit and non-explicit discourse relations. Our proposed feature-free neural system performs competitively with systems presented at the latest Shared Task on Shallow Discourse Parsing. Finally, a web front end is shown that simplifies the inspection of annotated documents. The source code, documentation, and pretrained models are publicly accessible.
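
discopy's actual API is not shown in the abstract; what follows is a generic sketch of the component-pipeline idea it describes, where pluggable stages (connective detection, argument extraction, sense labeling) are chained over a shared document representation. All names here are hypothetical and this is not discopy's real interface.

```python
# Hypothetical component pipeline in the spirit of the framework described
# above; each stage reads and enriches a shared document dict.
from typing import Callable

Component = Callable[[dict], dict]

class Pipeline:
    """Chain shallow-discourse-parsing components over a document."""

    def __init__(self, *components: Component):
        self.components = components

    def parse(self, document: dict) -> dict:
        for component in self.components:
            document = component(document)
        return document

# Usage with hypothetical stages:
# Pipeline(detect_connectives, extract_arguments, label_senses).parse(doc)
```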

Capturing document context inside sentence-level neural machine translation models with self-training
Elman Mansimov | Gábor Melis | Lei Yu

Neural machine translation (NMT) has arguably achieved human-level parity when trained and evaluated at the sentence level. Document-level neural machine translation has received less attention and lags behind its sentence-level counterpart. The majority of the proposed document-level approaches investigate ways of conditioning the model on several source or target sentences to capture document context. These approaches require training a specialized NMT model from scratch on parallel document-level corpora. We propose an approach that doesn't require training a specialized model on parallel document-level corpora and is applied to a trained sentence-level NMT model at decoding time. We process the document from left to right multiple times and self-train the sentence-level model on pairs of source sentences and generated translations. Our approach reinforces the choices made by the model, thus making it more likely that the same choices will be made in other sentences in the document. We evaluate our approach on three document-level datasets: NIST Chinese-English, WMT19 Chinese-English, and OpenSubtitles English-Russian. We demonstrate that our approach achieves higher BLEU scores and higher human preference than the baseline. Qualitative analysis of our approach shows that the choices made by the model are consistent across the document.
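
The abstract describes the decoding-time procedure precisely enough to sketch: translate the document left to right, self-train on each (source sentence, generated translation) pair, and repeat the pass. Interleaving decoding and update steps as below is one plausible reading of that loop; `translate` and `fine_tune_step` are hypothetical stand-ins for the model interface.

```python
# Decoding-time self-training sketch, per the abstract's description.
# `model.translate` and `model.fine_tune_step` are hypothetical methods
# standing in for a sentence-level NMT model and one gradient update.
def self_train_decode(model, src_sentences, passes=2):
    translations = []
    for _ in range(passes):                 # left-to-right, multiple passes
        translations = []
        for src in src_sentences:
            hyp = model.translate(src)      # decode with current weights
            translations.append(hyp)
            model.fine_tune_step(src, hyp)  # self-train on own output,
                                            # reinforcing earlier choices
    return translations
```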