Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Esin Durmus, Vivek Gupta, Nelson Liu, Nanyun Peng, Yu Su (Editors)

Association for Computational Linguistics

Shuffled-token Detection for Refining Pre-trained RoBERTa
Subhadarshi Panda | Anjali Agrawal | Jeewon Ha | Benjamin Bloch

State-of-the-art transformer models have achieved robust performance on a variety of NLP tasks. Many of these approaches have employed domain-agnostic pre-training tasks to train models that yield highly generalized sentence representations that can be fine-tuned for specific downstream tasks. We propose refining a pre-trained NLP model using the objective of detecting shuffled tokens. We use a sequential approach, starting with the pre-trained RoBERTa model and training it further with this objective. Applying a random shuffling strategy at the word level, we found that our approach enables the RoBERTa model to achieve better performance on 4 out of 7 GLUE tasks. Our results indicate that learning to detect shuffled tokens is a promising approach to learning more coherent sentence representations.
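As a rough illustration of the word-level shuffling objective described in this abstract, the sketch below builds one training example: a fraction of word positions is permuted, and each position receives a binary label indicating whether its word changed. The function name `shuffle_words`, the `shuffle_fraction` parameter and its default, and the exact labeling rule are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import random


def shuffle_words(words, shuffle_fraction=0.15, seed=None):
    """Permute a random subset of word positions (hypothetical helper).

    Returns (shuffled_words, labels), where labels[i] is 1 if the word at
    position i differs from the original word at that position, else 0.
    The 15% default is an assumption, not a value from the paper.
    """
    rng = random.Random(seed)
    n = len(words)
    # Pick at least two positions so that a swap is possible at all.
    k = min(n, max(2, round(n * shuffle_fraction))) if n >= 2 else 0
    positions = sorted(rng.sample(range(n), k))
    permuted = positions[:]
    rng.shuffle(permuted)

    shuffled = list(words)
    for src, dst in zip(positions, permuted):
        shuffled[dst] = words[src]

    # A position counts as "shuffled" only if its word actually changed.
    labels = [int(shuffled[i] != words[i]) for i in range(n)]
    return shuffled, labels


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    shuffled, labels = shuffle_words(sentence, shuffle_fraction=0.5, seed=0)
    print(shuffled)
    print(labels)
```

A token-classification head on top of the pre-trained encoder could then be trained to predict these labels; how the labels are aligned to subword tokens is left open here.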

Morphology-Aware Meta-Embeddings for Tamil
Arjun Sai Krishnan | Seyoon Ragavan

In this work, we explore generating morphologically enhanced word embeddings for Tamil, a highly agglutinative South Indian language with rich morphology that remains low-resource with regard to NLP tasks. We present the first word analogy dataset for Tamil, consisting of 4499 hand-curated word tetrads across 10 semantic and 13 morphological relation types. Using a rule-based segmenter to capture morphology, together with meta-embedding techniques, we train meta-embeddings that outperform existing baselines by 16% on our analogy task and appear to mitigate a previously observed trade-off between semantic and morphological accuracy.

Seed Word Selection for Weakly-Supervised Text Classification with Unsupervised Error Estimation
Yiping Jin | Akshay Bhatia | Dittaya Wanvarie

Weakly-supervised text classification aims to induce text classifiers from only a few user-provided seed words. The vast majority of previous work assumes high-quality seed words are given. However, expert-annotated seed words are sometimes non-trivial to come up with. Furthermore, in the weakly-supervised learning setting, we do not have any labeled documents to measure the seed words’ efficacy, making the seed word selection process a walk in the dark. In this work, we remove the need for expert-curated seed words by first mining (noisy) candidate seed words associated with the category names. We then train interim models with individual candidate seed words. Lastly, we estimate the interim models’ error rates in an unsupervised manner. The seed words that yield the lowest estimated error rates are added to the final seed word set. A comprehensive evaluation of six binary classification tasks on four popular datasets demonstrates that the proposed method outperforms a baseline using only category name seed words and achieves performance comparable to a counterpart using expert-annotated seed words.

Multi-Task Learning of Generation and Classification for Emotion-Aware Dialogue Response Generation
Tatsuya Ide | Daisuke Kawahara

For a computer to interact naturally with a human, it needs to be human-like. In this paper, we propose a neural response generation model with multi-task learning of generation and classification, focusing on emotion. Our model, based on BART (Lewis et al., 2020), a pre-trained transformer encoder-decoder model, is trained to generate responses and recognize emotions simultaneously. Furthermore, we weight the losses for the tasks to control the update of parameters. Automatic evaluations and crowdsourced manual evaluations show that the proposed model makes generated responses more emotionally aware.
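The loss weighting mentioned in this abstract can be pictured as a weighted sum of a generation loss and an emotion classification loss. The sketch below is a minimal, framework-free illustration under stated assumptions: the function names, the use of token-averaged cross-entropy for generation, and the default weights `gen_weight` and `cls_weight` are all hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(token_logits, token_targets, emotion_logits, emotion_target,
                   gen_weight=1.0, cls_weight=0.5):
    """Weighted sum of a generation loss and a classification loss.

    The weights are illustrative assumptions; they control how strongly
    each task influences the shared parameters during training.
    """
    gen_loss = sum(
        cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    cls_loss = cross_entropy(emotion_logits, emotion_target)
    return gen_weight * gen_loss + cls_weight * cls_loss


if __name__ == "__main__":
    # Two response tokens over a 2-word vocabulary, three emotion classes.
    loss = multitask_loss(
        token_logits=[[0.0, 0.0], [0.0, 0.0]],
        token_targets=[0, 1],
        emotion_logits=[0.0, 0.0, 0.0],
        emotion_target=2,
    )
    print(loss)
```

In a real encoder-decoder setup both losses would be computed from the same model's outputs, so that gradients from the two tasks update shared parameters in proportion to the chosen weights.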

Comparison of Grammatical Error Correction Using Back-Translation Models
Aomi Koyama | Kengo Hotate | Masahiro Kaneko | Mamoru Komachi

Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. Studies on GEC have proposed several methods to generate pseudo data, which comprise pairs of grammatical and artificially produced ungrammatical sentences. Currently, a mainstream approach to generating pseudo data is back-translation (BT). Most previous studies using BT have employed the same architecture for both the GEC and BT models. However, GEC models have different correction tendencies depending on their architecture. Thus, in this study, we compare the correction tendencies of GEC models trained on pseudo data generated by three BT models with different architectures, namely, Transformer, CNN, and LSTM. The results confirm that the correction tendencies for each error type differ across BT models. In addition, we investigate the correction tendencies when using a combination of pseudo data generated by different BT models. We find that the combination of different BT models improves or interpolates the performance on each error type compared with using a single BT model with different seeds.

Hie-BART: Document Summarization with Hierarchical BART
Kazuki Akiyama | Akihiro Tamura | Takashi Ninomiya

This paper proposes a new abstractive document summarization model, hierarchical BART (Hie-BART), which captures the hierarchical structures of a document (i.e., sentence-word structures) in the BART model. Although the existing BART model has achieved state-of-the-art performance on document summarization tasks, it does not model interactions between sentence-level and word-level information. In machine translation tasks, the performance of neural machine translation models has been improved by incorporating multi-granularity self-attention (MG-SA), which captures the relationships between words and phrases. Inspired by this previous work, the proposed Hie-BART model incorporates MG-SA into the encoder of the BART model to capture sentence-word structures. Evaluations on the CNN/Daily Mail dataset show that the proposed Hie-BART model outperforms some strong baselines and improves the performance of a non-hierarchical BART model (+0.23 ROUGE-L).