Australasian Language Technology Association Workshop (2021)



pdf (full)
bib (full)
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

pdf bib
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association
Afshin Rahimi | William Lane | Guido Zuccon

pdf bib
An Approach to the Frugal Use of Human Annotators to Scale up Auto-coding for Text Classification Tasks
Li’An Chen | Hanna Suominen

Human annotation to establish training data is often a very costly part of natural language processing (NLP) tasks, which has made frugal NLP approaches an important research topic. Many research teams struggle to complete projects with limited funding, labor, and computational resources. Driven by the Move-Step analytic framework theorized in applied linguistics, our study offers a rigorous approach to the frugal use of two human annotators to scale up auto-coding for text classification tasks. We applied the Linear Support Vector Machine algorithm to text classification of a job-ad corpus. Our Cohen's Kappa for inter-rater agreement and Area Under the Curve (AUC) values reached averages of 0.76 and 0.80, respectively. The calculated time consumption for our human training process was 36 days. The results indicated that even the strategic and frugal use of only two human annotators could enable the efficient training of classifiers with reasonably good performance. This study does not aim at generalizable results; rather, we propose that readers consider the annotation strategies arising from it only where they fit their specific research purposes.
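
A rough, hypothetical sketch of the kind of pipeline described above (not the authors' code): scikit-learn's LinearSVC trained on annotator-coded text, with Cohen's Kappa for inter-rater agreement and AUC on held-out data. The job-ad snippets, binary codes, and second-annotator labels are placeholders.

    # Hedged sketch: Linear SVM auto-coding with agreement/AUC checks.
    # Texts, codes, and the second annotator's labels are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import cohen_kappa_score, roc_auc_score

    texts = [
        "lead the regional sales team", "prepare monthly financial reports",
        "supervise junior staff on site", "maintain the customer database",
        "negotiate contracts with suppliers", "file incoming correspondence",
        "mentor new team members", "schedule meeting rooms",
    ]
    labels_a = [1, 0, 1, 0, 1, 0, 1, 0]   # annotator A's codes (placeholder)
    labels_b = [1, 0, 1, 0, 1, 0, 0, 0]   # annotator B's codes (placeholder)

    # Inter-rater agreement between the two human annotators.
    kappa = cohen_kappa_score(labels_a, labels_b)

    # Train a Linear SVM on one annotator's codes and score held-out data.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels_a, test_size=0.5, stratify=labels_a, random_state=0)
    vec = TfidfVectorizer()
    clf = LinearSVC().fit(vec.fit_transform(X_train), y_train)

    scores = clf.decision_function(vec.transform(X_test))
    auc = roc_auc_score(y_test, scores)
    print(f"Cohen's kappa = {kappa:.2f}, AUC = {auc:.2f}")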

pdf bib
Multi-modal Intent Classification for Assistive Robots with Large-scale Naturalistic Datasets
Karun Varghese Mathew | Venkata S Aditya Tarigoppula | Lea Frermann

Recent years have brought tremendous growth in assistive robots and prosthetics for people with partial or complete loss of upper-limb control. These technologies aim to help users with everyday reaching and grasping tasks, such as picking up an object and transporting it to a desired location; their utility critically depends on the ease and effectiveness of communication between the user and the robot. One natural way of communicating with assistive technologies is through verbal instructions. The meaning of a natural language command depends on the current configuration of the surrounding environment and needs to be interpreted in this multi-modal context, as accurate interpretation of the command is essential for successful execution of the user's intent by an assistive device. The research presented in this paper demonstrates how large-scale situated natural language datasets can support the development of robust assistive technologies. We leveraged a navigational dataset comprising 25k human-provided natural language commands covering diverse situations, demonstrated how to extend it in a task-informed way, and used it to develop multi-modal intent classifiers for pick-and-place tasks. Our best classifier reached 98% accuracy in a 16-way multi-modal intent classification task, suggesting high robustness and flexibility.
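
A minimal, hypothetical sketch of one way to fuse a verbal command with a simple encoding of the surrounding environment for intent classification. The commands, the two-feature environment encoding, and the two-intent label set are illustrative stand-ins, not the paper's dataset or its 16-way scheme.

    # Hedged sketch: combining a language command with environment features
    # for multi-modal intent classification. All data below is illustrative.
    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    commands = ["pick up the red cup", "place the cup on the left shelf",
                "pick up the spoon", "place the spoon in the drawer"]
    # Toy environment encoding: [object visible, gripper already holding object]
    env = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
    intents = ["pick", "place", "pick", "place"]   # two example intents only

    vec = TfidfVectorizer()
    X_text = vec.fit_transform(commands)
    X = hstack([X_text, csr_matrix(env)])          # multi-modal feature vector

    clf = LogisticRegression(max_iter=1000).fit(X, intents)
    test = hstack([vec.transform(["pick up the blue cup"]), csr_matrix([[1, 0]])])
    print(clf.predict(test))                       # likely 'pick', given the overlap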

pdf bib
Combining Shallow and Deep Representations for Text-Pair Classification
Vincent Nguyen | Sarvnaz Karimi | Zhenchang Xing

Text-pair classification is the task of determining the class relationship between two sentences. It is embedded in several tasks such as paraphrase identification and duplicate question detection. Contemporary methods fine-tune a transformer encoder and predict the class from the final-layer semantic representation of the classification token in the text-pair sequence. However, research has shown that earlier parts of the network learn shallow features, such as syntax and structure, which existing methods do not directly exploit. We propose a novel convolution-based decoder for transformer-based architectures that maximizes the use of encoder hidden features for text-pair classification. Our model exploits hidden representations throughout the transformer architecture. It outperforms a transformer encoder baseline by 50% on average (relative F1 score) on six datasets from the medical, software engineering, and open domains. Our work shows that transformer-based models can improve text-pair classification by modifying the fine-tuning step to exploit shallow features while improving model generalization, with only a slight reduction in efficiency.
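
A hedged sketch of the general idea of drawing on every encoder layer rather than only the last one: the classification-token state from each hidden layer of a pre-trained encoder is stacked and passed through a small 1-D convolution before classification. The choice of bert-base-uncased, the kernel size, and the pooling are assumptions for illustration, not the paper's decoder.

    # Hedged sketch: classify a text pair from [CLS] states of all encoder
    # layers via a small 1-D convolution. Hyperparameters are guesses.
    import torch
    from torch import nn
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

    class LayerwiseConvClassifier(nn.Module):
        def __init__(self, hidden=768, n_classes=2):
            super().__init__()
            self.conv = nn.Conv1d(hidden, 128, kernel_size=3, padding=1)  # across layers
            self.out = nn.Linear(128, n_classes)

        def forward(self, cls_stack):          # cls_stack: (batch, hidden, n_layers)
            h = torch.relu(self.conv(cls_stack)).mean(dim=-1)   # pool over layers
            return self.out(h)

    batch = tok("Is this a duplicate?", "Is this question repeated?",
                return_tensors="pt")
    with torch.no_grad():
        hidden_states = enc(**batch).hidden_states   # 13 layers incl. embeddings
    cls_stack = torch.stack([h[:, 0, :] for h in hidden_states], dim=-1)  # (1, 768, 13)
    logits = LayerwiseConvClassifier()(cls_stack)
    print(logits.shape)                              # torch.Size([1, 2])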

pdf bib
Evaluation of Review Summaries via Question-Answering
Nannan Huang | Xiuzhen Zhang

Summarisation of reviews aims to compress the opinions expressed in multiple review documents into a concise form while still covering the key opinions. Despite advances in summarisation models, evaluation metrics for opinionated text summaries lag behind and still rely on lexical-matching metrics such as ROUGE. In this paper, we propose a question-answering (QA) approach to evaluating summaries of opinions in reviews. We identify opinion-bearing text spans in the reference summary and generate QA pairs from them so as to capture salient opinions. A QA model is then employed to probe the candidate summary and evaluate the information overlap between the candidate and reference summaries. We show that our metric RunQA, Review Summary Evaluation via Question Answering, correlates well with human judgments of coverage and focus of information. Finally, we design an adversarial task and demonstrate that the proposed approach is more robust than existing metrics for ranking summaries.
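
A simplified, hypothetical sketch of QA-based probing: questions whose answers are opinion spans from the reference summary are asked against the candidate summary, and answer overlap is scored with token-level F1. The hand-written QA pairs, the SQuAD-tuned QA model, and the scoring below are placeholders, not RunQA itself.

    # Hedged sketch: probe a candidate summary with questions whose answers
    # are opinion spans from the reference summary. All examples are toy data.
    from collections import Counter
    from transformers import pipeline

    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    candidate = "Guests liked the friendly staff but found the rooms quite small."
    # (question, reference answer span) pairs derived from the reference summary.
    qa_pairs = [
        ("What did guests think of the staff?", "friendly"),
        ("What did guests think of the rooms?", "small and dated"),
    ]

    def token_f1(pred, gold):
        p, g = pred.lower().split(), gold.lower().split()
        common = sum((Counter(p) & Counter(g)).values())
        if common == 0:
            return 0.0
        prec, rec = common / len(p), common / len(g)
        return 2 * prec * rec / (prec + rec)

    scores = [token_f1(qa(question=q, context=candidate)["answer"], a)
              for q, a in qa_pairs]
    print(sum(scores) / len(scores))   # higher = candidate covers more reference opinions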

pdf bib
Document Level Hierarchical Transformer
Najam Zaidi | Trevor Cohn | Gholamreza Haffari

Generating long and coherent text is an important and challenging task, encompassing application areas such as summarization, document-level machine translation, and story generation. Despite their success in modeling intra-sentence coherence, existing long text generation models (e.g., BART and GPT-3) still struggle to maintain a coherent event sequence throughout the generated text. We conjecture that this is because it is difficult for such models to revise, replace, revoke, or delete any part of the text they have already generated. In this paper, we present a novel semi-autoregressive document generation model capable of revising and editing the generated text. Building on recent models (Gu et al., 2019; Xu and Carpuat, 2020), we cast document generation as a hierarchical Markov decision process with a two-level hierarchy of high- and low-level editing programs. We train our model using imitation learning (Hussein et al., 2017) and introduce a roll-in policy such that each policy learns on the output of applying the previous action. Experiments with the proposed approach shed light on several problems of long text generation with our model. We suggest various remedies, such as using a distilled dataset, designing better attention mechanisms, and using autoregressive models as the low-level program.
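
A toy, purely illustrative sketch of a two-level editing loop in this spirit: a high-level policy picks a sentence and an edit operation, a low-level policy realises the edit, and each step works on the document produced by the previous action (a roll-in of sorts). Both policies here are trivial placeholders, not the paper's learned programs.

    # Toy illustration of hierarchical document editing; not the paper's model.
    import random

    def high_level_policy(doc):
        """Pick a sentence index and an operation: keep, delete or rewrite."""
        i = random.randrange(len(doc))
        return i, random.choice(["keep", "delete", "rewrite"])

    def low_level_policy(sentence):
        """Produce replacement text for a sentence (placeholder edit)."""
        return sentence + " [revised]"

    def edit_step(doc):
        if not doc:
            return doc
        i, op = high_level_policy(doc)
        if op == "delete":
            return doc[:i] + doc[i + 1:]
        if op == "rewrite":
            return doc[:i] + [low_level_policy(doc[i])] + doc[i + 1:]
        return doc

    doc = ["The hero sets out.", "A storm delays the journey.", "The hero arrives."]
    for _ in range(3):   # roll-in: each step edits the previous step's output
        doc = edit_step(doc)
    print(doc)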

pdf bib
Generating and Modifying Natural Language Explanations
Abdus Salam | Rolf Schwitter | Mehmet Orgun

HESIP is a hybrid explanation system for image predictions that combines sub-symbolic and symbolic machine learning techniques to explain the predictions of image classification tasks. The sub-symbolic component makes a prediction for an image, and the symbolic component learns probabilistic symbolic rules to explain that prediction. In HESIP, explanations are generated in controlled natural language from the learned probabilistic rules using a bi-directional logic grammar. In this paper, we present an explanation modification method in which a human-in-the-loop can modify an incorrect explanation generated by the HESIP system; the modified explanation is afterwards used by HESIP to learn a better explanation.

pdf bib
An Ensemble Model for Automatic Grading of Evidence
Yuting Guo | Yao Ge | Ruqi Liao | Abeed Sarker

This paper describes our approach to the automatic grading of evidence task of the Australasian Language Technology Association (ALTA) Shared Task 2021. We developed two classification models, one based on an SVM and one on RoBERTa, and applied an ensemble technique to combine the grades from the different classifiers. Our results showed that the SVM model achieved results comparable to the RoBERTa model, and that the ensemble system outperformed the individual models on this task. Our system ranked first among five teams, with 3.3% higher accuracy than the second-placed team.
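
A minimal sketch of one common ensembling step consistent with the description above: average the class probabilities produced by the two models and take the highest-scoring grade. The probability values and grade labels below are placeholders, not the shared-task data or the authors' exact combination rule.

    # Hedged sketch: probability-averaging ensemble of two classifiers.
    import numpy as np

    grades = ["A", "B", "C", "D"]                    # placeholder evidence grades

    # Per-example class probabilities from each model (placeholder numbers).
    svm_probs = np.array([[0.10, 0.60, 0.20, 0.10],
                          [0.50, 0.30, 0.10, 0.10]])
    roberta_probs = np.array([[0.05, 0.40, 0.45, 0.10],
                              [0.70, 0.20, 0.05, 0.05]])

    ensemble = (svm_probs + roberta_probs) / 2       # simple average of the two models
    pred = [grades[i] for i in ensemble.argmax(axis=1)]
    print(pred)                                      # e.g. ['B', 'A']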