Australasian Language Technology Association Workshop (2020)


Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association
Maria Kim | Daniel Beck | Meladel Mistica

Feature-Based Forensic Text Comparison Using a Poisson Model for Likelihood Ratio Estimation
Michael Carne | Shunichi Ishihara

Score-based and feature-based methods are the two main approaches to estimating a forensic likelihood ratio (LR), which quantifies the strength of evidence. In this forensic text comparison (FTC) study, a score-based method using the Cosine distance is compared with a feature-based method built on a Poisson model, using texts collected from 2,157 authors. Distance measures (e.g. Burrows’s Delta, Cosine distance) are a standard tool in authorship attribution studies. Thus, implementing a score-based method using a distance measure is a natural first step for estimating LRs for textual evidence. However, textual data often violates the statistical assumptions underlying distance-based models. Furthermore, such models only assess the similarity, not the typicality, of the objects (i.e. documents) under comparison. A Poisson model is theoretically more appropriate than distance-based measures for authorship attribution, but it has never been tested with linguistic text evidence within the LR framework. The log-LR cost (Cllr) was used to assess the performance of the two methods. This study demonstrates that: (1) the feature-based method outperforms the score-based method by a Cllr value of ca. 0.09 under the best-performing settings; and (2) the performance of the feature-based method can be further improved by feature selection.
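The abstract names the log-LR cost (Cllr) as the evaluation metric without stating its formula. As a minimal sketch, assuming the standard definition from the forensic LR literature (the average of a log2 penalty over same-author comparisons and a log2 penalty over different-author comparisons), it could be computed as below. The function name `cllr` and its argument names are illustrative and not taken from the paper.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost (Cllr), standard definition.

    same_author_lrs: LRs from comparisons where both texts share an author
                     (ideally much greater than 1).
    different_author_lrs: LRs from comparisons of texts by different authors
                          (ideally much smaller than 1).
    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    # Penalty for same-author comparisons: large when the LR is small.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Penalty for different-author comparisons: large when the LR is large.
    diff_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_penalty /= len(different_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)
```

Under this definition, an uninformative system (every LR equal to 1) has a Cllr of 1, so values below 1 indicate that the LRs carry useful information.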

Modelling Verbal Morphology in Nen
Saliha Muradoglu | Nicholas Evans | Ekaterina Vylomova

Nen verbal morphology is particularly complex; a transitive verb can take up to 1,740 unique forms. The combined effect of having a large combinatoric space and a low-resource setting amplifies the need for NLP tools. Nen morphology utilises distributed exponence, a non-trivial means of mapping form to meaning. In this paper, we attempt to model Nen verbal morphology using state-of-the-art machine learning models for morphological reinflection. We explore and categorise the types of errors these systems generate. Our results show sensitivity to training data composition; different distributions of verb type yield different accuracies (patterning with E-complexity). We also demonstrate the types of patterns that can be inferred from the training data, through the case study of syncretism.

Pandemic Literature Search: Finding Information on COVID-19
Vincent Nguyen | Maciek Rybinski | Sarvnaz Karimi | Zhenchang Xing

Finding information related to a pandemic of a novel disease raises new challenges for information seeking and retrieval, as new information becomes available only gradually. We investigate how to better rank information for pandemic information retrieval. We experiment with different ranking algorithms, propose a novel end-to-end method for neural retrieval, and demonstrate its effectiveness on the TREC COVID search. This work could lead to a search system that aids scientists, clinicians, policymakers and others in finding reliable answers from the scientific literature.

Information Extraction from Legal Documents: A Study in the Context of Common Law Court Judgements
Meladel Mistica | Geordie Z. Zhang | Hui Chia | Kabir Manandhar Shrestha | Rohit Kumar Gupta | Saket Khandelwal | Jeannie Paterson | Timothy Baldwin | Daniel Beck

‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts. For this reason, lawyers must search prior judgements for the legal principles that are relevant to their case. The difficulty for those within the legal profession is that the information that they are looking for may be contained within a few paragraphs or sentences, but those few paragraphs may be buried within a hundred-page document. In this study, we create a schema based on the relevant information that legal professionals seek within judgements and perform text classification based on it, with the aim of not only assisting lawyers in researching cases, but eventually enabling large-scale analysis of legal judgements to find trends in court outcomes over time.

Overview of the 2020 ALTA Shared Task: Assess Human Behaviour
Diego Mollá

The 2020 ALTA shared task is the 11th instance of a series of shared tasks organised by ALTA since 2010. The task is to classify texts posted in social media according to human judgements expressed in them. The data used for this task is a subset of SemEval 2018 AIT DISC, which has been annotated by domain experts for this task. In this paper we introduce the task, describe the data and present the results of participating systems.