Australasian Language Technology Association Workshop (2018)



pdf (full)
bib (full)
Proceedings of the Australasian Language Technology Association Workshop 2018

pdf bib
Proceedings of the Australasian Language Technology Association Workshop 2018
Sunghwan Mac Kim | Xiuzhen (Jenny) Zhang

pdf bib
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang | Gholamreza Haffari | Trevor Cohn

In this work, we investigate whether side information is helpful in neural machine translation (NMT). We study various kinds of side information, including topical information, personal trait, then propose different ways of incorporating them into the existing NMT models. Our experimental results show the benefits of side information in improving the NMT models.

pdf bib
Development of Natural Language Processing Tools for Cook Islands Māori
Rolando Coto Solano | Sally Akevai Nicholas | Samantha Wray

This paper presents three ongoing projects for NLP in Cook Islands Māori: Untrained Forced Alignment (approx. 9% error when detecting the center of words), speech-to-text (37% WER in the best trained models) and POS tagging (92% accuracy for the best performing model). Included as part of these projects are new resources filling a gap in Australasian languages, including gold-standard POS-tagged written corpora, transcribed speech corpora, and time-aligned corpora down to the level of phonemes. These are part of efforts to accelerate the documentation of Cook Islands Māori and to increase its vitality amongst its users.

pdf bib
Specifying Conceptual Models Using Restricted Natural Language
Bayzid Ashik Hossain | Rolf Schwitter

The key activity in designing an information system is conceptual modelling, which elicits and describes the general knowledge that is required to build the system. In this paper we propose a novel approach to conceptual modelling in which domain experts are able to specify and construct a model using a restricted form of natural language. A restricted natural language is a subset of a natural language that has well-defined computational properties and can therefore be translated unambiguously into a formal notation. We argue that a restricted natural language is suitable for writing precise and consistent specifications that lead to executable conceptual models. Using a restricted natural language allows domain experts to describe a scenario in the terminology of the application domain without the need to formally encode that scenario. The resulting textual specification can then be translated automatically into the language of the desired conceptual modelling framework.

pdf bib
Cluster Labeling by Word Embeddings and WordNet's Hypernymy
Hanieh Poostchi | Massimo Piccardi

Cluster labeling is the assignment of representative labels to clusters obtained from the organization of a document collection. Once assigned, the labels can play an important role in applications such as navigation, search and document classification. However, finding appropriately descriptive labels is still a challenging task. In this paper, we propose various approaches for assigning labels to word clusters by leveraging word embeddings and the synonymy and hypernymy relations in the WordNet lexical ontology. Experiments carried out using the WebAP document dataset have shown that one of the approaches stands out in the comparison and is capable of selecting labels that are reasonably aligned with those chosen by a pool of four human annotators.
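As an illustrative sketch of the embedding-based side of such an approach (the toy vectors and vocabulary below are invented for demonstration, not taken from the paper, and the WordNet hypernymy step is omitted), a candidate label can be chosen as the word whose vector lies closest to the centroid of the cluster:

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# toy word embeddings (hypothetical values, for illustration only)
embeddings = {
    "apple":  [0.9, 0.1, 0.1],
    "banana": [0.8, 0.2, 0.1],
    "fruit":  [0.9, 0.2, 0.2],
    "car":    [0.1, 0.9, 0.8],
}

def label_cluster(cluster_words, candidate_labels, emb):
    # centroid of the cluster members' vectors
    dims = zip(*(emb[w] for w in cluster_words))
    centroid = [sum(d) / len(cluster_words) for d in dims]
    # pick the candidate label whose vector is nearest to the centroid
    return max(candidate_labels, key=lambda c: cosine(emb[c], centroid))

label = label_cluster(["apple", "banana"], ["fruit", "car"], embeddings)
```

In the paper's setting, WordNet's synonymy and hypernymy relations would additionally constrain or rank the candidate labels, rather than drawing them from an arbitrary list as here.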

pdf bib
A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions
Navnita Nandakumar | Bahar Salehi | Timothy Baldwin

In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions (MWEs). Our experimental results suggest that character- and document-level models capture knowledge of MWE compositionality and are effective in modelling varying levels of compositionality, with the advantage over word-level models that they do not require token-level identification of MWEs in the training corpus.
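A common way to operationalise compositionality prediction (a generic sketch with invented toy vectors, not the paper's trained models) is to compare the embedding of the MWE as a unit against an additive composition of its components' embeddings; a low similarity suggests an idiomatic, non-compositional expression:

```python
import math

def cosine(u, v):
    # cosine similarity between two dense vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# toy embeddings (hypothetical values): the idiomatic MWE drifts from its parts
emb = {
    "ivory":       [0.9, 0.1, 0.0],
    "tower":       [0.1, 0.9, 0.0],
    "ivory_tower": [0.0, 0.2, 0.9],
}

def compositionality(mwe, parts, emb):
    # additive composition of the component vectors
    composed = [sum(dim) for dim in zip(*(emb[p] for p in parts))]
    return cosine(emb[mwe], composed)

score = compositionality("ivory_tower", ["ivory", "tower"], emb)
```

With these toy vectors the score comes out low, matching the intuition that "ivory tower" is not the sum of "ivory" and "tower".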

pdf bib
Overview of the 2018 ALTA Shared Task: Classifying Patent Applications
Diego Mollá | Dilesha Seneviratne

We present an overview of the 2018 ALTA shared task. This is the 9th in the series of shared tasks organised by ALTA since 2010. The task was to classify Australian patent applications into the sections defined by the International Patent Classification (IPC), using data made available by IP Australia. We introduce the task, describe the data and present the results of the participating teams. Some of the participating teams outperformed the state of the art.

pdf bib
Classifying Patent Applications with Ensemble Methods
Fernando Benites | Shervin Malmasi | Marcos Zampieri

We present methods for the automatic classification of patent applications using an annotated dataset provided by the organizers of the ALTA 2018 shared task, Classifying Patent Applications. The goal of the task is to use computational methods to categorize patent applications according to a coarse-grained taxonomy of eight classes based on the International Patent Classification (IPC). We tested a variety of approaches for this task, and the best result, a micro-averaged F1 score of 0.778, was achieved by SVM ensembles using a combination of words and characters as features. Our team, BMZ, was ranked first among 14 teams in the competition.
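The two ingredients named in the abstract can be sketched roughly as follows (the feature extraction and the hard-voting combination below are generic, self-contained illustrations, not the authors' exact pipeline):

```python
from collections import Counter

def char_ngrams(text, n=3):
    # overlapping character n-grams: the character-level features
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n=2):
    # overlapping word n-grams: the word-level features
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def majority_vote(member_predictions):
    # hard-voting ensemble: the label most member classifiers agree on wins
    return Counter(member_predictions).most_common(1)[0][0]

# combined word- and character-based feature space for one document
features = char_ngrams("patent", 3) + word_ngrams("patent application data", 2)

# three hypothetical member classifiers voting on an IPC section label
section = majority_vote(["A", "C", "A"])
```

In practice such features would be weighted (e.g. with TF-IDF) and fed to linear SVMs, with the ensemble combining the members' per-document predictions as above.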

pdf bib
Universal Language Model Fine-tuning for Patent Classification
Jason Hepburn

This paper describes the methods used for the 2018 ALTA Shared Task. The task this year was to automatically classify Australian patents into their main International Patent Classification section. Our final submission used a Support Vector Machine (SVM) and Universal Language Model Fine-tuning (ULMFiT). Our system achieved the best results in the student category.