Trevor Cohn

University of Melbourne

Other people with similar names: Trevor Cohen (University of Washington)


2021

pdf bib
Commonsense Knowledge in Word Associations and ConceptNetConceptNet
Chunhua Liu | Trevor Cohn | Lea Frermann
Proceedings of the 25th Conference on Computational Natural Language Learning

Humans use countless basic, shared facts about the world to efficiently navigate in their environment. This commonsense knowledge is rarely communicated explicitly, however, understanding how commonsense knowledge is represented in different paradigms is important for (a) a deeper understanding of human cognition and (b) augmenting automatic reasoning systems. This paper presents an in-depth comparison of two large-scale resources of general knowledge : ConceptNet, an engineered relational database, and SWOW, a knowledge graph derived from crowd-sourced word associations. We examine the structure, overlap and differences between the two graphs, as well as the extent of situational commonsense knowledge present in the two resources. We finally show empirically that both resources improve downstream task performance on commonsense reasoning benchmarks over text-only baselines, suggesting that large-scale word association data, which have been obtained for several languages through crowd-sourcing, can be a valuable complement to curated knowledge graphs.

pdf bib
Diverse Adversaries for Mitigating Bias in Training
Xudong Han | Timothy Baldwin | Trevor Cohn
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Adversarial learning can learn fairer and less biased models of language processing than standard training. However, current adversarial techniques only partially mitigate the problem of model bias, added to which their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results show that our method substantially improves over standard adversarial removal methods, in terms of reducing bias and stability of training.

pdf bib
Fairness-aware Class Imbalanced Learning
Shivashankar Subramanian | Afshin Rahimi | Timothy Baldwin | Trevor Cohn | Lea Frermann
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups. However there has traditionally been a disconnect between research on class-imbalanced learning and mitigating bias, and only recently have the two been looked at through a common lens. In this work we evaluate long-tail learning methods for tweet sentiment and occupation classification, and extend a margin-loss based approach with methods to enforce fairness. We empirically show through controlled experiments that the proposed approaches help mitigate both class imbalance and demographic biases.

pdf bib
Document Level Hierarchical Transformer
Najam Zaidi | Trevor Cohn | Gholamreza Haffari
Proceedings of the The 19th Annual Workshop of the Australasian Language Technology Association

Generating long and coherent text is an important and challenging task encompassing many application areas such as summarization, document level machine translation and story generation. Despite the success in modeling intra-sentence coherence, existing long text generation models (e.g., BART and GPT-3) still struggle to maintain a coherent event sequence throughout the generated text. We conjecture that this is because of the difficulty for the model to revise, replace, revoke or delete any part that has been generated by the model. In this paper, we present a novel semi-autoregressive document generation model capable of revising and editing the generated text. Building on recent models by (Gu et al., 2019 ; Xu and Carpuat, 2020) we propose document generation as a hierarchical Markov decision process with a two level hierarchy, where the high and low level editing programs. We train our model using imitation learning (Hussein et al., 2017) and introduce roll-in policy such that each policy learns on the output of applying the previous action. Experiments applying the proposed approach sheds various insights on the problems of long text generation using our model. We suggest various remedies such as using distilled dataset, designing better attention mechanisms and using autoregressive models as a low level program.

pdf bib
PTST-UoM at SemEval-2021 Task 10 : Parsimonious Transfer for Sequence TaggingPTST-UoM at SemEval-2021 Task 10: Parsimonious Transfer for Sequence Tagging
Kemal Kurniawan | Lea Frermann | Philip Schulz | Trevor Cohn
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes PTST, a source-free unsupervised domain adaptation technique for sequence tagging, and its application to the SemEval-2021 Task 10 on time expression recognition. PTST is an extension of the cross-lingual parsimonious parser transfer framework, which uses high-probability predictions of the source model as a supervision signal in self-training. We extend the framework to a sequence prediction setting, and demonstrate its applicability to unsupervised domain adaptation. PTST achieves F1 score of 79.6 % on the official test set, with the precision of 90.1 %, the highest out of 14 submissions.

2020

pdf bib
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Bonnie Webber | Trevor Cohn | Yulan He | Yang Liu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Findings of the Association for Computational Linguistics: EMNLP 2020
Trevor Cohn | Yulan He | Yang Liu
Findings of the Association for Computational Linguistics: EMNLP 2020

2019

pdf bib
Deep Ordinal Regression for Pledge Specificity Prediction
Shivashankar Subramanian | Trevor Cohn | Timothy Baldwin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability. At present, there are no publicly available annotated datasets of pledges, and most political analyses rely on manual annotations. In this paper we collate a novel dataset of manifestos from eleven Australian federal election cycles, with over 12,000 sentences annotated with specificity (e.g., rhetorical vs detailed pledge) on a fine-grained scale. We propose deep ordinal regression approaches for specificity prediction, under both supervised and semi-supervised settings, and provide empirical results demonstrating the effectiveness of the proposed techniques over several baseline approaches. We analyze the utility of pledge specificity modeling across a spectrum of policy issues in performing ideology prediction, and further provide qualitative analysis in terms of capturing party-specific issue salience across election cycles.

pdf bib
Neural Speech Translation using Lattice Transformations and Graph Networks
Daniel Beck | Trevor Cohn | Gholamreza Haffari
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Speech translation systems usually follow a pipeline approach, using word lattices as an intermediate representation. However, previous work assume access to the original transcriptions used to train the ASR system, which can limit applicability in real scenarios. In this work we propose an approach for speech translation through lattice transformations and neural models based on graph networks. Experimental results show that our approach reaches competitive performance without relying on transcriptions, while also being orders of magnitude faster than previous work.

pdf bib
On the Role of Scene Graphs in Image Captioning
Dalin Wang | Daniel Beck | Trevor Cohn
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Scene graphs represent semantic information in images, which can help image captioning system to produce more descriptive outputs versus using only the image as context. Recent captioning approaches rely on ad-hoc approaches to obtain graphs for images. However, those graphs introduce noise and it is unclear the effect of parser errors on captioning accuracy. In this work, we investigate to what extent scene graphs can help image captioning. Our results show that a state-of-the-art scene graph parser can boost performance almost as much as the ground truth graphs, showing that the bottleneck currently resides more on the captioning models than on the performance of the scene graph parser.

pdf bib
Contextualization of Morphological Inflection
Ekaterina Vylomova | Ryan Cotterell | Trevor Cohn | Timothy Baldwin | Jason Eisner
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological inflection or surface realization, our task input does not provide gold tags that specify what morphological features to realize on each lemmatized word ; rather, such features must be inferred from sentential context. We develop a neural hybrid graphical model that explicitly reconstructs morphological features before predicting the inflected forms, and compare this to a system that directly predicts the inflected forms without relying on any morphological annotation. We experiment on several typologically diverse languages from the Universal Dependencies treebanks, showing the utility of incorporating linguistically-motivated latent variables into NLP models.

2018

pdf bib
Evaluating the Utility of Hand-crafted Features in Sequence Labelling
Minghao Wu | Fei Liu | Trevor Cohn
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting handcrafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain a F 1 of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding, over using features as either inputs or outputs alone, and moreover, show including the autoencoder components reduces training requirements to 60 %, while retaining the same predictive accuracy.

pdf bib
Twitter Geolocation using Knowledge-Based MethodsTwitter Geolocation using Knowledge-Based Methods
Taro Miyazaki | Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations. Their low frequency means that key terms observed in testing are often unseen in training, such that standard classifiers are unable to learn weights for them. We propose a method for reasoning over such terms using a knowledge base, through exploiting their relations with other entities. Our technique uses a graph embedding over the knowledge base, which we couple with a text representation to learn a geolocation classifier, trained end-to-end. We show that our method improves over purely text-based methods, which we ascribe to more robust treatment of low-count and out-of-vocabulary entities.

pdf bib
Hierarchical Structured Model for Fine-to-Coarse Manifesto Text Analysis
Shivashankar Subramanian | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Election manifestos document the intentions, motives, and views of political parties. They are often used for analysing a party’s fine-grained position on a particular issue, as well as for coarse-grained positioning of a party on the leftright spectrum. In this paper we propose a two-stage model for automatically performing both levels of analysis over manifestos. In the first step we employ a hierarchical multi-task structured deep model to predict fine- and coarse-grained positions, and in the second step we perform post-hoc calibration of coarse-grained positions using probabilistic soft logic. We empirically show that the proposed model outperforms state-of-art approaches at both granularities using manifestos from twelve countries, written in ten different languages.

pdf bib
Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-Based Sentiment Analysis
Fei Liu | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA) extraction of fine-grained opinion polarity w.r.t. a pre-defined set of aspects remains a difficult task. Motivated by recent advances in memory-augmented models for machine reading, we propose a novel architecture, utilising external memory chains with a delayed memory update mechanism to track entities. On a TABSA task, the proposed model demonstrates substantial improvements over state-of-the-art approaches, including those using external knowledge bases.

pdf bib
Improved Neural Machine Translation using Side Information
Cong Duy Vu Hoang | Gholamreza Haffari | Trevor Cohn
Proceedings of the Australasian Language Technology Association Workshop 2018

In this work, we investigate whether side information is helpful in neural machine translation (NMT). We study various kinds of side information, including topical information, personal trait, then propose different ways of incorporating them into the existing NMT models. Our experimental results show the benefits of side information in improving the NMT models.

pdf bib
Semi-supervised User Geolocation via Graph Convolutional Networks
Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Social media user geolocation is vital to many applications such as event detection. In this paper, we propose GCN, a multiview geolocation model based on Graph Convolutional Networks, that uses both text and network context. We compare GCN to the state-of-the-art, and to two baselines we propose, and show that our model achieves or is competitive with the state-of-the-art over three benchmark geolocation datasets when sufficient supervision is available. We also evaluate GCN under a minimal supervision scenario, and show it outperforms baselines. We find that highway network gates are essential for controlling the amount of useful neighbourhood expansion in GCN.

pdf bib
Narrative Modeling with Memory Chains and Semantic Supervision
Fei Liu | Trevor Cohn | Timothy Baldwin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task. Inspired by previous studies on ROC Story Cloze Test, we propose a novel method, tracking various semantic aspects with external neural memory chains while encouraging each to focus on a particular semantic aspect. Evaluated on the task of story ending prediction, our model demonstrates superior performance to a collection of competitive baselines, setting a new state of the art.

2017

pdf bib
Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
Fei Liu | Timothy Baldwin | Trevor Cohn
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Despite successful applications across a broad range of NLP tasks, conditional random fields (CRFs), in particular the linear-chain variant, are only able to model local features. While this has important benefits in terms of inference tractability, it limits the ability of the model to capture long-range dependencies between items. Attempts to extend CRFs to capture long-range dependencies have largely come at the cost of computational complexity and approximate inference. In this work, we propose an extension to CRFs by integrating external memory, taking inspiration from memory networks, thereby allowing CRFs to incorporate information far beyond neighbouring steps. Experiments across two tasks show substantial improvements over strong CRF and LSTM baselines.

pdf bib
Learning Kernels over Strings using Gaussian ProcessesGaussian Processes
Daniel Beck | Trevor Cohn
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Non-contiguous word sequences are widely known to be important in modelling natural language. However they not explicitly encoded in common text representations. In this work we propose a model for text processing using string kernels, capable of flexibly representing non-contiguous sequences. Specifically, we derive a vectorised version of the string kernel algorithm and their gradients, allowing efficient hyperparameter optimisation as part of a Gaussian Process framework. Experiments on synthetic data and text regression for emotion analysis show the promise of this technique.

pdf bib
Topically Driven Neural Language Model
Jey Han Lau | Timothy Baldwin | Trevor Cohn
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.

pdf bib
A Neural Model for User Geolocation and Lexical Dialectology
Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state of the art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.

pdf bib
Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary
Meng Fang | Trevor Cohn
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Cross-lingual model transfer is a compelling and popular method for predicting annotations in a low-resource language, whereby parallel corpora provide a bridge to a high-resource language, and its associated annotated corpora. However, parallel data is not readily available for many languages, limiting the applicability of these approaches. We address these drawbacks in our framework which takes advantage of cross-lingual word embeddings trained solely on a high coverage dictionary. We propose a novel neural network model for joint training from both sources of data based on cross-lingual word embeddings, and show substantial empirical improvements over baseline techniques. We also propose several active learning heuristics, which result in improvements over competitive benchmark methods.

pdf bib
Decoupling Encoder and Decoder Networks for Abstractive Document Summarization
Ying Xu | Jey Han Lau | Timothy Baldwin | Trevor Cohn
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Abstractive document summarization seeks to automatically generate a summary for a document, based on some abstract understanding of the original document. State-of-the-art techniques traditionally use attentive encoderdecoder architectures. However, due to the large number of parameters in these models, they require large training datasets and long training times. In this paper, we propose decoupling the encoder and decoder networks, and training them separately. We encode documents using an unsupervised document encoder, and then feed the document vector to a recurrent neural network decoder. With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time. Experiments show that the decoupled model achieves comparable performance with state-of-the-art models for in-domain documents, but less well for out-of-domain documents.

pdf bib
Word Representation Models for Morphologically Rich Languages in Neural Machine Translation
Ekaterina Vylomova | Trevor Cohn | Xuanli He | Gholamreza Haffari
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Out-of-vocabulary words present a great challenge for Machine Translation. Recently various character-level compositional models were proposed to address this issue. In current research we incorporate two most popular neural architectures, namely LSTM and CNN, into hard- and soft-attentional models of translation for character-level representation of the source. We propose semantic and morphological intrinsic evaluation of encoder-level representations. Our analysis of the learned representations reveals that character-based LSTM seems to be better at capturing morphological aspects compared to character-based CNN. We also show that hard-attentional model provides better character-level representations compared to vanilla one.

pdf bib
BIBI System Description : Building with CNNs and Breaking with Deep Reinforcement LearningBIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning
Yitong Li | Trevor Cohn | Timothy Baldwin
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper describes our submission to the sentiment analysis sub-task of Build It, Break It : The Language Edition (BIBI), on both the builder and breaker sides. As a builder, we use convolutional neural nets, trained on both phrase and sentence data. As a breaker, we use Q-learning to learn minimal change pairs, and apply a token substitution method automatically. We analyse the results to gauge the robustness of NLP systems.

pdf bib
Towards Decoding as Continuous Optimisation in Neural Machine Translation
Cong Duy Vu Hoang | Gholamreza Haffari | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a novel decoding approach for neural machine translation (NMT) based on continuous optimisation. We reformulate decoding, a discrete optimization problem, into a continuous problem, such that optimization can make use of efficient gradient-based techniques. Our powerful decoding framework allows for more accurate decoding for standard neural machine translation models, as well as enabling decoding in intractable models such as intersection of several different NMT models. Our empirical results show that our decoding framework is effective, and can leads to substantial improvements in translations, especially in situations where greedy search and beam search are not feasible. Finally, we show how the technique is highly competitive with, and complementary to, reranking.

pdf bib
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
Afshin Rahimi | Timothy Baldwin | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset.

pdf bib
Learning how to Active Learn : A Deep Reinforcement Learning Approach
Meng Fang | Yuan Li | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Active learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate. This is usually done using heuristic selection methods, however the effectiveness of such methods is limited and moreover, the performance of heuristics varies between datasets. To address these shortcomings, we introduce a novel formulation by reframing the active learning as a reinforcement learning problem and explicitly learning a data selection policy, where the policy takes the role of the active learning heuristic. Importantly, our method allows the selection policy learned using simulation to one language to be transferred to other languages. We demonstrate our method using cross-lingual named entity recognition, observing uniform improvements over traditional active learning algorithms.

pdf bib
Sequence Effects in Crowdsourced Annotations
Nitika Mathur | Timothy Baldwin | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Manual data annotation is a vital component of NLP research. When designing annotation tasks, properties of the annotation interface can unintentionally lead to artefacts in the resulting dataset, biasing the evaluation. In this paper, we explore sequence effects where annotations of an item are affected by the preceding items. Having assigned one label to an instance, the annotator may be less (or more) likely to assign the same label to the next. During rating tasks, seeing a low quality item may affect the score given to the next item either positively or negatively. We see clear evidence of both types of effects using auto-correlation studies over three different crowdsourced datasets. We then recommend a simple way to minimise sequence effects.

pdf bib
Multilingual Training of Crosslingual Word Embeddings
Long Duong | Hiroshi Kanayama | Tengfei Ma | Steven Bird | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Crosslingual word embeddings represent lexical items from different languages using the same vector space, enabling crosslingual transfer. Most prior work constructs embeddings for a pair of languages, with English on one side. We investigate methods for building high quality crosslingual word embeddings for many languages in a unified vector space. In this way, we can exploit and combine strength of many languages. We obtained high performance on bilingual lexicon induction, monolingual similarity and crosslingual document classification tasks.

pdf bib
Cross-Lingual Word Embeddings for Low-Resource Language Modeling
Oliver Adams | Adam Makarucha | Graham Neubig | Steven Bird | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of documentary linguistics. We investigate the use of such lexicons to improve language models when textual training data is limited to as few as a thousand sentences. The method involves learning cross-lingual word embeddings as a preliminary step in training monolingual language models. Results across a number of languages show that language models are improved by this pre-training. Application to Yongning Na, a threatened language, highlights challenges in deploying the approach in real low-resource environments.

pdf bib
Context-Aware Prediction of Derivational Word-forms
Ekaterina Vylomova | Ryan Cotterell | Timothy Baldwin | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose a new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder-decoder style neural network to produce a derived form character-by-character, based on its corresponding character-level representation of the base form and the context. We demonstrate that our model is able to generate valid context-sensitive derivations from known base forms, but is less accurate under lexicon agnostic setting.