Timothy Baldwin

Also published as: Tim Baldwin


2022

pdf bib
What does it take to bake a cake? The RecipeRef corpus and anaphora resolution in procedural text
Biaoyan Fang | Timothy Baldwin | Karin Verspoor
Findings of the Association for Computational Linguistics: ACL 2022

Procedural text contains rich anaphoric phenomena, yet has not received much attention in NLP. To fill this gap, we investigate the textual properties of two types of procedural text, recipes and chemical patents, and generalize an anaphora annotation framework developed for the chemical domain for modeling anaphoric phenomena in recipes. We apply this framework to annotate the RecipeRef corpus with both bridging and coreference relations. Through comparison to chemical patents, we show the complexity of anaphora resolution in recipes. We demonstrate empirically that transfer learning from the chemical domain improves resolution of anaphora in recipes, suggesting transferability of general procedural knowledge.

2021

pdf bib
Top-down Discourse Parsing via Sequence Labelling
Fajri Koto | Jey Han Lau | Timothy Baldwin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020 ; Zhang et al., 2020). By framing the task as a sequence labelling problem where the goal is to iteratively segment a document into individual discourse units, we are able to eliminate the decoder and reduce the search space for splitting points. We explore both traditional recurrent models and modern pre-trained transformer models for the task, and additionally introduce a novel dynamic oracle for top-down parsing. Based on the Full metric, our proposed LSTM model sets a new state-of-the-art for RST parsing.
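To make the framing concrete, here is a toy sketch of the iterative top-down splitting loop described in the abstract. The `split_scores` function is a hypothetical stand-in for the LSTM or transformer scorers the paper evaluates, so this is an illustration of the framing, not the paper's implementation:

```python
# A minimal sketch of top-down discourse parsing as repeated splitting:
# each span of EDUs is split at its best-scoring internal boundary,
# and the two halves are parsed recursively.

def parse(edus, split_scores):
    """Recursively split [start, end) spans at the best-scoring boundary."""
    def helper(start, end):
        if end - start <= 1:
            return ("leaf", start)
        scores = split_scores(edus[start:end])  # one score per internal boundary
        k = start + 1 + max(range(len(scores)), key=scores.__getitem__)
        return ("node", helper(start, k), helper(k, end))
    return helper(0, len(edus))

# Toy usage: split wherever the length gap between adjacent EDUs is largest.
edus = ["Although it rained,", "we went hiking,", "and we loved it."]
toy_scorer = lambda span: [abs(len(a) - len(b)) for a, b in zip(span, span[1:])]
print(parse(edus, toy_scorer))
```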

pdf bib
ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain
Biaoyan Fang | Christian Druckenbrodt | Saber A Akhondi | Jiayuan He | Timothy Baldwin | Karin Verspoor
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Chemical patents contain rich coreference and bridging links, which are the target of this research. Specifically, we introduce a novel annotation scheme, based on which we create the ChEMU-Ref dataset from reaction description snippets in English-language chemical patents. We propose a neural approach to anaphora resolution, which we show to achieve strong results, especially when jointly trained over coreference and bridging links.

pdf bib
Diverse Adversaries for Mitigating Bias in Training
Xudong Han | Timothy Baldwin | Trevor Cohn
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Adversarial learning can learn fairer and less biased models of language processing than standard training. However, current adversarial techniques only partially mitigate the problem of model bias, added to which their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results show that our method substantially improves over standard adversarial removal methods, in terms of reducing bias and stability of training.
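One way to picture "orthogonal hidden representations" is as a pairwise penalty on the discriminators' hidden states. The sketch below is one illustrative reading of that idea under this assumption, not the paper's exact objective:

```python
# A minimal sketch of an orthogonality penalty between the hidden
# representations of multiple adversarial discriminators.
import torch

def orthogonality_penalty(hiddens):
    """hiddens: list of (batch, dim) tensors, one per discriminator."""
    loss = 0.0
    for i in range(len(hiddens)):
        for j in range(i + 1, len(hiddens)):
            # Squared Frobenius norm of the cross-correlation matrix:
            # zero exactly when the two feature sets are orthogonal.
            loss = loss + (hiddens[i].T @ hiddens[j]).pow(2).sum()
    return loss

# Toy usage with three discriminators over a batch of 8 examples.
hs = [torch.randn(8, 16) for _ in range(3)]
print(orthogonality_penalty(hs))
```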

pdf bib
A Simple yet Effective Method for Sentence Ordering
Aili Shen | Timothy Baldwin
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text. In this work, we propose a simple yet effective training method that improves the capacity of models to capture overall text coherence based on training over pairs of sentences / segments. Experimental results show the superiority of our proposed method in in- and cross-domain settings. The utility of our method is also verified over a multi-document summarisation task.
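As a rough illustration of training over pairs of sentences or segments, the sketch below builds binary ordered/swapped instances from a gold-ordered document; the encoder and classifier that would consume these pairs are left abstract, and the construction is an assumption about the general recipe rather than the paper's exact procedure:

```python
# A minimal sketch of pairwise instance construction for coherence:
# each adjacent pair in the gold order is a positive example, and its
# reversal a negative one.
import random

def make_pairs(document_sentences):
    pairs = []
    for a, b in zip(document_sentences, document_sentences[1:]):
        pairs.append((a, b, 1))  # correct order
        pairs.append((b, a, 0))  # swapped order
    random.shuffle(pairs)
    return pairs

doc = ["I woke up late.", "I skipped breakfast.", "I caught the bus anyway."]
for s1, s2, label in make_pairs(doc):
    print(label, "|", s1, "->", s2)
```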

pdf bib
Fairness-aware Class Imbalanced Learning
Shivashankar Subramanian | Afshin Rahimi | Timothy Baldwin | Trevor Cohn | Lea Frermann
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups. However, there has traditionally been a disconnect between research on class-imbalanced learning and mitigating bias, and only recently have the two been looked at through a common lens. In this work we evaluate long-tail learning methods for tweet sentiment and occupation classification, and extend a margin-loss based approach with methods to enforce fairness. We empirically show through controlled experiments that the proposed approaches help mitigate both class imbalance and demographic biases.

pdf bib
Semi-automatic Triage of Requests for Free Legal Assistance
Meladel Mistica | Jey Han Lau | Brayden Merrifield | Kate Fazio | Timothy Baldwin
Proceedings of the Natural Legal Language Processing Workshop 2021

Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet. A major bottleneck in the provision of free legal assistance to those most in need is the determination of the precise nature of the legal problem. This paper describes a collaboration with a major provider of free legal assistance, and the deployment of natural language processing models to assign area-of-law categories to real-world requests for legal assistance. In particular, we focus on an investigation of models to generate efficiencies in the triage process, but also the risks associated with naive use of model predictions, including fairness across different user demographics.

pdf bib
Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora
Takashi Wada | Tomoharu Iwata | Yuji Matsumoto | Timothy Baldwin | Jey Han Lau
Proceedings of the 1st Workshop on Multilingual Representation Learning

We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common cross-lingual space. We also propose to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo, and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations even in extremely low-resource conditions. Furthermore, our model also works well on high-resource conditions, achieving state-of-the-art performance on a German-English word-alignment task.

pdf bib
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

2020

pdf bib
Liputan6: A Large-scale Indonesian Dataset for Text Summarization
Fajri Koto | Jey Han Lau | Timothy Baldwin
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

In this paper, we introduce a large-scale Indonesian summarization dataset. We harvest articles from Liputan6.com, an online news portal, and obtain 215,827 document–summary pairs. We leverage pre-trained language models to develop benchmark extractive and abstractive summarization methods over the dataset with multilingual and monolingual BERT-based models. We include a thorough error analysis by examining machine-generated summaries that have low ROUGE scores, and expose both issues with ROUGE itself, as well as with extractive and abstractive summarization models.

pdf bib
Target Word Masking for Location Metonymy Resolution
Haonan Li | Maria Vasardani | Martin Tomko | Timothy Baldwin
Proceedings of the 28th International Conference on Computational Linguistics

Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources. In this paper, we propose an end-to-end word-level classification approach based only on BERT, without dependencies on taggers, parsers, curated dictionaries of place names, or other external resources. We show that our approach achieves the state-of-the-art on 5 datasets, surpassing conventional BERT models and benchmarks by a large margin. We also show that our approach generalises well to unseen data.
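The masking idea itself is simple to illustrate. In the toy sketch below, the target mention is replaced by a mask token before classification, so any classifier built on top (left abstract here) must rely on context rather than the place name; the helper names are hypothetical:

```python
# A minimal sketch of target word masking: hide the target location
# mention so the model cannot memorise the name itself.
MASK = "[MASK]"

def mask_target(tokens, target_span):
    """Replace tokens in [start, end) with a single mask token."""
    start, end = target_span
    return tokens[:start] + [MASK] + tokens[end:]

tokens = "Australia beat Canada in the final".split()
print(mask_target(tokens, (0, 1)))  # ['[MASK]', 'beat', 'Canada', ...]
```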

pdf bib
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking
Afshin Rahimi | Timothy Baldwin | Karin Verspoor
Proceedings of the 28th International Conference on Computational Linguistics

We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources. We propose a cross-lingual neural reranking model to match a UMLS concept with a Wikipedia page, which achieves a recall@1 of 72%, a substantial improvement of 20% over word- and char-level BM25, enabling manual alignment with minimal effort. We release our resources, including ranked Wikipedia pages for 700k UMLS concepts, and WikiUMLS, a dataset for training and evaluation of alignment models between UMLS and Wikipedia collected from Wikidata. This will provide easier access to Wikipedia for health professionals, patients, and NLP systems, including in multilingual settings.

pdf bib
Information Extraction from Legal Documents: A Study in the Context of Common Law Court Judgements
Meladel Mistica | Geordie Z. Zhang | Hui Chia | Kabir Manandhar Shrestha | Rohit Kumar Gupta | Saket Khandelwal | Jeannie Paterson | Timothy Baldwin | Daniel Beck
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association

‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts. For this reason, lawyers must search prior judgements for the legal principles that are relevant to their case. The difficulty for those within the legal profession is that the information that they are looking for may be contained within a few paragraphs or sentences, but those few paragraphs may be buried within a hundred-page document. In this study, we create a schema based on the relevant information that legal professionals seek within judgements and perform text classification based on it, with the aim of not only assisting lawyers in researching cases, but eventually enabling large-scale analysis of legal judgements to find trends in court outcomes over time.

pdf bib
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

2019

pdf bib
Deep Ordinal Regression for Pledge Specificity Prediction
Shivashankar Subramanian | Trevor Cohn | Timothy Baldwin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability. At present, there are no publicly available annotated datasets of pledges, and most political analyses rely on manual annotations. In this paper we collate a novel dataset of manifestos from eleven Australian federal election cycles, with over 12,000 sentences annotated with specificity (e.g., rhetorical vs detailed pledge) on a fine-grained scale. We propose deep ordinal regression approaches for specificity prediction, under both supervised and semi-supervised settings, and provide empirical results demonstrating the effectiveness of the proposed techniques over several baseline approaches. We analyze the utility of pledge specificity modeling across a spectrum of policy issues in performing ideology prediction, and further provide qualitative analysis in terms of capturing party-specific issue salience across election cycles.
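A common way to realise ordinal regression, and a plausible reading of the approach here, is to decompose a K-level label into K−1 cumulative binary targets ("is the label greater than k?"). A minimal sketch of that decomposition, not the paper's exact network:

```python
# A minimal sketch of the cumulative binary decomposition used in
# ordinal regression: one "greater than threshold k" target per level.
import torch

def ordinal_targets(labels, num_levels):
    """labels: (batch,) ints in [0, num_levels); returns (batch, K-1) targets."""
    thresholds = torch.arange(num_levels - 1)
    return (labels.unsqueeze(1) > thresholds).float()

y = torch.tensor([0, 2, 4])   # specificity levels on a 5-point scale
print(ordinal_targets(y, 5))
# tensor([[0., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 1.]])
```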

pdf bib
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

pdf bib
Modelling Uncertainty in Collaborative Document Quality Assessment
Aili Shen | Daniel Beck | Bahar Salehi | Jianzhong Qi | Timothy Baldwin
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task. To imitate people’s disagreement over inherently subjective tasks such as rating the quality of a Wikipedia article, a document quality assessment system should provide not only a prediction of the article quality but also the uncertainty over its predictions. This motivates us to measure the uncertainty in document quality predictions, in addition to making the label prediction. Experimental results show that both Gaussian processes (GPs) and random forests (RFs) can yield competitive results in predicting the quality of Wikipedia articles, while providing an estimate of uncertainty when there is inconsistency in the quality labels from the Wikipedia contributors. We additionally evaluate our methods in the context of a semi-automated document quality class assignment decision-making process, where there is asymmetric risk associated with overestimates and underestimates of document quality. Our experiments suggest that GPs provide more reliable estimates in this context.
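For readers unfamiliar with GP uncertainty estimates, the sketch below shows the general pattern with scikit-learn's `GaussianProcessRegressor`; the random features are stand-ins for real document features, and the setup is illustrative rather than the paper's configuration:

```python
# A minimal sketch of predicting a quality score plus an uncertainty
# estimate with a Gaussian process.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)
X_train = rng.rand(50, 4)                             # document feature vectors
y_train = X_train.sum(axis=1) + 0.1 * rng.randn(50)   # noisy quality scores

gp = GaussianProcessRegressor().fit(X_train, y_train)
mean, std = gp.predict(rng.rand(3, 4), return_std=True)
for m, s in zip(mean, std):
    print(f"predicted quality {m:.2f} +/- {s:.2f}")
```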

pdf bib
Reevaluating Argument Component Extraction in Low Resource Settings
Anirudh Joshi | Timothy Baldwin | Richard Sinnott | Cecile Paris
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Argument component extraction is a challenging and complex high-level semantic extraction task. As such, it is both expensive to annotate (meaning training data is limited and low-resource by nature), and hard for current-generation deep learning methods to model. In this paper, we reevaluate the performance of state-of-the-art approaches in both single- and multi-task learning settings using combinations of character-level, GloVe, ELMo, and BERT encodings using standard BiLSTM-CRF encoders. We use evaluation metrics that are more consistent with evaluation practice in named entity recognition to understand how well current baselines address this challenge and compare their performance to lower-level semantic tasks such as CoNLL named entity recognition. We find that performance utilizing various pre-trained representations and training methodologies often leaves a lot to be desired as it currently stands, and suggest future pathways for improvement.

pdf bib
UniMelb at SemEval-2019 Task 12: Multi-model combination for toponym resolution
Haonan Li | Minghan Wang | Timothy Baldwin | Martin Tomko | Maria Vasardani
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our submission to SemEval-2019 Task 12 on toponym resolution over scientific articles. We train separate NER models for toponym detection over text extracted from tables vs. text from the body of the paper, and train another auxiliary model to eliminate misdetected toponyms. For toponym disambiguation, we use an SVM classifier with hand-engineered features. The best setting achieved a strict micro-F1 score of 80.92% and an overlap micro-F1 score of 86.88% in the toponym detection subtask, ranking 2nd out of 8 teams on F1 score. For toponym disambiguation and end-to-end resolution, we officially ranked 2nd and 3rd, respectively.

pdf bib
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
Navnita Nandakumar | Timothy Baldwin | Bahar Salehi
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that Word2vec performs the best, followed by FastText and Infersent. Moreover, we find that recently-proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
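The standard probe in this line of work compares the embedding of the whole MWE against a composition of its parts. A minimal sketch with toy vectors, assuming averaging as the composition function (the actual composition operators vary across the methods compared):

```python
# A minimal sketch of a compositionality probe: cosine similarity
# between the MWE's own embedding and the average of its word vectors.
import numpy as np

def compositionality(mwe_vec, word_vecs):
    composed = np.mean(word_vecs, axis=0)
    cos = mwe_vec @ composed / (np.linalg.norm(mwe_vec) * np.linalg.norm(composed))
    return cos  # high = compositional, low = idiomatic

rng = np.random.RandomState(1)
print(compositionality(rng.randn(50), [rng.randn(50), rng.randn(50)]))
```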

pdf bib
Improved Document Modelling with a Neural Discourse Parser
Fajri Koto | Jey Han Lau | Timothy Baldwin
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Despite the success of attention-based neural models for natural language generation and classification tasks, they are unable to capture the discourse structure of larger documents. We hypothesize that explicit discourse representations have utility for NLP tasks over longer documents or document sequences, which sequence-to-sequence models are unable to capture. For abstractive summarization, for instance, conventional neural models simply match source documents and the summary in a latent space without explicit representation of text structure or relations. In this paper, we propose to use neural discourse representations obtained from a rhetorical structure theory (RST) parser to enhance document representations. Specifically, document representations are generated for discourse spans, known as the elementary discourse units (EDUs). We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions. We find that the proposed approach leads to substantial improvements in all cases.

pdf bib
Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP
Gaurav Arora | Afshin Rahimi | Timothy Baldwin
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Catastrophic forgetting, whereby a model trained on one task is fine-tuned on a second and in doing so suffers a catastrophic drop in performance over the first task, is a hurdle in the development of better transfer learning techniques. Despite impressive progress in reducing catastrophic forgetting, we have limited understanding of how different architectures and hyper-parameters affect forgetting in a network. With this study, we aim to understand factors which cause forgetting during sequential training. Our primary finding is that CNNs forget less than LSTMs. We show that max-pooling is the underlying operation which helps CNNs alleviate forgetting compared to LSTMs. We also found that curriculum learning, placing a hard task towards the end of the task sequence, reduces forgetting. We analysed the effect of fine-tuning contextual embeddings on catastrophic forgetting, and found that using embeddings as a feature extractor is preferable to fine-tuning in a continual learning setup.

pdf bib
Detecting Chemical Reactions in Patents
Hiyori Yoshikawa | Dat Quoc Nguyen | Zenan Zhai | Christian Druckenbrodt | Camilo Thorne | Saber A. Akhondi | Timothy Baldwin | Karin Verspoor
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs which contain a description of a reaction. To address this new task, we construct an annotated dataset from an existing proprietary database of chemical reactions manually extracted from patents. We introduce several baseline methods for the task and evaluate them over our dataset. Through error analysis, we discuss what makes the task complex and challenging, and suggest possible directions for future research.

pdf bib
Contextualization of Morphological Inflection
Ekaterina Vylomova | Ryan Cotterell | Trevor Cohn | Timothy Baldwin | Jason Eisner
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Critical to natural language generation is the production of correctly inflected text. In this paper, we isolate the task of predicting a fully inflected sentence from its partially lemmatized version. Unlike traditional morphological inflection or surface realization, our task input does not provide gold tags that specify what morphological features to realize on each lemmatized word; rather, such features must be inferred from sentential context. We develop a neural hybrid graphical model that explicitly reconstructs morphological features before predicting the inflected forms, and compare this to a system that directly predicts the inflected forms without relying on any morphological annotation. We experiment on several typologically diverse languages from the Universal Dependencies treebanks, showing the utility of incorporating linguistically-motivated latent variables into NLP models.

2018

pdf bib
The Company They Keep: Extracting Japanese Neologisms Using Language Patterns
James Breen | Timothy Baldwin | Francis Bond
Proceedings of the 9th Global Wordnet Conference

We describe an investigation into the identification and extraction of unrecorded potential lexical items in Japanese text by detecting text passages containing selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and it was demonstrated that the approach has considerable potential for identifying terms for later lexicographic analysis.

pdf bib
UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing
Anirudh Joshi | Tim Baldwin | Richard O. Sinnott | Cecile Paris
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes a warrant classification system for SemEval 2018 Task 12 that attempts to learn semantic representations of reasons, claims and warrants. The system consists of 3 stacked LSTMs: one for the reason, one for the claim, and one shared Siamese Network for the 2 candidate warrants. Our main contribution is to force the embeddings into a shared feature space using vector operations, semantic similarity classification, Siamese networks, and multi-task learning. In doing so, we learn a form of generative implication, in encoding implication interrelationships between reasons, claims, and the associated correct and incorrect warrants. We augment the limited data in the task further by utilizing WordNet synonym fuzzing. When applied to SemEval 2018 Task 12, our system performs well on the development data, and officially ranked 8th among 21 teams.

pdf bib
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Wei Xu | Alan Ritter | Tim Baldwin | Afshin Rahimi
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

pdf bib
Twitter Geolocation using Knowledge-Based Methods
Taro Miyazaki | Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations. Their low frequency means that key terms observed in testing are often unseen in training, such that standard classifiers are unable to learn weights for them. We propose a method for reasoning over such terms using a knowledge base, through exploiting their relations with other entities. Our technique uses a graph embedding over the knowledge base, which we couple with a text representation to learn a geolocation classifier, trained end-to-end. We show that our method improves over purely text-based methods, which we ascribe to more robust treatment of low-count and out-of-vocabulary entities.

pdf bib
Hierarchical Structured Model for Fine-to-Coarse Manifesto Text Analysis
Shivashankar Subramanian | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Election manifestos document the intentions, motives, and views of political parties. They are often used for analysing a party’s fine-grained position on a particular issue, as well as for coarse-grained positioning of a party on the left–right spectrum. In this paper we propose a two-stage model for automatically performing both levels of analysis over manifestos. In the first step we employ a hierarchical multi-task structured deep model to predict fine- and coarse-grained positions, and in the second step we perform post-hoc calibration of coarse-grained positions using probabilistic soft logic. We empirically show that the proposed model outperforms state-of-the-art approaches at both granularities using manifestos from twelve countries, written in ten different languages.

pdf bib
Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-Based Sentiment Analysis
Fei Liu | Trevor Cohn | Timothy Baldwin
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA), the extraction of fine-grained opinion polarity w.r.t. a pre-defined set of aspects, remains a difficult task. Motivated by recent advances in memory-augmented models for machine reading, we propose a novel architecture, utilising external memory chains with a delayed memory update mechanism to track entities. On a TABSA task, the proposed model demonstrates substantial improvements over state-of-the-art approaches, including those using external knowledge bases.

pdf bib
A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions
Navnita Nandakumar | Bahar Salehi | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2018

In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions (MWEs). Our experimental results suggest that character- and document-level models capture knowledge of MWE compositionality and are effective in modelling varying levels of compositionality, with the advantage over word-level models that they do not require token-level identification of MWEs in the training corpus.

pdf bib
Semi-supervised User Geolocation via Graph Convolutional Networks
Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Social media user geolocation is vital to many applications such as event detection. In this paper, we propose GCN, a multiview geolocation model based on Graph Convolutional Networks, that uses both text and network context. We compare GCN to the state-of-the-art, and to two baselines we propose, and show that our model achieves or is competitive with the state-of-the-art over three benchmark geolocation datasets when sufficient supervision is available. We also evaluate GCN under a minimal supervision scenario, and show it outperforms baselines. We find that highway network gates are essential for controlling the amount of useful neighbourhood expansion in GCN.
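A minimal sketch of one highway-gated GCN layer may help: the gate interpolates between the aggregated neighbourhood signal and the node's own representation, which is the role the abstract ascribes to highway gates. Shapes and the graph below are toy placeholders, and the layer is a generic illustration rather than the paper's exact architecture:

```python
# A minimal sketch of a highway-gated graph convolution layer,
# assuming a pre-normalised adjacency matrix `A_hat`.
import torch
import torch.nn as nn

class HighwayGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, H, A_hat):
        neigh = torch.relu(self.linear(A_hat @ H))  # GCN neighbourhood propagation
        t = torch.sigmoid(self.gate(H))             # how much neighbourhood to let in
        return t * neigh + (1 - t) * H              # highway mix with own features

H = torch.randn(5, 16)   # 5 users, 16-dim features
A_hat = torch.eye(5)     # trivial graph, for illustration only
print(HighwayGCNLayer(16)(H, A_hat).shape)
```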

pdf bib
Narrative Modeling with Memory Chains and Semantic Supervision
Fei Liu | Trevor Cohn | Timothy Baldwin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task. Inspired by previous studies on ROC Story Cloze Test, we propose a novel method, tracking various semantic aspects with external neural memory chains while encouraging each to focus on a particular semantic aspect. Evaluated on the task of story ending prediction, our model demonstrates superior performance to a collection of competitive baselines, setting a new state of the art.

2017

pdf bib
Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
Fei Liu | Timothy Baldwin | Trevor Cohn
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Despite successful applications across a broad range of NLP tasks, conditional random fields (CRFs), in particular the linear-chain variant, are only able to model local features. While this has important benefits in terms of inference tractability, it limits the ability of the model to capture long-range dependencies between items. Attempts to extend CRFs to capture long-range dependencies have largely come at the cost of computational complexity and approximate inference. In this work, we propose an extension to CRFs by integrating external memory, taking inspiration from memory networks, thereby allowing CRFs to incorporate information far beyond neighbouring steps. Experiments across two tasks show substantial improvements over strong CRF and LSTM baselines.

pdf bib
An Automatic Approach for Document-level Topic Model Evaluation
Shraey Bhatia | Jey Han Lau | Timothy Baldwin
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Topic models jointly learn topics and document-level topic distribution. Extrinsic evaluation of topic models tends to focus exclusively on topic-level evaluation, e.g. by assessing the coherence of topics. We demonstrate that there can be large discrepancies between topic- and document-level model quality, and that basing model evaluation on topic-level analysis can be highly misleading. We propose a method for automatically predicting topic model quality based on analysis of document-level topic allocations, and provide empirical evidence for its robustness.

pdf bib
Topically Driven Neural Language Model
Jey Han Lau | Timothy Baldwin | Trevor Cohn
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics.

pdf bib
A Neural Model for User Geolocation and Lexical Dialectology
Afshin Rahimi | Trevor Cohn | Timothy Baldwin
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state of the art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.
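The model family is simple enough to sketch directly: a bag-of-words input, one hidden layer, and a softmax over discretised location classes. All sizes below are toy placeholders, not the paper's settings:

```python
# A minimal sketch of a one-hidden-layer text-based geolocation model.
import torch
import torch.nn as nn

vocab_size, hidden, num_regions = 10000, 512, 256
model = nn.Sequential(
    nn.Linear(vocab_size, hidden),   # hidden layer; its weights act as word/phrase embeddings
    nn.Tanh(),
    nn.Linear(hidden, num_regions),  # logits over discretised location classes
)

bow = torch.rand(1, vocab_size)      # stand-in for a user's aggregated text
print(model(bow).argmax(dim=1))      # predicted region
```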

pdf bib
Decoupling Encoder and Decoder Networks for Abstractive Document Summarization
Ying Xu | Jey Han Lau | Timothy Baldwin | Trevor Cohn
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

Abstractive document summarization seeks to automatically generate a summary for a document, based on some abstract understanding of the original document. State-of-the-art techniques traditionally use attentive encoder–decoder architectures. However, due to the large number of parameters in these models, they require large training datasets and long training times. In this paper, we propose decoupling the encoder and decoder networks, and training them separately. We encode documents using an unsupervised document encoder, and then feed the document vector to a recurrent neural network decoder. With this decoupled architecture, we decrease the number of parameters in the decoder substantially, and shorten its training time. Experiments show that the decoupled model achieves comparable performance with state-of-the-art models for in-domain documents, but less well for out-of-domain documents.

pdf bib
Semi-Automated Resolution of Inconsistency for a Harmonized Multiword Expression and Dependency Parse Annotation
King Chan | Julian Brooke | Timothy Baldwin
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one which identifies violations of MWE constituency in the dependency tree, and resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both parse and MWE annotations, representing changes to a significant percentage (well over 10%) of the MWE instances in the joint corpus.

pdf bib
Sub-character Neural Language Modelling in Japanese
Viet Nguyen | Julian Brooke | Timothy Baldwin
Proceedings of the First Workshop on Subword and Character Level Models in NLP

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modeling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
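The decomposition step can be illustrated with a two-entry toy table; the paper itself draws on full character decomposition datasets at several granularities, so the mapping below is purely illustrative:

```python
# A minimal sketch of sub-character decomposition: map each kanji to
# its components before feeding the sequence to a language model.
DECOMP = {"明": ["日", "月"], "森": ["木", "木", "木"]}

def decompose(text):
    # Characters without an entry are passed through unchanged.
    return [c for ch in text for c in DECOMP.get(ch, [ch])]

print(decompose("明日"))  # ['日', '月', '日']
```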

pdf bib
Proceedings of the 3rd Workshop on Noisy User-generated Text
Leon Derczynski | Wei Xu | Alan Ritter | Tim Baldwin
Proceedings of the 3rd Workshop on Noisy User-generated Text

pdf bib
BIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning
Yitong Li | Trevor Cohn | Timothy Baldwin
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper describes our submission to the sentiment analysis sub-task of Build It, Break It: The Language Edition (BIBI), on both the builder and breaker sides. As a builder, we use convolutional neural nets, trained on both phrase and sentence data. As a breaker, we use Q-learning to learn minimal change pairs, and apply a token substitution method automatically. We analyse the results to gauge the robustness of NLP systems.

pdf bib
Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice
Julian Brooke | Jan Šnajder | Timothy Baldwin
Transactions of the Association for Computational Linguistics, Volume 5

We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.

pdf bib
SemEval-2017 Task 3: Community Question Answering
Preslav Nakov | Doris Hoogeveen | Lluís Màrquez | Alessandro Moschitti | Hamdy Mubarak | Timothy Baldwin | Karin Verspoor
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe SemEval-2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question–Comment Similarity, (B) Question–Question Similarity, (C) Question–External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A–D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A–C.

pdf bib
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
Afshin Rahimi | Timothy Baldwin | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset.
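As a sketch of the mixture-density idea, the following loss treats the network outputs as parameters of K Gaussians over 2-d coordinates and minimises the negative log-likelihood of the gold location. Spherical components are an assumption made here for brevity, so this illustrates the objective family rather than the paper's exact model:

```python
# A minimal sketch of a mixture-density negative log-likelihood
# over 2-d locations with K spherical Gaussian components.
import math
import torch
import torch.nn.functional as F

def mdn_nll(pi_logits, mu, log_sigma, target):
    """pi_logits: (B, K); mu: (B, K, 2); log_sigma: (B, K); target: (B, 2)."""
    log_pi = F.log_softmax(pi_logits, dim=-1)
    sq_dist = ((target.unsqueeze(1) - mu) ** 2).sum(-1)        # (B, K)
    log_prob = -sq_dist / (2 * (2 * log_sigma).exp()) \
               - 2 * log_sigma - math.log(2 * math.pi)         # per-component log-density
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

B, K = 4, 3
loss = mdn_nll(torch.randn(B, K), torch.randn(B, K, 2),
               torch.zeros(B, K), torch.randn(B, 2))
print(loss)
```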

pdf bib
Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation
Qingsong Ma | Yvette Graham | Timothy Baldwin | Qun Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators. Monolingual evaluation runs the risk, however, of bias in favour of MT systems that happen to produce translations superficially similar to the reference and, consistent with this intuition, previous investigations have concluded monolingual assessment to be strongly biased in this respect. On re-examination of past analyses, we identify a series of potential analytical errors that force some important questions to be raised about the reliability of past conclusions, however. We subsequently carry out further investigation into reference bias via direct human assessment of MT adequacy via quality controlled crowd-sourcing. Contrary to both intuition and past conclusions, results show no significant evidence of reference bias in monolingual evaluation of MT.

pdf bib
Sequence Effects in Crowdsourced Annotations
Nitika Mathur | Timothy Baldwin | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Manual data annotation is a vital component of NLP research. When designing annotation tasks, properties of the annotation interface can unintentionally lead to artefacts in the resulting dataset, biasing the evaluation. In this paper, we explore sequence effects where annotations of an item are affected by the preceding items. Having assigned one label to an instance, the annotator may be less (or more) likely to assign the same label to the next. During rating tasks, seeing a low quality item may affect the score given to the next item either positively or negatively. We see clear evidence of both types of effects using auto-correlation studies over three different crowdsourced datasets. We then recommend a simple way to minimise sequence effects.
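The autocorrelation check itself is easy to reproduce: if an annotator's current rating correlates with their previous one beyond chance, adjacent items are not independent. A minimal sketch of the lag-1 version, with a made-up rating sequence:

```python
# A minimal sketch of detecting sequence effects via lag-1
# autocorrelation over one annotator's rating sequence.
import numpy as np

def lag1_autocorr(ratings):
    r = np.asarray(ratings, dtype=float)
    return np.corrcoef(r[:-1], r[1:])[0, 1]

ratings = [3, 3, 4, 4, 4, 2, 2, 3, 3, 3]  # illustrative rating sequence
print(f"lag-1 autocorrelation: {lag1_autocorr(ratings):.2f}")
```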

pdf bib
Context-Aware Prediction of Derivational Word-forms
Ekaterina Vylomova | Ryan Cotterell | Timothy Baldwin | Trevor Cohn
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Derivational morphology is a fundamental and complex characteristic of language. In this paper we propose a new task of predicting the derivational form of a given base-form lemma that is appropriate for a given context. We present an encoder-decoder style neural network to produce a derived form character-by-character, based on its corresponding character-level representation of the base form and the context. We demonstrate that our model is able to generate valid context-sensitive derivations from known base forms, but is less accurate in a lexicon-agnostic setting.

pdf bib
Improving Evaluation of Document-level Machine Translation Quality Estimation
Yvette Graham | Qingsong Ma | Timothy Baldwin | Qun Liu | Carla Parra | Carolina Scarton
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable. In this paper, we explore the validity of human annotations currently employed in the evaluation of document-level quality estimation for machine translation (MT). We demonstrate the degree to which MT system rankings are dependent on weights employed in the construction of the gold standard, before proposing direct human assessment as a valid alternative. Experiments show direct assessment (DA) scores for documents to be highly reliable, achieving a correlation of above 0.9 in a self-replication experiment, in addition to a substantial estimated cost reduction through quality controlled crowd-sourcing. The original gold standard based on post-edits incurs a 10–20 times greater cost than DA.

pdf bib
Multimodal Topic Labelling
Ionut Sorodoc | Jey Han Lau | Nikolaos Aletras | Timothy Baldwin
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Topics generated by topic models are typically presented as a list of topic terms. Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics. Traditionally, topic label systems focus on a single label modality, e.g. textual labels. In this work we propose a multimodal approach to topic labelling using a simple feedforward neural network. Given a topic and a candidate image or textual label, our method automatically generates a rating for the label, relative to the topic. Experiments show that this multimodal approach outperforms single-modality topic labelling systems.