Proceedings of the Fourth Workshop on Structured Prediction for NLP

Priyanka Agrawal, Zornitsa Kozareva, Julia Kreutzer, Gerasimos Lampouras, André Martins, Sujith Ravi, Andreas Vlachos (Editors)

Anthology ID:
EMNLP | spnlp
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the Fourth Workshop on Structured Prediction for NLP
Priyanka Agrawal | Zornitsa Kozareva | Julia Kreutzer | Gerasimos Lampouras | André Martins | Sujith Ravi | Andreas Vlachos

pdf bib
CopyNext : Explicit Span Copying and Alignment in Sequence to Sequence ModelsCopyNext: Explicit Span Copying and Alignment in Sequence to Sequence Models
Abhinav Singh | Patrick Xia | Guanghui Qin | Mahsa Yarmohammadi | Benjamin Van Durme

Copy mechanisms are employed in sequence to sequence (seq2seq) models to generate reproductions of words from the input to the output. These frameworks, operating at the lexical type level, fail to provide an explicit alignment that records where each token was copied from. Further, they require contiguous token sequences from the input (spans) to be copied individually. We present a model with an explicit token-level copy operation and extend it to copying entire spans. Our model provides hard alignments between spans in the input and output, allowing for nontraditional applications of seq2seq, like information extraction. We demonstrate the approach on Nested Named Entity Recognition, achieving near state-of-the-art accuracy with an order of magnitude increase in decoding speed.

pdf bib
Energy-based Neural Modelling for Large-Scale Multiple Domain Dialogue State Tracking
Anh Duong Trinh | Robert J. Ross | John D. Kelleher

Scaling up dialogue state tracking to multiple domains is challenging due to the growth in the number of variables being tracked. Furthermore, dialog state tracking models do not yet explicitly make use of relationships between dialogue variables, such as slots across domains. We propose using energy-based structure prediction methods for large-scale dialogue state tracking task in two multiple domain dialogue datasets. Our results indicate that : (i) modelling variable dependencies yields better results ; and (ii) the structured prediction output aligns with the dialogue slot-value constraint principles. This leads to promising directions to improve state-of-the-art models by incorporating variable dependencies into their prediction process.

pdf bib
Layer-wise Guided Training for BERT : Learning Incrementally Refined Document RepresentationsBERT: Learning Incrementally Refined Document Representations
Nikolaos Manginas | Ilias Chalkidis | Prodromos Malakasiotis

Although BERT is widely used by the NLP community, little is known about its inner workings. Several attempts have been made to shed light on certain aspects of BERT, often with contradicting conclusions. A much raised concern focuses on BERT’s over-parameterization and under-utilization issues. To this end, we propose o novel approach to fine-tune BERT in a structured manner. Specifically, we focus on Large Scale Multilabel Text Classification (LMTC) where documents are assigned with one or more labels from a large predefined set of hierarchically organized labels. Our approach guides specific BERT layers to predict labels from specific hierarchy levels. Experimenting with two LMTC datasets we show that this structured fine-tuning approach not only yields better classification results but also leads to better parameter utilization.

pdf bib
On the Discrepancy between Density Estimation and Sequence Generation
Jason Lee | Dustin Tran | Orhan Firat | Kyunghyun Cho

Many sequence-to-sequence generation tasks, including machine translation and text-to-speech, can be posed as estimating the density of the output y given the input x : p(y|x). Given this interpretation, it is natural to evaluate sequence-to-sequence models using conditional log-likelihood on a test set. However, the goal of sequence-to-sequence generation (or structured prediction) is to find the best output y given an input x, and each task has its own downstream metric R that scores a model output by comparing against a set of references y * : R(y, y * | x). While we hope that a model that excels in density estimation also performs well on the downstream metric, the exact correlation has not been studied for sequence generation tasks. In this paper, by comparing several density estimators on five machine translation tasks, we find that the correlation between rankings of models based on log-likelihood and BLEU varies significantly depending on the range of the model families being compared. First, log-likelihood is highly correlated with BLEU when we consider models within the same family (e.g. autoregressive models, or latent variable models with the same parameterization of the prior).

pdf bib
Deeply Embedded Knowledge Representation & Reasoning For Natural Language Question Answering : A Practitioner’s Perspective
Arindam Mitra | Sanjay Narayana | Chitta Baral

Successful application of Knowledge Representation and Reasoning (KR) in Natural Language Understanding (NLU) is largely limited by the availability of a robust and general purpose natural language parser. Even though several projects have been launched in the pursuit of developing a universal meaning representation language, the existence of an accurate universal parser is far from reality. This has severely limited the application of knowledge representation and reasoning (KR) in the field of NLP and also prevented a proper evaluation of KR based NLU systems. Our goal is to build KR based systems for Natural Language Understanding without relying on a parser. Towards this we propose a method named Deeply Embedded Knowledge Representation & Reasoning (DeepEKR) where we replace the parser by a neural network, soften the symbolic representation so that a deterministic mapping exists between the parser neural network and the interpretable logical form, and finally replace the symbolic solver by an equivalent neural network, so the model can be trained end-to-end. We evaluate our method with respect to the task of Qualitative Word Problem Solving on the two available datasets (QuaRTz and QuaRel). Our system achieves same accuracy as that of the state-of-the-art accuracy on QuaRTz, outperforms the state-of-the-art on QuaRel and severely outperforms a traditional KR based system. The results show that the bias introduced by a KR solution does not prevent it from doing a better job at the end task. Moreover, our method is interpretable due to the bias introduced by the KR approach.