Vlad Niculae


pdf bib
Understanding the Mechanics of SPIGOT : Surrogate Gradients for Latent Structure LearningSPIGOT: Surrogate Gradients for Latent Structure Learning
Tsvetomila Mihaylova | Vlad Niculae | André F. T. Martins
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Latent structure models are a powerful tool for modeling language data : they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.


pdf bib
Proceedings of the Third Workshop on Structured Prediction for NLP
Andre Martins | Andreas Vlachos | Zornitsa Kozareva | Sujith Ravi | Gerasimos Lampouras | Vlad Niculae | Julia Kreutzer
Proceedings of the Third Workshop on Structured Prediction for NLP


pdf bib
Towards Dynamic Computation Graphs via Sparse Latent Structure
Vlad Niculae | André F. T. Martins | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep NLP models benefit from underlying structures in the datae.g., parse treestypically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff : either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.

pdf bib
Interpretable Structure Induction via Sparse Attention
Ben Peters | Vlad Niculae | André F. T. Martins
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural network methods are experiencing wide adoption in NLP, thanks to their empirical performance on many tasks. Modern neural architectures go way beyond simple feedforward and recurrent models : they are complex pipelines that perform soft, differentiable computation instead of discrete logic. The price of such soft computing is the introduction of dense dependencies, which make it hard to disentangle the patterns that trigger a prediction. Our recent work on sparse and structured latent computation presents a promising avenue for enhancing interpretability of such neural pipelines. Through this extended abstract, we aim to discuss and explore the potential and impact of our methods.


pdf bib
Argument Mining with Structured SVMs and RNNsSVMs and RNNs
Vlad Niculae | Joonsuk Park | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20 % of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.