Yoav Goldberg


2021

pdf bib
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
Shauli Ravfogel | Grusha Prasad | Tal Linzen | Yoav Goldberg
Proceedings of the 25th Conference on Computational Natural Language Learning

When language models process syntactically complex sentences, do they use their representations of syntax in a manner that is consistent with the grammar of the language? We propose AlterRep, an intervention-based method to address this question. For any linguistic feature of a given sentence, AlterRep generates counterfactual representations by altering how the feature is encoded, while leaving in- tact all other aspects of the original representation. By measuring the change in a model’s word prediction behavior when these counterfactual representations are substituted for the original ones, we can draw conclusions about the causal effect of the linguistic feature in question on the model’s behavior. We apply this method to study how BERT models of different sizes process relative clauses (RCs). We find that BERT variants use RC boundary information during word prediction in a manner that is consistent with the rules of English grammar ; this RC boundary information generalizes to a considerable extent across different RC types, suggesting that BERT represents RCs as an abstract linguistic category.

pdf bib
Bootstrapping Relation Extractors using Syntactic Search by Examples
Matan Eyal | Asaf Amrami | Hillel Taub-Tabib | Yoav Goldberg
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The advent of neural-networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic-graphs (Such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.

pdf bib
Contrastive Explanations for Model Interpretability
Alon Jacovi | Swabha Swayamdipta | Shauli Ravfogel | Yanai Elazar | Yejin Choi | Yoav Goldberg
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive to humans to both produce and comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows model behavior to consider only contrastive reasoning, and uncover which aspects of the input are useful for and against particular decisions. Our contrastive explanations can additionally answer for which label, and against which alternative label, is a given input feature useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token / span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.

pdf bib
Effects of Parameter Norm Growth During Transformer Training : Inductive Bias from Gradient Descent
William Merrill | Vivek Ramanujan | Yoav Goldberg | Roy Schwartz | Noah A. Smith
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). To better understand this bias, we study the tendency for transformer parameters to grow in magnitude (_ 2 norm) during training, and its implications for the emergent representations within self attention layers. Empirically, we document norm growth in the training of transformer language models, including T5 during its pretraining. As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such saturated networks are known to have a reduced capacity compared to the full network family that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an inductive bias implicit in GD of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions, while other heads compute global averages, allowing counting. We believe understanding the interplay between these two capabilities may shed further light on the structure of computation within large transformers.\\ell_2 norm) during training, and its implications for the emergent representations within self attention layers. Empirically, we document norm growth in the training of transformer language models, including T5 during its pretraining. As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such “saturated” networks are known to have a reduced capacity compared to the full network family that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an inductive bias implicit in GD of particular interest for NLP. We leverage the emergent discrete structure in a saturated transformer to analyze the role of different attention heads, finding that some focus locally on a small number of positions, while other heads compute global averages, allowing counting. We believe understanding the interplay between these two capabilities may shed further light on the structure of computation within large transformers.

pdf bib
Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?BERT Pretrained on Clinical Notes Reveal Sensitive Data?
Eric Lehman | Sarthak Jain | Karl Pichotta | Yoav Goldberg | Byron Wallace
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT. While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated attacks may succeed in doing so : To facilitate such research, we make our experimental setup and baseline probing models available at https://github.com/elehman16/exposing_patient_data_release.

pdf bib
Ab Antiquo : Neural Proto-language Reconstruction
Carlo Meloni | Shauli Ravfogel | Yoav Goldberg
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Historical linguists have identified regularities in the process of historic sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages, and has to predict the proto word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform conventional methods applied to this task so far. Error analysis reveals a variability in the ability of neural model to capture different phonological changes, correlating with the complexity of the changes. Analysis of learned embeddings reveals the models learn phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics.

pdf bib
Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language
Michael Roth | Reut Tsarfaty | Yoav Goldberg
Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language

2020

pdf bib
Interactive Extractive Search over Biomedical Corpora
Hillel Taub Tabib | Micah Shlain | Shoval Sadde | Dan Lahav | Matan Eyal | Yaara Cohen | Yoav Goldberg
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts to dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead to query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora : the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a collection of over 45,000 research papers focused on COVID-19 research. The system is publicly available at https://allenai.github.io/spike

pdf bib
A Formal Hierarchy of RNN ArchitecturesRNN Architectures
William Merrill | Gail Weiss | Yoav Goldberg | Roy Schwartz | Noah A. Smith | Eran Yahav
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties : space complexity, which measures the RNN’s memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We place several RNN variants within this hierarchy. For example, we prove the LSTM is not rational, which formally separates it from the related QRNN (Bradbury et al., 2016). We also show how these models’ expressive capacity is expanded by stacking multiple layers or composing them with different pooling functions. Our results build on the theory of saturated RNNs (Merrill, 2019). While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy. We provide empirical results to support this conjecture. Experimental findings from training unsaturated networks on formal languages support this conjecture.

pdf bib
Null It Out : Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel | Yanai Elazar | Hila Gonen | Michael Twiton | Yoav Goldberg
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The ability to control for the kinds of information encoded in neural representation has a variety of use cases, especially in light of the challenge of interpreting these models. We present Iterative Null-space Projection (INLP), a novel method for removing information from neural representations. Our method is based on repeated training of linear classifiers that predict a certain property we aim to remove, followed by projection of the representations on their null-space. By doing so, the classifiers become oblivious to that target property, making it hard to linearly separate the data according to it. While applicable for multiple uses, we evaluate our method on bias and fairness use-cases, and show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.

2019

pdf bib
Language Modeling for Code-Switching : Evaluation, Integration of Monolingual Data, and Discriminative Training
Hila Gonen | Yoav Goldberg
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR). Language modeling for code-switched language is challenging for (at least) three reasons : (1) lack of available large-scale code-switched data for training ; (2) lack of a replicable evaluation setup that is ASR directed yet isolates language modeling performance from the other intricacies of the ASR system ; and (3) the reliance on generative modeling. We tackle these three issues : we propose an ASR-motivated evaluation setup which is decoupled from an ASR system and the choice of vocabulary, and provide an evaluation dataset for English-Spanish code-switching. This setup lends itself to a discriminative training approach, which we demonstrate to work better than generative language modeling. Finally, we explore a variety of training protocols and verify the effectiveness of training with large amounts of monolingual data followed by fine-tuning with small amounts of code-switched data, for both the generative and discriminative cases.

pdf bib
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Anna Rogers | Aleksandr Drozd | Anna Rumshisky | Yoav Goldberg
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

bib
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Hila Gonen | Yoav Goldberg
Proceedings of the 2019 Workshop on Widening NLP

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between “gender-neutralized” words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.

bib
How does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?
Hila Gonen | Yova Kementchedjhieva | Yoav Goldberg
Proceedings of the 2019 Workshop on Widening NLP

Many natural languages assign grammatical gender also to inanimate nouns in the language. In such languages, words that relate to the gender-marked nouns are inflected to agree with the noun’s gender. We show that this affects the word representations of inanimate nouns, resulting in nouns with the same gender being closer to each other than nouns with different gender. While “embedding debiasing” methods fail to remove the effect, we demonstrate that a careful application of methods that neutralize grammatical gender signals from the words’ context when training word embeddings is effective in removing it. Fixing the grammatical gender bias results in a positive effect on the quality of the resulting word embeddings, both in monolingual and cross lingual settings. We note that successfully removing gender signals, while achievable, is not trivial to do and that a language-specific morphological analyzer, together with careful usage of it, are essential for achieving good results.

pdf bib
Filling Gender & Number Gaps in Neural Machine Translation with Black-box Context Injection
Amit Moryossef | Roee Aharoni | Yoav Goldberg
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

When translating from a language that does not morphologically mark information such as gender and number into a language that does, translation systems must guess this missing information, often leading to incorrect translations in the given context. We propose a black-box approach for injecting the missing information to a pre-trained neural machine translation system, allowing to control the morphological variations in the generated translations without changing the underlying model or training data. We evaluate our method on an English to Hebrew translation task, and show that it is effective in injecting the gender and number information and that supplying the correct information improves the translation accuracy in up to 2.3 BLEU on a female-speaker test set for a state-of-the-art online black-box system. Finally, we perform a fine-grained syntactic analysis of the generated translations that shows the effectiveness of our method.

pdf bib
Improving Quality and Efficiency in Plan-based Neural Data-to-text Generation
Amit Moryossef | Yoav Goldberg | Ido Dagan
Proceedings of the 12th International Conference on Natural Language Generation

We follow the step-by-step approach to neural data-to-text generation proposed by Moryossef et al (2019), in which the generation process is divided into a text planning stage followed by a plan realization stage. We suggest four extensions to that framework : (1) we introduce a trainable neural planning component that can generate effective plans several orders of magnitude faster than the original planner ; (2) we incorporate typing hints that improve the model’s ability to deal with unseen relations and entities ; (3) we introduce a verification-by-reranking stage that substantially improves the faithfulness of the resulting texts ; (4) we incorporate a simple but effective referring expression generation module. These extensions result in a generation process that is faster, more fluent, and more accurate.

pdf bib
Lipstick on a Pig : Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove ThemDebiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Hila Gonen | Yoav Goldberg
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and consistent across different word embedding models, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between gender-neutralized words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.

pdf bib
How Does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?
Hila Gonen | Yova Kementchedjhieva | Yoav Goldberg
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Many natural languages assign grammatical gender also to inanimate nouns in the language. In such languages, words that relate to the gender-marked nouns are inflected to agree with the noun’s gender. We show that this affects the word representations of inanimate nouns, resulting in nouns with the same gender being closer to each other than nouns with different gender. While embedding debiasing methods fail to remove the effect, we demonstrate that a careful application of methods that neutralize grammatical gender signals from the words’ context when training word embeddings is effective in removing it. Fixing the grammatical gender bias yields a positive effect on the quality of the resulting word embeddings, both in monolingual and cross-lingual settings. We note that successfully removing gender signals, while achievable, is not trivial to do and that a language-specific morphological analyzer, together with careful usage of it, are essential for achieving good results.

2018

pdf bib
SetExpander : End-to-end Term Set Expansion Based on Multi-Context Term EmbeddingsSetExpander: End-to-end Term Set Expansion Based on Multi-Context Term Embeddings
Jonathan Mamou | Oren Pereg | Moshe Wasserblat | Ido Dagan | Yoav Goldberg | Alon Eirew | Yael Green | Shira Guskin | Peter Izsak | Daniel Korat
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We present SetExpander, a corpus-based system for expanding a seed set of terms into a more complete set of terms that belong to the same semantic class. SetExpander implements an iterative end-to end workflow for term set expansion. It enables users to easily select a seed set of terms, expand it, view the expanded set, validate it, re-expand the validated set and store it, thus simplifying the extraction of domain-specific fine-grained semantic classes. SetExpander has been used for solving real-life use cases including integration in an automated recruitment system and an issues and defects resolution system. A video demo of SetExpander is available at https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv.

pdf bib
Adversarial Removal of Demographic Attributes from Text Data
Yanai Elazar | Yoav Goldberg
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent advances in Representation Learning and Adversarial Training seem to succeed in removing unwanted features from the learned representation. We show that demographic information of authors is encoded inand can be recovered fromthe intermediate representations learned by text-based neural classifiers. The implication is that decisions of classifiers trained on textual data are not agnostic toand likely condition ondemographic attributes. When attempting to remove such demographic information using adversarial training, we find that while the adversarial component achieves chance-level development-set accuracy during training, a post-hoc classifier, trained on the encoded sentences from the first part, still manages to reach substantially higher classification accuracies on the same data. This behavior is consistent across several tasks, demographic properties and datasets. We explore several techniques to improve the effectiveness of the adversarial component. Our main conclusion is a cautionary one : do not rely on the adversarial training to achieve invariant representation to sensitive features.

pdf bib
Word Sense Induction with Neural biLM and Symmetric PatternsLM and Symmetric Patterns
Asaf Amrami | Yoav Goldberg
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

An established method for Word Sense Induction (WSI) uses a language model to predict probable substitutes for target words, and induces senses by clustering these resulting substitute vectors. We replace the ngram-based language model (LM) with a recurrent one. Beyond being more accurate, the use of the recurrent LM allows us to effectively query it in a creative way, using what we call dynamic symmetric patterns. The combination of the RNN-LM and the dynamic symmetric patterns results in strong substitute vectors for WSI, allowing to surpass the current state-of-the-art on the SemEval 2013 WSI shared task by a large margin.

pdf bib
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
Georgiana Dinu | Miguel Ballesteros | Avirup Sil | Sam Bowman | Wael Hamza | Anders Sogaard | Tahira Naseem | Yoav Goldberg
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

pdf bib
Breaking NLI Systems with Sentences that Require Simple Lexical InferencesNLI Systems with Sentences that Require Simple Lexical Inferences
Max Glockner | Vered Shwartz | Yoav Goldberg
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. The new examples are simpler than the SNLI test set, containing sentences that differ by at most one word from sentences in the training set. Yet, the performance on the new test set is substantially worse across systems trained on SNLI, demonstrating that these systems are limited in their generalization ability, failing to capture many simple inferences.

pdf bib
Split and Rephrase : Better Evaluation and Stronger Baselines
Roee Aharoni | Yoav Goldberg
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Splitting and rephrasing a complex sentence into several shorter sentences that convey the same meaning is a challenging problem in NLP. We show that while vanilla seq2seq models can reach high scores on the proposed benchmark (Narayan et al., 2017), they suffer from memorization of the training set which contains more than 89 % of the unique simple sentences from the validation and test sets. To aid this, we present a new train-development-test data split and neural models augmented with a copy-mechanism, outperforming the best reported baseline by 8.68 BLEU and fostering further progress on the task.

pdf bib
On the Practical Computational Power of Finite Precision RNNs for Language RecognitionRNNs for Language Recognition
Gail Weiss | Yoav Goldberg | Eran Yahav
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.

2017

pdf bib
Exploring the Syntactic Abilities of RNNs with Multi-task LearningRNNs with Multi-task Learning
Émile Enguehard | Yoav Goldberg | Tal Linzen
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Recent work has explored the syntactic abilities of RNNs using the subject-verb agreement task, which diagnoses sensitivity to sentence structure. RNNs performed this task well in common cases, but faltered in complex sentences (Linzen et al., 2016). We test whether these errors are due to inherent limitations of the architecture or to the relatively indirect supervision provided by most agreement dependencies in a corpus. We trained a single RNN to perform both the agreement task and an additional task, either CCG supertagging or language modeling. Multi-task training led to significantly lower error rates, in particular on complex sentences, suggesting that RNNs have the ability to evolve more sophisticated syntactic representations than shown before. We also show that easily available agreement training data can improve performance on other syntactic tasks, in particular when only a limited amount of training data is available for those tasks. The multi-task paradigm can also be leveraged to inject grammatical knowledge into language models.

pdf bib
Morphological Inflection Generation with Hard Monotonic Attention
Roee Aharoni | Yoav Goldberg
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a neural model for morphological inflection generation which employs a hard attention mechanism, inspired by the nearly-monotonic alignment commonly found between the characters in a word and the characters in its inflection. We evaluate the model on three previously studied morphological inflection generation datasets and show that it provides state of the art results in various setups compared to previous neural and non-neural approaches. Finally we present an analysis of the continuous representations learned by both the hard and soft (Bahdanau, 2014) attention models for the task, shedding some light on the features such models extract.

pdf bib
Towards String-To-Tree Neural Machine Translation
Roee Aharoni | Yoav Goldberg
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We present a simple method to incorporate syntactic information about the target language in a neural machine translation system by translating into linearized, lexicalized constituency trees. An experiment on the WMT16 German-English news translation task resulted in an improved BLEU score when compared to a syntax-agnostic NMT baseline trained on the same dataset. An analysis of the translations from the syntax-aware system shows that it performs more reordering during translation in comparison to the baseline. A small-scale human evaluation also showed an advantage to the syntax-aware system.

pdf bib
Controlling Linguistic Style Aspects in Neural Language Generation
Jessica Ficler | Yoav Goldberg
Proceedings of the Workshop on Stylistic Variation

Most work on neural natural language generation (NNLG) focus on controlling the content of the generated text. We experiment with controling several stylistic aspects of the generated text, in addition to its content. The method is based on conditioned RNN language model, where the desired content as well as the stylistic parameters serve as conditioning contexts. We demonstrate the approach on the movie reviews domain and show that it is successful in generating coherent sentences corresponding to the required linguistic style and content.

pdf bib
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP
Samuel Bowman | Yoav Goldberg | Felix Hill | Angeliki Lazaridou | Omer Levy | Roi Reichart | Anders Søgaard
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

pdf bib
Greedy Transition-Based Dependency Parsing with Stack LSTMsLSTMs
Miguel Ballesteros | Chris Dyer | Yoav Goldberg | Noah A. Smith
Computational Linguistics, Volume 43, Issue 2 - June 2017

We introduce a greedy transition-based parser that learns to represent parser states using recurrent neural networks. Our primary innovation that enables us to do this efficiently is a new control structure for sequential neural networksthe stack long short-term memory unit (LSTM). Like the conventional stack data structures used in transition-based parsers, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space embedding of the stack contents. Our model captures three facets of the parser’s state : (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of transition actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. In addition, we compare two different word representations : (i) standard word vectors based on look-up tables and (ii) character-based models of words. Although standard word embedding models work well in all languages, the character-based models improve the handling of out-of-vocabulary words, particularly in morphologically rich languages. Finally, we discuss the use of dynamic oracles in training the parser. During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time. Training our model with dynamic oracles yields a linear-time greedy parser with very competitive performance.

pdf bib
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
Omer Levy | Anders Søgaard | Yoav Goldberg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that adding additional sources of information, which go beyond the traditional signal of bilingual sentence-aligned corpora, may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.

pdf bib
Improving a Strong Neural Parser with Conjunction-Specific Features
Jessica Ficler | Yoav Goldberg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

While dependency parsers reach very high overall accuracy, some dependency relations are much harder than others. In particular, dependency parsers perform poorly in coordination construction (i.e., correctly attaching the conj relation). We extend a state-of-the-art dependency parser with conjunction-specific features, focusing on the similarity between the conjuncts head words. Training the extended parser yields an improvement in conj attachment as well as in overall dependency parsing accuracy on the Stanford dependency conversion of the Penn TreeBank.

pdf bib
The Interplay of Semantics and Morphology in Word Embeddings
Oded Avraham | Yoav Goldberg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We explore the ability of word embeddings to capture both semantic and morphological similarity, as affected by the different types of linguistic properties (surface form, lemma, morphological tag) used to compose the representation of each word. We train several models, where each uses a different subset of these properties to compose its representations. By evaluating the models on semantic and morphological measures, we reveal some useful insights on the relationship between semantics and morphology.