Lawrence Carin


2021

APo-VAE: Text Generation in Hyperbolic Space
Shuyang Dai | Zhe Gan | Yu Cheng | Chenyang Tao | Lawrence Carin | Jingjing Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Natural language often exhibits inherent hierarchical structure ingrained with complex syntax and semantics. However, most state-of-the-art deep generative models learn embeddings only in Euclidean vector space, without accounting for this structural property of language. In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations. An Adversarial Poincaré Variational Autoencoder (APo-VAE) is presented, where both the prior and variational posterior of latent variables are defined over a Poincaré ball via wrapped normal distributions. By adopting the primal-dual formulation of Kullback-Leibler divergence, an adversarial learning procedure is introduced to empower robust model training. Extensive experiments in language modeling, unaligned style transfer, and dialog-response generation demonstrate the effectiveness of the proposed APo-VAE model over VAEs in Euclidean latent space, thanks to its superb capabilities in capturing latent language hierarchies in hyperbolic space.
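
As a rough illustration of the wrapped normal idea mentioned in the abstract, the sketch below draws a Gaussian vector in the tangent space at the origin and pushes it onto the Poincaré ball with the exponential map. The function names, the unit-curvature choice, and the restriction to a distribution centered at the origin are assumptions made for this example, not details from the paper; a non-origin mean would additionally need parallel transport, which is omitted here.

```python
import math
import random


def exp_map_origin(v):
    """Exponential map at the origin of the unit Poincaré ball (curvature -1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        # The zero tangent vector maps to the origin itself.
        return [0.0 for _ in v]
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]


def sample_wrapped_normal_at_origin(dim, sigma=1.0, rng=random):
    """Sample a tangent-space Gaussian and wrap it onto the ball (hypothetical helper)."""
    tangent = [rng.gauss(0.0, sigma) for _ in range(dim)]
    return exp_map_origin(tangent)
```

Because tanh is bounded by 1, every wrapped sample has Euclidean norm at most 1, so it stays on the ball regardless of how large the tangent-space draw is.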

SpanPredict: Extraction of Predictive Document Spans with Neural Attention
Vivek Subramanian | Matthew Engelhard | Sam Berchuck | Liqun Chen | Ricardo Henao | Lawrence Carin
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In many natural language processing applications, identifying predictive text can be as important as the predictions themselves. When predicting medical diagnoses, for example, identifying predictive content in clinical notes not only enhances interpretability, but also allows unknown, descriptive (i.e., text-based) risk factors to be identified. We here formalize this problem as predictive extraction and address it using a simple mechanism based on linear attention. Our method preserves differentiability, allowing scalable inference via stochastic gradient descent. Further, the model decomposes predictions into a sum of contributions of distinct text spans. Importantly, we require only document labels, not ground-truth spans. Results show that our model identifies semantically-cohesive spans and assigns them scores that agree with human ratings, while preserving classification performance.

2020

Methods for Numeracy-Preserving Word Embeddings
Dhanasekar Sundararaman | Shijing Si | Vivek Subramanian | Guoyin Wang | Devamanyu Hazarika | Lawrence Carin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word embedding models are typically able to capture the semantics of words via the distributional hypothesis, but fail to capture the numerical properties of numbers that appear in the text. This leads to problems with numerical reasoning involving tasks such as question answering. We propose a new methodology to assign and learn embeddings for numbers. Our approach creates Deterministic, Independent-of-Corpus Embeddings (the model is referred to as DICE) for numbers, such that their cosine similarity reflects the actual distance on the number line. DICE outperforms a wide range of pre-trained word embedding models across multiple examples of two tasks: (i) evaluating the ability to capture numeration and magnitude; and (ii) performing list maximum, decoding, and addition. We further explore the utility of these embeddings in downstream tasks, by initializing numbers with our approach for the task of magnitude prediction. We also introduce a regularization approach to learn model-based embeddings of numbers in a contextual setting.
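
To make concrete the property that cosine similarity should track distance on the number line, here is a two-dimensional toy sketch. It is not claimed to be the DICE construction from the paper: the angle mapping, the function names, and the `low`/`high` range parameters are all assumptions chosen for illustration.

```python
import math


def number_embedding(x, low, high):
    """Map a number in [low, high] to a unit vector whose angle grows linearly with x."""
    theta = math.pi * (x - low) / (high - low)
    return (math.cos(theta), math.sin(theta))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Under this mapping the cosine similarity of two numbers equals the cosine of an angle proportional to their absolute difference, so it decreases monotonically as the numbers move apart within the chosen range.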

Improving Disentangled Text Representation Learning with Information-Theoretic Guidance
Pengyu Cheng | Martin Renqiang Min | Dinghan Shen | Christopher Malon | Yizhe Zhang | Yitong Li | Lawrence Carin
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Learning disentangled representations of natural language is essential for many NLP tasks, such as conditional text generation, style transfer, and personalized dialogue systems. Similar problems have been studied extensively for other forms of data, such as images and videos. However, the discrete nature of natural language makes the disentangling of textual representations more challenging (e.g., manipulation over the data space cannot be easily achieved). Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text, without any supervision on semantics. A new mutual information upper bound is derived and leveraged to measure dependence between style and content. By minimizing this upper bound, the proposed method induces style and content embeddings into two independent low-dimensional spaces. Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation in terms of content and style preservation.

Semantic Matching for Sequence-to-Sequence Learning
Ruiyi Zhang | Changyou Chen | Xinyuan Zhang | Ke Bai | Lawrence Carin
Findings of the Association for Computational Linguistics: EMNLP 2020

In sequence-to-sequence models, classical optimal transport (OT) can be applied to semantically match generated sentences with target sentences. However, in non-parallel settings, target sentences are usually unavailable. To tackle this issue without losing the benefits of classical OT, we present a semantic matching scheme based on the Optimal Partial Transport (OPT). Specifically, our approach partially matches semantically meaningful words between source and partial target sequences. To overcome the difficulty of detecting active regions in OPT (corresponding to the words needed to be matched), we further exploit prior knowledge to perform partial matching. Extensive experiments are conducted to evaluate the proposed approach, showing consistent improvements over sequence-to-sequence tasks.

2019

An End-to-End Generative Architecture for Paraphrase Generation
Qian Yang | Zhouyuan Huo | Dinghan Shen | Yong Cheng | Wenlin Wang | Guoyin Wang | Lawrence Carin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating high-quality paraphrases is a fundamental yet challenging natural language processing task. Despite the effectiveness of previous work based on generative models, there remain problems with exposure bias in recurrent neural networks, and often a failure to generate realistic sentences. To overcome these challenges, we propose the first end-to-end conditional generative architecture for generating paraphrases via adversarial training, which does not depend on extra linguistic information. Extensive experiments on four public datasets demonstrate the proposed method achieves state-of-the-art results, outperforming previous generative architectures on both automatic metrics (BLEU, METEOR, and TER) and human evaluations.

Topic-Guided Variational Auto-Encoder for Text Generation
Wenlin Wang | Zhe Gan | Hongteng Xu | Ruiyi Zhang | Guoyin Wang | Dinghan Shen | Changyou Chen | Lawrence Carin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a topic-guided variational auto-encoder (TGVAE) model for text generation. Distinct from existing variational auto-encoder (VAE) based approaches, which assume a simple Gaussian prior for the latent code, our model specifies the prior as a Gaussian mixture model (GMM) parametrized by a neural topic module. Each mixture component corresponds to a latent topic, which provides guidance for generating sentences under that topic. The neural topic module and the VAE-based neural sequence module in our model are learned jointly. In particular, a sequence of invertible Householder transformations is applied to endow the approximate posterior of the latent code with high flexibility during model inference. Experimental results show that our TGVAE outperforms its competitors on both unconditional and conditional text generation, and can also generate semantically meaningful sentences with various topics.
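
The Householder transformations mentioned above can be sketched with the standard reflection H = I - 2vvᵀ/‖v‖², applied repeatedly to a latent sample. This is only a minimal illustration: the function names are hypothetical, and the reflection vectors are fixed placeholders here, whereas how the model actually produces them is not stated in the abstract.

```python
def householder(z, v):
    """Reflect z across the hyperplane orthogonal to v: z - 2 v (v . z) / ||v||^2."""
    v_norm_sq = sum(x * x for x in v)
    scale = 2.0 * sum(a * b for a, b in zip(v, z)) / v_norm_sq
    return [zi - scale * vi for zi, vi in zip(z, v)]


def householder_flow(z, reflection_vectors):
    """Apply a sequence of Householder reflections to a latent sample z."""
    for v in reflection_vectors:
        z = householder(z, v)
    return z
```

Each reflection is its own inverse and preserves the Euclidean norm, so a chain of them is invertible and volume-preserving while still rotating the latent code.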

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
Hao Fu | Chunyuan Li | Xiaodong Liu | Jianfeng Gao | Asli Celikyilmaz | Lawrence Carin
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, the KL regularization term and the reconstruction term, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study different scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy the issue, we propose a cyclical annealing schedule, which simply repeats the process of increasing β multiple times. This new procedure allows us to learn more meaningful latent codes progressively by leveraging the results of previous learning cycles as a warm restart. The effectiveness of the cyclical annealing schedule is validated on a broad range of NLP tasks, including language modeling, dialog response generation and semi-supervised text classification.
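
A cyclical schedule for the KL weight β might look like the sketch below. The abstract only states that the process of increasing β is repeated several times; the linear ramp, the plateau at the maximum, and the parameters `n_cycles`, `ramp_fraction`, and `beta_max` are assumptions made for this illustration.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5, beta_max=1.0):
    """Return the KL weight for a training step under a repeating ramp-then-hold schedule."""
    cycle_length = total_steps / n_cycles
    # Position within the current cycle, in [0, 1).
    position = (step % cycle_length) / cycle_length
    if position < ramp_fraction:
        # Linear increase from 0 to beta_max during the first part of the cycle.
        return beta_max * position / ramp_fraction
    # Hold at beta_max for the remainder of the cycle.
    return beta_max
```

With these defaults, β restarts from zero at the beginning of each of the four cycles, so the decoder is periodically trained with a weak KL penalty before the weight climbs again.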

Learning Compressed Sentence Representations for On-Device Text Processing
Dinghan Shen | Pengyu Cheng | Dhanasekar Sundararaman | Xinyuan Zhang | Qian Yang | Meng Tang | Asli Celikyilmaz | Lawrence Carin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner product operation between continuous embeddings. A detailed analysis and case study further validate the effectiveness of the proposed methods.
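
The Hamming-distance comparison described above can be illustrated as follows. The abstract does not spell out the four binarization strategies, so the simple per-dimension thresholding used here is an assumption for the sake of the example, as are the function names.

```python
def binarize(embedding, threshold=0.0):
    """Pack a real-valued embedding into an integer bit string (1 where value > threshold)."""
    bits = 0
    for value in embedding:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


code_a = binarize([0.3, -1.2, 0.8, -0.1])
code_b = binarize([0.5, 0.4, -0.7, -0.2])
distance = hamming_distance(code_a, code_b)
```

Comparing two codes then reduces to one XOR and a bit count, and each dimension is stored in a single bit rather than a floating-point value.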

Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models
Dinghan Shen | Asli Celikyilmaz | Yizhe Zhang | Liqun Chen | Xin Wang | Jianfeng Gao | Lawrence Carin
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables. However, previous works typically focus on synthesizing relatively short sentences (up to 20 words), and the posterior collapse issue has been widely identified in text-VAEs. In this paper, we propose to leverage several multi-level structures to learn a VAE model for generating long and coherent text. In particular, a hierarchy of stochastic layers between the encoder and decoder networks is employed to abstract more informative and semantic-rich latent codes. In addition, we utilize a multi-level decoder structure to capture the coherent long-term structure inherent in long-form texts, by generating intermediate sentence representations as high-level plan vectors. Extensive experimental results demonstrate that the proposed multi-level VAE model produces more coherent and less repetitive long text than the baselines, and can also mitigate the posterior-collapse issue.

2018

Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
Dinghan Shen | Xinyuan Zhang | Ricardo Henao | Lawrence Carin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Network embeddings, which learn low-dimensional representations for each vertex in a large-scale network, have received considerable attention in recent years. For a wide range of applications, vertices in a network are typically accompanied by rich textual information such as user profiles, paper abstracts, etc. In this paper, we propose to incorporate semantic features into network embeddings by matching important words between text sequences for all pairs of vertices. We introduce a word-by-word alignment framework that measures the compatibility of embeddings between word pairs, and then adaptively accumulates these alignment features with a simple yet effective aggregation function. In experiments, we evaluate the proposed framework on three real-world benchmarks for downstream tasks, including link prediction and multi-label vertex classification. The experimental results demonstrate that our model outperforms state-of-the-art network embedding methods by a large margin.

Learning Context-Sensitive Convolutional Filters for Text Processing
Dinghan Shen | Martin Renqiang Min | Yitong Li | Lawrence Carin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Convolutional neural networks (CNNs) have recently emerged as a popular building block for natural language processing (NLP). Despite their success, most existing CNN models employed in NLP share the same learned (and static) set of filters for all input sentences. In this paper, we consider an approach of using a small meta network to learn context-sensitive convolutional filters for text processing. The role of the meta network is to abstract the contextual information of a sentence or document into a set of input-sensitive filters. We further generalize this framework to model sentence pairs, where a bidirectional filter generation mechanism is introduced to encapsulate co-dependent sentence representations. In our benchmarks on four different tasks, including ontology classification, sentiment analysis, answer sentence selection, and paraphrase identification, our proposed model, a modified CNN with context-sensitive filters, consistently outperforms the standard CNN and attention-based CNN baselines. By visualizing the learned context-sensitive filters, we further validate and rationalize the effectiveness of the proposed framework.

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
Dinghan Shen | Guoyin Wang | Wenlin Wang | Martin Renqiang Min | Qinliang Su | Yizhe Zhang | Chunyuan Li | Ricardo Henao | Lawrence Carin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, relative to word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
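
The parameter-free pooling operations named in this abstract can be sketched over a list of word vectors. Average and max pooling are standard; the hierarchical variant below (averaging within sliding windows, then taking a max over the window averages) is one plausible reading of "preserves spatial (n-gram) information" and should be treated as an assumption, as should the function names and the `window` parameter.

```python
def average_pool(vectors):
    """Element-wise mean over a sequence of equal-length word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors):
    """Element-wise maximum over a sequence of equal-length word vectors."""
    return [max(column) for column in zip(*vectors)]


def hierarchical_pool(vectors, window=2):
    """Average within each sliding window, then max over the window averages.

    Assumes the sequence has at least `window` vectors.
    """
    windows = [vectors[i:i + window] for i in range(len(vectors) - window + 1)]
    return max_pool([average_pool(w) for w in windows])


words = [[1.0, 0.0], [3.0, -2.0], [-1.0, 4.0]]
sentence_max = max_pool(words)             # [3.0, 4.0]
sentence_hier = hierarchical_pool(words)   # [2.0, 1.0]
```

None of these functions has trainable parameters beyond the word embeddings themselves; only the windowed variant is sensitive to word order.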

Joint Embedding of Words and Labels for Text Classification
Guoyin Wang | Chunyuan Li | Wenlin Wang | Yizhe Zhang | Dinghan Shen | Xinyuan Zhang | Ricardo Henao | Lawrence Carin
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space with the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted higher than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on several large text datasets show that the proposed framework outperforms the state-of-the-art methods by a large margin, in terms of both accuracy and speed.

2017

Learning Generic Sentence Representations Using Convolutional Neural Networks
Zhe Gan | Yunchen Pu | Ricardo Henao | Chunyuan Li | Xiaodong He | Lawrence Carin
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods.