Hai Wang


2020

On-The-Fly Information Retrieval Augmentation for Language Models
Hai Wang | David McAllester
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

Here we experiment with the use of information retrieval as an augmentation for pre-trained language models. The text corpus used in information retrieval can be viewed as a form of episodic memory which grows over time. By augmenting GPT 2.0 with information retrieval, we achieve a zero-shot 15% relative reduction in perplexity on the Gigaword corpus without any re-training. We also validate our IR augmentation on an event co-reference task.
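
For intuition, here is a minimal sketch of the general idea rather than the paper's exact pipeline: retrieve a related passage with a simple TF-IDF index, prepend it to the evaluation context, and measure perplexity of the continuation with an off-the-shelf GPT-2. The corpus, query, and continuation strings below are placeholders.

```python
# Sketch: IR-augmented perplexity scoring (assumed setup, not the paper's pipeline).
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

corpus = ["passage one ...", "passage two ..."]        # the external "episodic memory"
context, continuation = "The president said", " he would resign."

# 1) Retrieve the passage most similar to the current context.
vectorizer = TfidfVectorizer().fit(corpus)
sims = cosine_similarity(vectorizer.transform([context]), vectorizer.transform(corpus))
retrieved = corpus[int(sims.argmax())]

# 2) Prepend the retrieved text and score only the continuation tokens.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prefix_ids = tokenizer(retrieved + " " + context, return_tensors="pt").input_ids
cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
input_ids = torch.cat([prefix_ids, cont_ids], dim=1)

# Mask out the prefix so the loss (and hence perplexity) covers the continuation only.
labels = input_ids.clone()
labels[:, : prefix_ids.size(1)] = -100
with torch.no_grad():
    loss = model(input_ids, labels=labels).loss
print("continuation perplexity:", torch.exp(loss).item())
```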

2019

Improving Pre-Trained Multilingual Model with Vocabulary Expansion
Hai Wang | Dian Yu | Kai Sun | Jianshu Chen | Dong Yu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, wherein in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on the pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that using mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.
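
As a rough illustration of the mixture-mapping idea, the sketch below assigns an out-of-vocabulary word an embedding built from the wordpiece embeddings multilingual BERT already has. The uniform mixture weights and the example word are simplifying assumptions made here for brevity, not the paper's exact weighting scheme.

```python
# Sketch: embed an OOV word as a (here: uniform) mixture of its wordpiece embeddings,
# then expand the model's vocabulary with it.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
emb = model.get_input_embeddings().weight            # (vocab_size, hidden_size)

def mixture_embedding(word: str) -> torch.Tensor:
    """Average the embeddings of the wordpieces the existing vocabulary splits `word` into."""
    piece_ids = tokenizer(word, add_special_tokens=False).input_ids
    return emb[piece_ids].mean(dim=0)

new_word = "запчастини"                               # hypothetical low-resource-language word
new_vec = mixture_embedding(new_word)

# Add the word to the vocabulary and install its mixed embedding.
tokenizer.add_tokens([new_word])
model.resize_token_embeddings(len(tokenizer))
with torch.no_grad():
    model.get_input_embeddings().weight[-1] = new_vec
```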

2017

Emergent Predication Structure in Hidden State Vectors of Neural Readers
Hai Wang | Takeshi Onishi | Kevin Gimpel | David McAllester
Proceedings of the 2nd Workshop on Representation Learning for NLP

A significant number of neural architectures for reading comprehension have recently been developed and evaluated on large cloze-style datasets. We present experiments supporting the emergence of predication structure in the hidden state vectors of these readers. More specifically, we provide evidence that the hidden state vectors represent atomic formulas Φ[c], where Φ is a semantic property (predicate) and c is a constant symbol (entity identifier).
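
A toy numerical illustration of this claim, not the paper's experimental protocol: if a reader's hidden state is roughly the sum of a predicate vector and an entity vector, and entity embeddings are near-orthogonal to the predicate, then the usual inner-product answer scoring recovers the correct entity.

```python
# Toy illustration of the claimed structure h ≈ Phi + e_c (assumed synthetic vectors).
import numpy as np

rng = np.random.default_rng(0)
d, n_entities = 256, 10
entity_emb = rng.normal(size=(n_entities, d)) / np.sqrt(d)   # e_c for each candidate c
predicate = rng.normal(size=d) / np.sqrt(d)                  # Phi for the query

true_entity = 3
hidden_state = predicate + entity_emb[true_entity]           # h ≈ Phi + e_c

scores = entity_emb @ hidden_state                            # score(c) = e_c · h
print("predicted entity:", int(scores.argmax()))              # recovers 3 with high probability
```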

Broad Context Language Modeling as Reading Comprehension
Zewei Chu | Hai Wang | Kevin Gimpel | David McAllester
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Progress in text understanding has been driven by large datasets that test particular capabilities, like recent datasets for reading comprehension (Hermann et al., 2015). We focus here on the LAMBADA dataset (Paperno et al., 2016), a word prediction task requiring broader context than the immediate sentence. We view LAMBADA as a reading comprehension problem and apply comprehension models based on neural networks. Though these models are constrained to choose a word from the context, they improve the state of the art on LAMBADA from 7.3% to 49%. We analyze 100 instances, finding that neural network readers perform well in cases that involve selecting a name from the context based on dialogue or discourse cues but struggle when coreference resolution or external knowledge is needed.
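
To make the "choose a word from the context" constraint concrete, here is a minimal sketch that restricts a language model's prediction for the final word to tokens already seen in the passage. GPT-2 is used only as a stand-in for the paper's neural readers, and the passage is an invented example.

```python
# Sketch: constrain next-word prediction to words appearing in the broader context.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

context = ("He handed her the violin and she began to play. Everyone in the room "
           "stopped to listen to the")
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]          # distribution over the next token

# Candidate set: token ids already seen in the context.
candidates = torch.tensor(sorted(set(input_ids[0].tolist())))
best = candidates[logits[candidates].argmax()]
print("constrained prediction:", tokenizer.decode([int(best)]))
```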