Qi Zhang


2022

pdf bib
Robust Lottery Tickets for Pre-trained Language Models
Rui Zheng | Bao Rong | Yuhao Zhou | Di Liang | Sirui Wang | Wei Wu | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent works on Lottery Ticket Hypothesis have shown that pre-trained language models (PLMs) contain smaller matching subnetworks(winning tickets) which are capable of reaching accuracy comparable to the original models. However, these tickets are proved to be notrobust to adversarial examples, and even worse than their PLM counterparts. To address this problem, we propose a novel method based on learning binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable for the binary mask, we assign the hard concrete distribution to the masks and encourage their sparsity using a smoothing approximation of L0 regularization.Furthermore, we design an adversarial loss objective to guide the search for robust tickets and ensure that the tickets perform well bothin accuracy and robustness. Experimental results show the significant improvement of the proposed method over previous work on adversarial robustness evaluation.

2021

pdf bib
Math Word Problem Solving with Explicit Numerical Values
Qinzhuo Wu | Qi Zhang | Zhongyu Wei | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In recent years, math word problem solving has received considerable attention and achieved promising results, but previous methods rarely take numerical values into consideration. Most methods treat the numerical values in the problems as number symbols, and ignore the prominent role of the numerical values in solving the problem. In this paper, we propose a novel approach called NumS2 T, which enhances math word problem solving performance by explicitly incorporating numerical values into a sequence-to-tree network. In addition, a numerical properties prediction mechanism is used to capture the category and comparison information of numerals and measure their importance in global expressions. Experimental results on the Math23 K and APE datasets demonstrate that our model achieves better performance than existing state-of-the-art models.

pdf bib
TextFlint : Unified Multilingual Robustness Evaluation Toolkit for Natural Language ProcessingTextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Xiao Wang | Qin Liu | Tao Gui | Qi Zhang | Yicheng Zou | Xin Zhou | Jiacheng Ye | Yongxin Zhang | Rui Zheng | Zexiong Pang | Qinzhuo Wu | Zhengyan Li | Chong Zhang | Ruotian Ma | Zichu Fei | Ruijian Cai | Jun Zhao | Xingwu Hu | Zhiheng Yan | Yiding Tan | Yuan Hu | Qiyuan Bian | Zhihua Liu | Shan Qin | Bolin Zhu | Xiaoyu Xing | Jinlan Fu | Yue Zhang | Minlong Peng | Xiaoqing Zheng | Yaqian Zhou | Zhongyu Wei | Xipeng Qiu | Xuanjing Huang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

TextFlint is a multilingual robustness evaluation toolkit for NLP tasks that incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analyses. This enables practitioners to automatically evaluate their models from various aspects or to customize their evaluations as desired with just a few lines of code. TextFlint also generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model in terms of its robustness. To guarantee acceptability, all the text transformations are linguistically based and all the transformed data selected (up to 100,000 texts) scored highly under human evaluation. To validate the utility, we performed large-scale empirical evaluations (over 67,000) on state-of-the-art deep learning models, classic supervised methods, and real-world systems. The toolkit is already available at https://github.com/textflint with all the evaluation results demonstrated at textflint.io.

pdf bib
A Partition Filter Network for Joint Entity and Relation Extraction
Zhiheng Yan | Chong Zhang | Jinlan Fu | Qi Zhang | Zhongyu Wei
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In joint entity and relation extraction, existing work either sequentially encode task-specific features, leading to an imbalance in inter-task feature interaction where features extracted later have no direct contact with those that come first. Or they encode entity features and relation features in a parallel manner, meaning that feature representation learning for each task is largely independent of each other except for input sharing. We propose a partition filter network to model two-way interaction between tasks properly, where feature encoding is decomposed into two steps : partition and filter. In our encoder, we leverage two gates : entity and relation gate, to segment neurons into two task partitions and one shared partition. The shared partition represents inter-task information valuable to both tasks and is evenly shared across two tasks to ensure proper two-way interaction. The task partitions represent intra-task information and are formed through concerted efforts of both gates, making sure that encoding of task-specific features is dependent upon each other. Experiment results on six public datasets show that our model performs significantly better than previous approaches. In addition, contrary to what previous work has claimed, our auxiliary experiments suggest that relation prediction is contributory to named entity prediction in a non-negligible way. The source code can be found at https://github.com/Coopercoppers/PFN.

pdf bib
Thinking Clearly, Talking Fast : Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems
Yicheng Zou | Zhihua Liu | Xingwu Hu | Qi Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Human dialogue contains evolving concepts, and speakers naturally associate multiple concepts to compose a response. However, current dialogue models with the seq2seq framework lack the ability to effectively manage concept transitions and can hardly introduce multiple concepts to responses in a sequential decoding manner. To facilitate a controllable and coherent dialogue, in this work, we devise a concept-guided non-autoregressive model (CG-nAR) for open-domain dialogue generation. The proposed model comprises a multi-concept planning module that learns to identify multiple associated concepts from a concept graph and a customized Insertion Transformer that performs concept-guided non-autoregressive generation to complete a response. The experimental results on two public datasets show that CG-nAR can produce diverse and coherent responses, outperforming state-of-the-art baselines in both automatic and human evaluations with substantially faster inference speed.

pdf bib
A Relation-Oriented Clustering Method for Open Relation Extraction
Jun Zhao | Tao Gui | Qi Zhang | Yaqian Zhou
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

The clustering-based unsupervised relation discovery method has gradually become one of the important methods of open relation extraction (OpenRE). However, high-dimensional vectors can encode complex linguistic information which leads to the problem that the derived clusters can not explicitly align with the relational semantic classes. In this work, we propose a relation-oriented clustering model and use it to identify the novel relations in the unlabeled data. Specifically, to enable the model to learn to cluster relational data, our method leverages the readily available labeled data of pre-defined relations to learn a relation-oriented representation. We minimize distance between the instance with same relation by gathering the instances towards their corresponding relation centroids to form a cluster structure, so that the learned representation is cluster-friendly. To reduce the clustering bias on predefined classes, we optimize the model by minimizing a joint objective on both labeled and unlabeled data. Experimental results show that our method reduces the error rate by 29.2 % and 15.7 %, on two datasets respectively, compared with current SOTA methods.

pdf bib
Larger-Context Tagging : When and Why Does It Work?
Jinlan Fu | Liangjing Feng | Qi Zhang | Xuanjing Huang | Pengfei Liu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The development of neural networks and pretraining techniques has spawned many sentence-level tagging systems that achieved superior performance on typical benchmarks. However, a relatively less discussed topic is what if more context information is introduced into current top-scoring tagging systems. Although several existing works have attempted to shift tagging systems from sentence-level to document-level, there is still no consensus conclusion about when and why it works, which limits the applicability of the larger-context approach in tagging tasks. In this paper, instead of pursuing a state-of-the-art tagging system by architectural exploration, we focus on investigating when and why the larger-context training, as a general strategy, can work. To this end, we conduct a thorough comparative study on four proposed aggregators for context information collecting and present an attribute-aided evaluation method to interpret the improvement brought by larger-context training. Experimentally, we set up a testbed based on four tagging tasks and thirteen datasets. Hopefully, our preliminary observations can deepen the understanding of larger-context training and enlighten more follow-up works on the use of contextual information.

2020

pdf bib
Simplify the Usage of Lexicon in Chinese NERChinese NER
Ruotian Ma | Minlong Peng | Qi Zhang | Zhongyu Wei | Xuanjing Huang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recently, many works have tried to augment the performance of Chinese named entity recognition (NER) using word lexicons. As a representative, Lattice-LSTM has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture. This limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model, it requires only subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method achieves an inference speed up to 6.15 times faster than those of state-of-the-art methods, along with a better performance. The experimental results also show that the proposed method can be easily incorporated with pre-trained models like BERT.

pdf bib
Toward Recognizing More Entity Types in NER : An Efficient Implementation using Only Entity LexiconsNER: An Efficient Implementation using Only Entity Lexicons
Minlong Peng | Ruotian Ma | Qi Zhang | Lujun Zhao | Mengxi Wei | Changlong Sun | Xuanjing Huang
Findings of the Association for Computational Linguistics: EMNLP 2020

In this work, we explore the way to quickly adjust an existing named entity recognition (NER) system to make it capable of recognizing entity types not defined in the system. As an illustrative example, consider the case that a NER system has been built to recognize person and organization names, and now it requires to additionally recognize job titles. Such a situation is common in the industrial areas, where the entity types required to recognize vary a lot in different products and keep changing. To avoid laborious data labeling and achieve fast adaptation, we propose to adjust the existing NER system using the previously labeled data and entity lexicons of the newly introduced entity types. We formulate such a task as a partially supervised learning problem and accordingly propose an effective algorithm to solve the problem. Comprehensive experimental studies on several public NER datasets validate the effectiveness of our method.

pdf bib
Keep it Consistent : Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication
Ruize Wang | Zhongyu Wei | Ying Cheng | Piji Li | Haijun Shan | Ji Zhang | Qi Zhang | Xuanjing Huang
Proceedings of the 28th International Conference on Computational Linguistics

Visual storytelling aims to generate a narrative paragraph from a sequence of images automatically. Existing approaches construct text description independently for each image and roughly concatenate them as a story, which leads to the problem of generating semantically incoherent content. In this paper, we propose a new way for visual storytelling by introducing a topic description task to detect the global semantic context of an image stream. A story is then constructed with the guidance of the topic description. In order to combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learn them simultaneously via iterative updating mechanism. We validate our approach on VIST dataset, where quantitative results, ablations, and human evaluation demonstrate our method’s good ability in generating stories with higher quality compared to state-of-the-art methods.

2019

pdf bib
A Lexicon-Based Graph Neural Network for Chinese NERChinese NER
Tao Gui | Yicheng Zou | Qi Zhang | Minlong Peng | Jinlan Fu | Zhongyu Wei | Xuanjing Huang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recurrent neural networks (RNN) used for Chinese named entity recognition (NER) that sequentially track character and word information have achieved great success. However, the characteristic of chain structure and the lack of global semantics determine that RNN-based models are vulnerable to word ambiguities. In this work, we try to alleviate this problem by introducing a lexicon-based graph neural network with global semantics, in which lexicon knowledge is used to connect characters to capture the local composition, while a global relay node can capture global sentence semantics and long-range dependency. Based on the multiple graph-based interactions among characters, potential words, and the whole-sentence semantics, word ambiguities can be effectively tackled. Experiments on four NER datasets show that the proposed model achieves significant improvements against other baseline models.

pdf bib
A Multi-Task Learning Framework for Extracting Bacteria Biotope Information
Qi Zhang | Chao Liu | Ying Chi | Xuansong Xie | Xiansheng Hua
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

This paper presents a novel transfer multi-task learning method for Bacteria Biotope rel+ner task at BioNLP-OST 2019. To alleviate the data deficiency problem in domain-specific information extraction, we use BERT(Bidirectional Encoder Representations from Transformers) and pre-train it using mask language models and next sentence prediction on both general corpus and medical corpus like PubMed. In fine-tuning stage, we fine-tune the relation extraction layer and mention recognition layer designed by us on the top of BERT to extract mentions and relations simultaneously. The evaluation results show that our method achieves the best performance on all metrics (including slot error rate, precision and recall) in the Bacteria Biotope rel+ner subtask.

pdf bib
Posterior-regularized REINFORCE for Instance Selection in Distant SupervisionREINFORCE for Instance Selection in Distant Supervision
Qi Zhang | Siliang Tang | Xiang Ren | Fei Wu | Shiliang Pu | Yueting Zhuang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This paper provides a new way to improve the efficiency of the REINFORCE training process. We apply it to the task of instance selection in distant supervision. Modeling the instance selection in one bag as a sequential decision process, a reinforcement learning agent is trained to determine whether an instance is valuable or not and construct a new bag with less noisy instances. However unbiased methods, such as REINFORCE, could usually take much time to train. This paper adopts posterior regularization (PR) to integrate some domain-specific rules in instance selection using REINFORCE. As the experiment results show, this method remarkably improves the performance of the relation classifier trained on cleaned distant supervision dataset as well as the efficiency of the REINFORCE training.

2018

pdf bib
Transferring from Formal Newswire Domain with Hypernet for Twitter POS TaggingTwitter POS Tagging
Tao Gui | Qi Zhang | Jingjing Gong | Minlong Peng | Di Liang | Keyu Ding | Xuanjing Huang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Part-of-Speech (POS) tagging for Twitter has received considerable attention in recent years. Because most POS tagging methods are based on supervised models, they usually require a large amount of labeled data for training. However, the existing labeled datasets for Twitter are much smaller than those for newswire text. Hence, to help POS tagging for Twitter, most domain adaptation methods try to leverage newswire datasets by learning the shared features between the two domains. However, from a linguistic perspective, Twitter users not only tend to mimic the formal expressions of traditional media, like news, but they also appear to be developing linguistically informal styles. Therefore, POS tagging for the formal Twitter context can be learned together with the newswire dataset, while POS tagging for the informal Twitter context should be learned separately. To achieve this task, in this work, we propose a hypernetwork-based method to generate different parameters to separately model contexts with different expression styles. Experimental results on three different datasets show that our approach achieves better performance than state-of-the-art methods in most cases.

pdf bib
Cross-Domain Sentiment Classification with Target Domain Specific Information
Minlong Peng | Qi Zhang | Yu-gang Jiang | Xuanjing Huang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The task of adopting a model with good performance to a target domain that is different from the source domain used for training has received considerable attention in sentiment analysis. Most existing approaches mainly focus on learning representations that are domain-invariant in both the source and target domains. Few of them pay attention to domain-specific information, which should also be informative. In this work, we propose a method to simultaneously extract domain specific and invariant representations and train a classifier on each of the representation, respectively. And we introduce a few target domain labeled data for learning domain-specific information. To effectively utilize the target domain labeled data, we train the domain invariant representation based classifier with both the source and target domain labeled data and train the domain-specific representation based classifier with only the target domain labeled data. These two classifiers then boost each other in a co-training style. Extensive sentiment analysis experiments demonstrated that the proposed method could achieve better performance than state-of-the-art methods.

2017

pdf bib
Part-of-Speech Tagging for Twitter with Adversarial Neural NetworksTwitter with Adversarial Neural Networks
Tao Gui | Qi Zhang | Haoran Huang | Minlong Peng | Xuanjing Huang
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this work, we study the problem of part-of-speech tagging for Tweets. In contrast to newswire articles, Tweets are usually informal and contain numerous out-of-vocabulary words. Moreover, there is a lack of large scale labeled datasets for this domain. To tackle these challenges, we propose a novel neural network to make use of out-of-domain labeled data, unlabeled in-domain data, and labeled in-domain data. Inspired by adversarial neural networks, the proposed method tries to learn common features through adversarial discriminator. In addition, we hypothesize that domain-specific features of target domain should be preserved in some degree. Hence, the proposed method adopts a sequence-to-sequence autoencoder to perform this task. Experimental results on three different datasets show that our method achieves better performance than state-of-the-art methods.

pdf bib
Large-scale Opinion Relation Extraction with Distantly Supervised Neural Network
Changzhi Sun | Yuanbin Wu | Man Lan | Shiliang Sun | Qi Zhang
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We investigate the task of open domain opinion relation extraction. Different from works on manually labeled corpus, we propose an efficient distantly supervised framework based on pattern matching and neural network classifiers. The patterns are designed to automatically generate training data, and the deep learning model is design to capture various lexical and syntactic features. The result algorithm is fast and scalable on large-scale corpus. We test the system on the Amazon online review dataset. The result shows that our model is able to achieve promising performances without any human annotations.