Weiguang Qu


An Element-aware Multi-representation Model for Law Article Prediction
Huilin Zhong | Junsheng Zhou | Weiguang Qu | Yunfei Long | Yanhui Gu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing work has shown that using law articles as external knowledge can improve the performance of Legal Judgment Prediction. However, it does not fully use law article information, and most current work handles only single-label samples. In this paper, we propose a Law Article Element-aware Multi-representation Model (LEMM), which makes full use of law article information and can be applied to multi-label samples. The model uses the labeled elements of law articles to extract fact description features from multiple angles, generating multiple representations of a fact for classification. Every label has a law-aware fact representation to encode more information. To capture the dependencies between law articles, the model also introduces a self-attention mechanism between the multiple representations. Compared with baseline models such as TopJudge, this model improves accuracy by 5.84%, macro F1 by 6.42%, and micro F1 by 4.28%.
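The abstract reports both macro F1 and micro F1 on multi-label samples. As a reminder of how the two averages differ, here is a minimal sketch in plain Python. It is not the authors' evaluation code; the function names and the toy labels `A`, `B`, `C` are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for multi-label predictions.

    gold[i] and pred[i] are the sets of labels (e.g. law articles)
    attached to sample i.
    """
    labels = sorted(set().union(*gold, *pred))
    # Per-label counts: [true positives, false positives, false negatives].
    counts = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1

    # Macro: average the per-label F1 scores, so rare labels weigh as much as frequent ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels) if labels else 0.0

    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)


if __name__ == "__main__":
    gold = [{"A", "B"}, {"A"}, {"C"}]
    pred = [{"A"}, {"A", "B"}, {"C"}]
    print(macro_micro_f1(gold, pred))
```

In this toy case label `B` is always wrong, which pulls the macro average down more than the micro average.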


Building a Chinese AMR Bank with Concept and Relation Alignments
Bin Li | Yuan Wen | Li Song | Weiguang Qu | Nianwen Xue
Linguistic Issues in Language Technology, Volume 18, 2019 - Exploiting Parsed Corpora: Applications in Research, Pedagogy, and Processing

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an ongoing project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of the alignment between word tokens in the sentence and the concepts/relations in the CAMR annotation, which makes it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-annotator agreement, as measured by the Smatch score between the two annotators, is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus: 46.71% of the AMRs of the sentences are non-tree graphs. Moreover, the AMRs of 88.95% of the sentences contain concepts that are inferred from the context of the sentence but do not correspond to a specific word.
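To make the "non-tree graph" statistic concrete: in a single-rooted, acyclic, directed graph, the graph is a tree exactly when no node has more than one incoming edge. The sketch below checks that condition on a list of (source, relation, target) triples. It is only an illustration under that assumption, not the authors' analysis script; the triple encoding, the function name, and the English example are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# One edge of a meaning graph: (source variable, relation label, target variable).
Edge = Tuple[str, str, str]


def is_non_tree(edges: List[Edge]) -> bool:
    """Return True if some node is re-entrant, i.e. has more than one incoming edge.

    Assumes the input is a single-rooted, acyclic, directed graph, as AMR
    requires; under that assumption a re-entrant node is exactly what
    distinguishes a non-tree graph from a tree.
    """
    indegree = Counter(target for _, _, target in edges)
    return any(count > 1 for count in indegree.values())


if __name__ == "__main__":
    # "The boy wants to go": the boy is an argument of both "want" and "go".
    wants_to_go = [
        ("w", ":arg0", "b"),  # want -> boy
        ("w", ":arg1", "g"),  # want -> go
        ("g", ":arg0", "b"),  # go -> boy (re-entrancy)
    ]
    print(is_non_tree(wants_to_go))
```

Running the same check over every annotated sentence and dividing by the corpus size would give a proportion comparable in spirit to the figure quoted in the abstract.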

Ellipsis in Chinese AMR Corpus
Yihuan Liu | Bin Li | Peiyi Yan | Li Song | Weiguang Qu
Proceedings of the First International Workshop on Designing Meaning Representations

Ellipsis is very common in language, and natural language processing needs to restore the elided elements in a sentence. However, only a few corpora annotate ellipsis, which hinders its automatic detection and recovery. This paper introduces the annotation of ellipsis in Chinese sentences using a novel graph-based representation, Abstract Meaning Representation (AMR), which has a good mechanism for restoring the elided elements manually. We annotate 5,000 sentences selected from the Chinese TreeBank (CTB). We find that 54.98% of sentences have ellipses; 92% of the ellipses are restored by copying the antecedents’ concepts, and 12.9% of them are newly added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or part of a phrase, which makes automatic recovery of ellipsis rather hard.