Nathan Schneider


2021

pdf bib
Lexical Semantic Recognition
Nelson F. Liu | Daniel Hershcovich | Michael Kranzlein | Nathan Schneider
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification / classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area.

pdf bib
Referenceless Parsing-Based Evaluation of AMR-to-English GenerationAMR-to-English Generation
Emma Manning | Nathan Schneider
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

Reference-based automatic evaluation metrics are notoriously limited for NLG due to their inability to fully capture the range of possible outputs. We examine a referenceless alternative : evaluating the adequacy of English sentences generated from Abstract Meaning Representation (AMR) graphs by parsing into AMR and comparing the parse directly to the input. We find that the errors introduced by automatic AMR parsing substantially limit the effectiveness of this approach, but a manual editing study indicates that as parsing improves, parsing-based evaluation has the potential to outperform most reference-based metrics.

pdf bib
A Balanced and Broadly Targeted Computational Linguistics Curriculum
Emma Manning | Nathan Schneider | Amir Zeldes
Proceedings of the Fifth Workshop on Teaching NLP

This paper describes the primarily-graduate computational linguistics and NLP curriculum at Georgetown University, a U.S. university that has seen significant growth in these areas in recent years. We reflect on the principles behind our curriculum choices, including recognizing the various academic backgrounds and goals of our students ; teaching a variety of skills with an emphasis on working directly with data ; encouraging collaboration and interdisciplinary work ; and including languages beyond English. We reflect on challenges we have encountered, such as the difficulty of teaching programming skills alongside NLP fundamentals, and discuss areas for future growth.

pdf bib
Classifying Divergences in Cross-lingual AMR PairsAMR Pairs
Shira Wein | Nathan Schneider
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

Translation divergences are varied and widespread, challenging approaches that rely on parallel text. To annotate translation divergences, we propose a schema grounded in the Abstract Meaning Representation (AMR), a sentence-level semantic framework instantiated for a number of languages. By comparing parallel AMR graphs, we can identify specific points of divergence. Each divergence is labeled with both a type and a cause. We release a small corpus of annotated English-Spanish data, and analyze the annotations in our corpus.

pdf bib
Subcategorizing Adverbials in Universal Conceptual Cognitive AnnotationUniversal Conceptual Cognitive Annotation
Zhuxin Wang | Jakob Prange | Nathan Schneider
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

Universal Conceptual Cognitive Annotation (UCCA) is a semantic annotation scheme that organizes texts into coarse predicate-argument structure, offering broad coverage of semantic phenomena. At the same time, there is still need for a finer-grained treatment of many of the categories. The Adverbial category is of special interest, as it covers a wide range of fundamentally different meanings such as negation, causation, aspect, and event quantification. In this paper we introduce a refinement annotation scheme for UCCA’s Adverbial category, showing that UCCA Adverbials can indeed be subcategorized into at least 7 semantic types, and doing so can help clarify and disambiguate the otherwise coarse-grained labels. We provide a preliminary set of annotation guidelines, as well as pilot annotation experiments with high inter-annotator agreement, confirming the validity of the scheme.

pdf bib
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Alexis Palmer | Nathan Schneider | Natalie Schluter | Guy Emerson | Aurelie Herbelot | Xiaodan Zhu
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

2020

pdf bib
K-SNACS : Annotating Korean Adposition SemanticsSNACS: Annotating Korean Adposition Semantics
Jena D. Hwang | Hanwool Choe | Na-Rae Han | Nathan Schneider
Proceedings of the Second International Workshop on Designing Meaning Representations

While many languages use adpositions to encode semantic relationships between content words in a sentence (e.g., agentivity or temporality), the details of how adpositions work vary widely across languages with respect to both form and meaning. In this paper, we empirically adapt the SNACS framework (Schneider et al., 2018) to Korean, a language that is typologically distant from Englishthe language SNACS was based on. We apply the SNACS framework to annotate the highly popular novellaThe Little Prince with semantic supersense labels over allKorean postpositions. Thus, we introduce the first broad-coverage corpus annotated with Korean postposition semantics and provide a detailed analysis of the corpus with an apples-to-apples comparison between Korean and English annotations

pdf bib
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Aurelie Herbelot | Xiaodan Zhu | Alexis Palmer | Nathan Schneider | Jonathan May | Ekaterina Shutova
Proceedings of the Fourteenth Workshop on Semantic Evaluation

pdf bib
Comparison by Conversion : Reverse-Engineering UCCA from Syntax and Lexical SemanticsUCCA from Syntax and Lexical Semantics
Daniel Hershcovich | Nathan Schneider | Dotan Dvir | Jakob Prange | Miryam de Lhoneux | Omri Abend
Proceedings of the 28th International Conference on Computational Linguistics

Building robust natural language understanding systems will require a clear characterization of whether and how various linguistic meaning representations complement each other. To perform a systematic comparative analysis, we evaluate the mapping between meaning representations from different frameworks using two complementary methods : (i) a rule-based converter, and (ii) a supervised delexicalized parser that parses to one framework using only information from the other as features. We apply these methods to convert the STREUSLE corpus (with syntactic and lexical semantic annotations) to UCCA (a graph-structured full-sentence meaning representation). Both methods yield surprisingly accurate target representations, close to fully supervised UCCA parser qualityindicating that UCCA annotations are partially redundant with STREUSLE annotations. Despite this substantial convergence between frameworks, we find several important areas of divergence.

pdf bib
A Human Evaluation of AMR-to-English Generation SystemsAMR-to-English Generation Systems
Emma Manning | Shira Wein | Nathan Schneider
Proceedings of the 28th International Conference on Computational Linguistics

Most current state-of-the art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.

pdf bib
PASTRIE : A Corpus of Prepositions Annotated with Supersense Tags in Reddit International EnglishPASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English
Michael Kranzlein | Emma Manning | Siyao Peng | Shira Wein | Aryaman Arora | Nathan Schneider
Proceedings of the 14th Linguistic Annotation Workshop

We present the Prepositions Annotated with Supsersense Tags in Reddit International English (PASTRIE) corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s : English, French, German, and Spanish. The annotations are comprehensive, covering all preposition types and tokens in the sample. Along with the corpus, we provide analysis of distributional patterns across the included L1s and a discussion of the influence of L1s on L2 preposition choice.

2019

pdf bib
An Improved Approach for Semantic Graph Composition with CCGCCG
Austin Blodgett | Nathan Schneider
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

This paper builds on previous work using Combinatory Categorial Grammar (CCG) to derive a transparent syntax-semantics interface for Abstract Meaning Representation (AMR) parsing. We define new semantics for the CCG combinators that is better suited to deriving AMR graphs. In particular, we define relation-wise alternatives for the application and composition combinators : these require that the two constituents being combined overlap in one AMR relation. We also provide a new semantics for type raising, which is necessary for certain constructions. Using these mechanisms, we suggest an analysis of eventive nouns, which present a challenge for deriving AMR graphs. Our theoretical analysis will facilitate future work on robust and transparent AMR parsing using CCG.

pdf bib
Preparing SNACS for Subjects and ObjectsSNACS for Subjects and Objects
Adi Shalev | Jena D. Hwang | Nathan Schneider | Vivek Srikumar | Omri Abend | Ari Rappoport
Proceedings of the First International Workshop on Designing Meaning Representations

Research on adpositions and possessives in multiple languages has led to a small inventory of general-purpose meaning classes that disambiguate tokens. Importantly, that work has argued for a principled separation of the semantic role in a scene from the function coded by morphosyntax. Here, we ask whether this approach can be generalized beyond adpositions and possessives to cover all scene participantsincluding subjects and objectsdirectly, without reference to a frame lexicon. We present new guidelines for English and the results of an interannotator agreement study.

pdf bib
Semantically Constrained Multilayer Annotation : The Case of Coreference
Jakob Prange | Nathan Schneider | Omri Abend
Proceedings of the First International Workshop on Designing Meaning Representations

We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions. We argue that this allows coreference annotators to sidestep some of the challenges faced in other schemes, which do not enforce consistency with predicate-argument structure and vary widely in what kinds of mentions they annotate and how. The proposed approach is examined with a pilot annotation study and compared with annotations from other schemes.

pdf bib
Made for Each Other : Broad-Coverage Semantic Structures Meet Preposition Supersenses
Jakob Prange | Nathan Schneider | Omri Abend
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Universal Conceptual Cognitive Annotation (UCCA ; Abend and Rappoport, 2013) is a typologically-informed, broad-coverage semantic annotation scheme that describes coarse-grained predicate-argument structure but currently lacks semantic roles. We argue that lexicon-free annotation of the semantic roles marked by prepositions, as formulated by Schneider et al. (2018), is complementary and suitable for integration within UCCA. We show empirically for English that the schemes, though annotated independently, are compatible and can be combined in a single semantic graph. A comparison of several approaches to parsing the integrated representation lays the groundwork for future research on this task.

2018

pdf bib
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Agata Savary | Carlos Ramisch | Jena D. Hwang | Nathan Schneider | Melanie Andresen | Sameer Pradhan | Miriam R. L. Petruck
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

pdf bib
Annotation of Tense and Aspect Semantics for Sentential AMRAMR
Lucia Donatelli | Michael Regan | William Croft | Nathan Schneider
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Although English grammar encodes a number of semantic contrasts with tense and aspect marking, these semantics are currently ignored by Abstract Meaning Representation (AMR) annotations. This paper extends sentence-level AMR to include a coarse-grained treatment of tense and aspect semantics. The proposed framework augments the representation of finite predications to include a four-way temporal distinction (event time before, up to, at, or after speech time) and several aspectual distinctions (including static vs. dynamic, habitual vs. episodic, and telic vs. atelic). This will enable AMR to be used for NLP tasks and applications that require sophisticated reasoning about time and event structure.

pdf bib
Constructing an Annotated Corpus of Verbal MWEs for EnglishMWEs for English
Abigail Walsh | Claire Bonial | Kristina Geeraert | John P. McCrae | Nathan Schneider | Clarissa Somers
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the construction and annotation of a corpus of verbal MWEs for English, as part of the PARSEME Shared Task 1.1 on automatic identification of verbal MWEs. The criteria for corpus selection, the categories of MWEs used, and the training process are discussed, along with the particular issues that led to revisions in edition 1.1 of the annotation guidelines. Finally, an overview of the characteristics of the final annotated corpus is presented, as well as some discussion on inter-annotator agreement.

pdf bib
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword ExpressionsPARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

pdf bib
Parsing Tweets into Universal DependenciesUniversal Dependencies
Yijia Liu | Yi Zhu | Wanxiang Che | Bing Qin | Nathan Schneider | Noah A. Smith
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We study the problem of analyzing tweets with universal dependencies (UD). We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome the annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed.

2017

pdf bib
Proceedings of the 11th Linguistic Annotation Workshop
Nathan Schneider | Nianwen Xue
Proceedings of the 11th Linguistic Annotation Workshop

pdf bib
Double Trouble : The Problem of Construal in Semantic Annotation of Adpositions
Jena D. Hwang | Archna Bhatia | Na-Rae Han | Tim O’Gorman | Vivek Srikumar | Nathan Schneider
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000 word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition’s lexical contribution is equivalent to the role / relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition’s lexical function so they can be annotated at scalesupporting automatic, statistical processing of domain-general languageand discuss how this representation would allow for a simpler inventory of labels.

pdf bib
The NLTK FrameNet API : Designing for Discoverability with a Rich Linguistic ResourceNLTK FrameNet API: Designing for Discoverability with a Rich Linguistic Resource
Nathan Schneider | Chuck Wooters
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

A new Python API, integrated within the NLTK suite, offers access to the FrameNet 1.7 lexical database. The lexicon (structured in terms of frames) as well as annotated sentences can be processed programatically, or browsed with human-readable displays via the interactive Python prompt.