Proceedings of the 11th Global Wordnet Conference

Piek Vossen, Christiane Fellbaum (Editors)

University of South Africa (UNISA)
Global Wordnet Association

On Universal Colexifications
Hongchang Bao | Bradley Hauer | Grzegorz Kondrak

Colexification occurs when two distinct concepts are lexified by the same word. The term covers both polysemy and homonymy. We posit and investigate the hypothesis that no pair of concepts is colexified in every language. We test our hypothesis by analyzing colexification data from BabelNet, Open Multilingual WordNet, and CLICS. The results show that our hypothesis is supported by over 99.9% of colexified concept pairs in these three lexical resources.
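
To make the definition above concrete, here is a minimal sketch of how colexified concept pairs could be collected from a lexicon and checked against the universality hypothesis. The toy lexicon, the concept labels, and the function names are illustrative assumptions and are not drawn from BabelNet, Open Multilingual WordNet, or CLICS. The check is also simplified: it requires a pair to be colexified in every language of the toy lexicon.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon (an assumption for illustration only):
# language -> word -> set of concepts that the word lexifies.
TOY_LEXICON = {
    "english": {"tree": {"TREE"}, "wood": {"WOOD", "FOREST"}, "forest": {"FOREST"}},
    "spanish": {"árbol": {"TREE"}, "madera": {"WOOD"}, "bosque": {"FOREST"}},
    "russian": {"derevo": {"TREE", "WOOD"}, "les": {"FOREST"}},
}


def colexified_pairs(lexicon):
    """Map each concept pair to the set of languages that colexify it."""
    pairs = defaultdict(set)
    for language, words in lexicon.items():
        for concepts in words.values():
            # A word with two or more concepts colexifies every pair of them.
            for pair in combinations(sorted(concepts), 2):
                pairs[pair].add(language)
    return dict(pairs)


def universal_colexifications(lexicon):
    """Return the concept pairs that are colexified in every language."""
    all_languages = set(lexicon)
    return [
        pair
        for pair, languages in colexified_pairs(lexicon).items()
        if languages == all_languages
    ]


if __name__ == "__main__":
    for pair, languages in sorted(colexified_pairs(TOY_LEXICON).items()):
        print(pair, sorted(languages))
    print("universal:", universal_colexifications(TOY_LEXICON))
```

In this toy data each colexified pair is found in only one language, so no pair is universal, which is the kind of outcome the hypothesis predicts.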

Practical Approach on Implementation of WordNets for South African Languages
Tshephisho Joseph Sefara | Tumisho Billson Mokgonyane | Vukosi Marivate

This paper proposes the implementation of WordNets for five South African languages, namely Sepedi, Setswana, Tshivenda, isiZulu and isiXhosa, to be added to the Open Multilingual WordNet (OMW) in the Natural Language Toolkit (NLTK). The African WordNets are converted from Princeton WordNet (PWN) 2.0 to 3.0 to match the synsets in PWN 3.0. After conversion, there were 7157, 11972, 1288, 6380, and 9460 lemmas for Sepedi, Setswana, Tshivenda, isiZulu and isiXhosa, respectively. Setswana, isiXhosa and Sepedi each contain more lemmas than 8 languages in OMW, and isiZulu contains more lemmas than 7 languages in OMW. A library has been published for continuous development of African WordNets in OMW using NLTK.

Ask2Transformers: Zero-Shot Domain labelling with Pretrained Language Models
Oscar Sainz | German Rigau

In this paper we present a system that exploits different pre-trained Language Models for assigning domain labels to WordNet synsets without any kind of supervision. Furthermore, the system is not restricted to use a particular set of domain labels. We exploit the knowledge encoded within different off-the-shelf pre-trained Language Models and task formulations to infer the domain label of a particular WordNet definition. The proposed zero-shot system achieves a new state-of-the-art on the English dataset used in the evaluation.

Monolingual Word Sense Alignment as a Classification Problem
Sina Ahmadi | John P. McCrae

Words are defined based on their meanings in various ways in different resources. Aligning word senses across monolingual lexicographic resources increases domain coverage and enables integration and incorporation of data. In this paper, we explore the application of classification methods using manually-extracted features along with representation learning techniques in the task of word sense alignment and semantic relationship detection. We demonstrate that the performance of classification methods dramatically varies based on the type of semantic relationships due to the nature of the task but outperforms the previous experiments.

The GlobalWordNet Formats: Updates for 2020
John P. McCrae | Michael Wayne Goodman | Francis Bond | Alexandre Rademaker | Ewa Rudnicka | Luis Morgado Da Costa

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the format were identified, and in this paper we describe the extensions to the formats that address these issues. These include: ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.

Semantic Analysis of Verb-Noun Derivation in Princeton WordNet
Verginica Mititelu | Svetlozara Leseva | Ivelina Stoyanova

We present here the results of a morphosemantic analysis of the verb-noun pairs in the Princeton WordNet as reflected in the standoff file containing pairs annotated with a set of 14 semantic relations. We have automatically distinguished between zero-derivation and affixal derivation in the data, identified the affixes, and manually checked the results. The data show that for each semantic relation an affix prevails in creating new words, although we cannot talk about their specificity with respect to such a relation. Moreover, certain pairs of verb-noun semantic primes are better represented for each semantic relation, and some semantic clusters (in the form of WordNet subtrees) take shape as a result. We thus employ a large-scale data-driven linguistically motivated analysis afforded by the rich derivational and morphosemantic description in WordNet to the end of capturing finer regularities in the process of derivation as represented in the semantic properties of the words involved and as reflected in the structure of the lexicon.

OdeNet: Compiling a GermanWordNet from other Resources
Melanie Siegel | Francis Bond

The Princeton WordNet for the English language has been used worldwide in NLP projects for many years. With the OMW initiative, wordnets for different languages of the world are being linked via identifiers. The parallel development and linking allows new multilingual application perspectives. The development of a wordnet for the German language is also in this context. To save development time, existing resources were combined and recompiled. The result was then evaluated and improved. In a relatively short time a resource was created that can be used in projects and continuously improved and extended.

Text Document Clustering: Wordnet vs. TF-IDF vs. Word Embeddings
Michał Marcińczuk | Mateusz Gniewkowski | Tomasz Walkowiak | Marcin Będkowski

In this paper, we deal with the problem of unsupervised text document clustering for the Polish language. Our goal is to compare the modern approaches based on language modeling (doc2vec and BERT) with the classical ones, i.e., TF-IDF and wordnet-based. The experiments are conducted on three datasets containing qualification descriptions. The results of the experiments showed that wordnet-based similarity measures could compete with and even outperform modern embedding-based approaches.
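
As a rough illustration of the classical TF-IDF baseline mentioned above, the following sketch builds sparse TF-IDF vectors and compares documents by cosine similarity, the kind of pairwise score a clustering algorithm can consume. The weighting variant (relative term frequency times log inverse document frequency), the toy English documents, and the function names are assumptions for illustration; the abstract does not specify the exact formulation or data used in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents standing in for qualification descriptions.
DOCS = [
    "welding steel pipes".split(),
    "welding steel frames".split(),
    "baking bread rolls".split(),
]

if __name__ == "__main__":
    vecs = tfidf_vectors(DOCS)
    print("doc0 vs doc1:", cosine(vecs[0], vecs[1]))
    print("doc0 vs doc2:", cosine(vecs[0], vecs[2]))
```

Documents that share no terms score zero under this measure, which is one limitation that wordnet-based and embedding-based similarity measures aim to address.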

Neural Language Models vs Wordnet-based Semantically Enriched Representation in CST Relation Recognition
Arkadiusz Janz | Maciej Piasecki | Piotr Wątorski

Neural language models, including transformer-based models, that are pre-trained on very large corpora have become a common way to represent text in various tasks, including recognition of textual semantic relations, e.g. Cross-document Structure Theory (CST). Pre-trained models are usually fine-tuned to downstream tasks and the obtained vectors are used as an input for deep neural classifiers. No linguistic knowledge obtained from resources and tools is utilised. In this paper we compare such universal approaches with a combination of a rich graph-based, linguistically motivated sentence representation and a typical neural network classifier applied to the task of recognising CST relations in Polish. The representation describes selected levels of the sentence structure, including a description of lexical meanings on the basis of the wordnet (plWordNet) synsets and connected SUMO concepts. The obtained results show that in the case of difficult relations and a medium-size training corpus, semantically enriched text representation leads to significantly better results.

What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus
Cecilia Domingo | Tatiana Gonzalez-Ferrero | Itziar Gonzalez-Dios

Natural Language Processing tools and resources have been so far mainly created and trained for standard varieties of language. Nowadays, with the use of large amounts of data gathered from social media, other varieties and registers need to be processed, which may present other challenges and difficulties. In this work, we focus on English and we present a preliminary analysis by comparing the TwitterAAE corpus, which is annotated for ethnicity, and WordNet by quantifying and explaining the online language that WordNet misses.

Toward the creation of WordNets for ancient Indo-European languages
Erica Biagetti | Chiara Zanchi | William Michael Short

This paper presents the work in progress toward the creation of a family of WordNets for Sanskrit, Ancient Greek, and Latin. Building on previous attempts in the field, we elaborate these efforts bridging together WordNet relational semantics with theories of meaning from Cognitive Linguistics. We discuss some of the innovations we have introduced to the WordNet architecture, to better capture the polysemy of words, as well as Indo-European language family-specific features. We conclude the paper framing our work within the larger picture of resources available for ancient languages and showing that WordNet-backed search tools have the potential to re-define the kinds of questions that can be asked of ancient language corpora.

Teaching Through Tagging Interactive Lexical Semantics
Francis Bond | Andrew Devadason | Melissa Rui Lin Teo | Luís Morgado da Costa

In this paper we discuss an ongoing effort to enrich students’ learning by involving them in sense tagging. The main goal is to lead students to discover how we can represent meaning and where the limits of our current theories lie. A subsidiary goal is to create sense tagged corpora and an accompanying linked lexicon (in our case wordnets). We present the results of tagging several texts and suggest some ways in which the tagging process could be improved. Two authors of this paper present their own experience as students. Overall, students reported that they found the tagging an enriching experience. The annotated corpora and changes to the wordnet are made available through the NTU multilingual corpus and associated wordnets (NTU-MC).