Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

Garrett Nicolai, Ryan Cotterell (Editors)


Anthology ID:
W19-42
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WS
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W19-42
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/W19-42.pdf

pdf bib
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Ryan Cotterell

pdf bib
AX Semantics’ Submission to the SIGMORPHON 2019 Shared TaskAX Semantics’ Submission to the SIGMORPHON 2019 Shared Task
Andreas Madsack | Robert Weißgraeber

This paper describes the AX Semantics’ submission to the SIGMORPHON 2019 shared task on morphological reinflection. We implemented two systems, both tackling the task for all languages in one codebase, without any underlying language specific features. The first one is an encoder-decoder model using AllenNLP ; the second system uses the same model modified by a custom trainer that trains only with the target language resources after a specific threshold. We especially focused on building an implementation using AllenNLP with out-of-the-box methods to facilitate easy operation and reuse.

pdf bib
ITIST at the SIGMORPHON 2019 Shared Task : Sparse Two-headed Models for InflectionITIST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection
Ben Peters | André F. T. Martins

This paper presents the Instituto de TelecomunicaesInstituto Superior Tcnico submission to Task 1 of the SIGMORPHON 2019 Shared Task. Our models combine sparse sequence-to-sequence models with a two-headed attention mechanism that learns separate attention distributions for the lemma and inflectional tags. Among submissions to Task 1, our models rank second and third. Despite the low data setting of the task (only 100 in-language training examples), they learn plausible inflection patterns and often concentrate all probability mass into a small set of hypotheses, making beam search exact.

pdf bib
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in MorphologyCMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Aditi Chaudhary | Elizabeth Salesky | Gayatri Bhat | David R. Mortensen | Jaime Carbonell | Yulia Tsvetkov

This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context. This task requires us to produce the lemma and morpho-syntactic description of each token in a sequence, for 107 treebanks. We approach this task with a hierarchical neural conditional random field (CRF) model which predicts each coarse-grained feature (eg. POS, Case, etc.) independently. However, most treebanks are under-resourced, thus making it challenging to train deep neural models for them. Hence, we propose a multi-lingual transfer training regime where we transfer from multiple related languages that share similar typology.

pdf bib
THOMAS : The Hegemonic OSU Morphological Analyzer using Seq2seqTHOMAS: The Hegemonic OSU Morphological Analyzer using Seq2seq
Byung-Doh Oh | Pranav Maneriker | Nanjiang Jiang

This paper describes the OSU submission to the SIGMORPHON 2019 shared task, Crosslinguality and Context in Morphology. Our system addresses the contextual morphological analysis subtask of Task 2, which is to produce the morphosyntactic description (MSD) of each fully inflected word within a given sentence. We frame this as a sequence generation task and employ a neural encoder-decoder (seq2seq) architecture to generate the sequence of MSD tags given the encoded representation of each token. Follow-up analyses reveal that our system most significantly improves performance on morphologically complex languages whose inflected word forms typically have longer MSD tag sequences. In addition, our system seems to capture the structured correlation between MSD tags, such as that between the verb tag and TAM-related tags.contextual morphological analysis subtask of Task 2, which is to produce the morphosyntactic description (MSD) of each fully inflected word within a given sentence. We frame this as a sequence generation task and employ a neural encoder-decoder (seq2seq) architecture to generate the sequence of MSD tags given the encoded representation of each token. Follow-up analyses reveal that our system most significantly improves performance on morphologically complex languages whose inflected word forms typically have longer MSD tag sequences. In addition, our system seems to capture the structured correlation between MSD tags, such as that between the “verb” tag and TAM-related tags.

pdf bib
Sigmorphon 2019 Task 2 system description paper : Morphological analysis in context for many languages, with supervision from only a few
Brad Aiken | Jared Kelly | Alexis Palmer | Suleyman Olcay Polat | Taraka Rama | Rodney Nielsen

This paper presents the UNT HiLT+Ling system for the Sigmorphon 2019 shared Task 2 : Morphological Analysis and Lemmatization in Context. Our core approach focuses on the morphological tagging task ; part-of-speech tagging and lemmatization are treated as secondary tasks. Given the highly multilingual nature of the task, we propose an approach which makes minimal use of the supplied training data, in order to be extensible to languages without labeled training data for the morphological inflection task. Specifically, we use a parallel Bible corpus to align contextual embeddings at the verse level. The aligned verses are used to build cross-language translation matrices, which in turn are used to map between embedding spaces for the various languages. Finally, we use sets of inflected forms, primarily from a high-resource language, to induce vector representations for individual UniMorph tags. Morphological analysis is performed by matching vector representations to embeddings for individual tokens. While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data.

pdf bib
UDPipe at SIGMORPHON 2019 : Contextualized Embeddings, Regularization with Morphological Categories, Corpora MergingUDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
Milan Straka | Jana Straková | Jan Hajic

We present our contribution to the SIGMORPHON 2019 Shared Task : Crosslinguality and Context in Morphology, Task 2 : contextual morphological analysis and lemmatization. We submitted a modification of the UDPipe 2.0, one of best-performing systems of the CoNLL 2018 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies and an overall winner of the The 2018 Shared Task on Extrinsic Parser Evaluation. As our first improvement, we use the pretrained contextualized embeddings (BERT) as additional inputs to the network ; secondly, we use individual morphological features as regularization ; and finally, we merge the selected corpora of the same language. In the lemmatization task, our system exceeds all the submitted systems by a wide margin with lemmatization accuracy 95.78 (second best was 95.00, third 94.46). In the morphological analysis, our system placed tightly second : our morphological analysis accuracy was 93.19, the winning system’s 93.23.

pdf bib
A Little Linguistics Goes a Long Way : Unsupervised Segmentation with Limited Language Specific Guidance
Alexander Erdmann | Salam Khalifa | Mai Oudah | Nizar Habash | Houda Bouamor

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input. Our technique involves creating a small grammar of closed-class affixes which can be written in a few hours. The grammar over generates analyses for word forms attested in a raw corpus which are disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morpho-syntactic or lexical variation is simple, making it an ideal solution for challenging corpora with noisy, dialect-inconsistent, or otherwise non-standard content. In two evaluations, we consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art supervised models trained on large amounts of data, providing evidence for the value of linguistic input during preprocessing.

pdf bib
Equiprobable mappings in weighted constraint grammars
Arto Anttila | Scott Borgeson | Giorgio Magri

We show that MaxEnt is so rich that it can distinguish between any two different mappings : there always exists a nonnegative weight vector which assigns them different MaxEnt probabilities. Stochastic HG instead does admit equiprobable mappings and we give a complete formal characterization of them.

pdf bib
Unbounded Stress in Subregular Phonology
Yiding Hao | Samuel Andersson

This paper situates culminative unbounded stress systems within the subregular hierarchy for functions. While Baek (2018) has argued that such systems can be uniformly understood as input tier-based strictly local constraints, we show here that default-to-opposite-side and default-to-same-side stress systems belong to distinct subregular classes when they are viewed as functions that assign primary stress to underlying forms. While the former system can be captured by input tier-based input strictly local functions, a subsequential function class that we define here, the latter system is not subsequential, though it is weakly deterministic according to McCollum et al.’s (2018) non-interaction criterion. Our results motivate the extension of recently proposed subregular language classes to subregular functions and argue in favor of McCollum et al’s definition of weak determinism over that of Heinz and Lai (2013).

pdf bib
Convolutional neural networks for low-resource morpheme segmentation : baseline or state-of-the-art?
Alexey Sorokin

We apply convolutional neural networks to the task of shallow morpheme segmentation using low-resource datasets for 5 different languages. We show that both in fully supervised and semi-supervised settings our model beats previous state-of-the-art approaches. We argue that convolutional neural networks reflect local nature of morpheme segmentation better than other semi-supervised approaches.

pdf bib
Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages
Ramy Eskander | Judith Klavans | Smaranda Muresan

Polysynthetic languages pose a challenge for morphological analysis due to the root-morpheme complexity and to the word class squish. In addition, many of these polysynthetic languages are low-resource. We propose unsupervised approaches for morphological segmentation of low-resource polysynthetic languages based on Adaptor Grammars (AG) (Eskander et al., 2016). We experiment with four languages from the Uto-Aztecan family. Our AG-based approaches outperform other unsupervised approaches and show promise when compared to supervised methods, outperforming them on two of the four languages.

pdf bib
The SIGMORPHON 2019 Shared Task : Morphological Analysis in Context and Cross-Lingual Transfer for InflectionSIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
Arya D. McCarthy | Ekaterina Vylomova | Shijie Wu | Chaitanya Malaviya | Lawrence Wolf-Sonkin | Garrett Nicolai | Christo Kirov | Miikka Silfverberg | Sabrina J. Mielke | Jeffrey Heinz | Ryan Cotterell | Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years’ inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year’s strong baselines or highly ranked systems from previous years’ shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines.