Proceedings of the Third Workshop on Computational Typology and Multilingual NLP

Ekaterina Vylomova, Elizabeth Salesky, Sabrina Mielke, Gabriella Lapesa, Ritesh Kumar, Harald Hammarström, Ivan Vulić, Anna Korhonen, Roi Reichart, Edoardo Maria Ponti, Ryan Cotterell (Editors)


Anthology ID:
2021.sigtyp-1
Month:
June
Year:
2021
Address:
Online
Venues:
NAACL | SIGTYP
SIG:
SIGTYP
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.sigtyp-1
PDF:
https://aclanthology.org/2021.sigtyp-1.pdf

OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
Xavier Marjou

To transcribe spoken language into a written medium, most alphabets provide an unambiguous sound-to-letter rule. However, some writing systems have drifted away from this simple principle, and little work in Natural Language Processing (NLP) has measured that distance. In this study, we use an Artificial Neural Network (ANN) model to evaluate the transparency between written words and their pronunciation, hence the name Orthographic Transparency Estimation with an ANN (OTEANN). Using datasets derived from Wikimedia dictionaries, we trained and tested this model to score the percentage of false predictions in phoneme-to-grapheme and grapheme-to-phoneme translation tasks. The scores obtained on 17 orthographies were in line with the estimates of other studies. Interestingly, the model also provided insight into typical mistakes made by learners who consider only the phonemic rule in reading and writing.
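
The score described in this abstract, the percentage of false predictions, can be illustrated with a minimal Python sketch. It assumes that a prediction counts as false whenever the predicted string differs from the reference for the whole word; the abstract does not state the exact matching criterion, and the function name `error_rate` and the toy samples are hypothetical, not taken from the paper.

```python
def error_rate(predictions, references):
    """Return the percentage of predictions that differ from their reference.

    Assumption: whole-word exact match, which may not be the paper's criterion.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    wrong = sum(p != r for p, r in zip(predictions, references))
    return 100.0 * wrong / len(references)


# Toy grapheme-to-phoneme samples (illustrative only, not the paper's data).
references = ["kæt", "dɒɡ", "fɪʃ", "bɜːd"]
predictions = ["kæt", "dɒɡ", "fɪs", "bɜːd"]

score = error_rate(predictions, references)
print(f"{score:.1f}% false predictions")
```

Under this reading, a lower score in either direction (phoneme-to-grapheme or grapheme-to-phoneme) would indicate a more transparent orthography.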

Improving Cross-Lingual Sentiment Analysis via Conditional Language Adversarial Nets
Hemanth Kandula | Bonan Min

Sentiment analysis has come a long way for high-resource languages due to the availability of large annotated corpora. However, it still suffers from a lack of training data for low-resource languages. To tackle this problem, we propose the Conditional Language Adversarial Network (CLAN), an end-to-end neural architecture for cross-lingual sentiment analysis without cross-lingual supervision. CLAN differs from prior work in that it allows the adversarial training to be conditioned on both the learned features and the sentiment prediction, which makes the learned representations more discriminative in the cross-lingual setting. Experimental results demonstrate that CLAN outperforms previous methods on the multilingual multi-domain Amazon review dataset. Our source code is released at https://github.com/hemanthkandula/clan.
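
The abstract does not say how the adversarial training is conditioned on both features and predictions. One common way to build such a joint input is the flattened outer product of the feature vector and the class probabilities, which is then passed to the language discriminator. The sketch below shows only that combination step, as an assumption and not as CLAN's confirmed implementation; the function name `condition` and all values are hypothetical.

```python
def condition(features, class_probs):
    """Flattened outer product of a feature vector and class probabilities.

    Assumption: this multilinear form is one possible conditioning scheme;
    the abstract does not confirm that CLAN uses it.
    """
    return [f * p for p in class_probs for f in features]


features = [0.5, -1.0, 2.0]   # hypothetical learned features for one review
class_probs = [0.25, 0.75]    # hypothetical sentiment prediction (negative, positive)

joint = condition(features, class_probs)
print(len(joint))  # one entry per (class, feature) pair
```

With this input, a discriminator sees each feature weighted by the predicted sentiment, so it can no longer align languages without regard to the sentiment classes.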

Anlirika: An LSTM-CNN Flow Twister for Spoken Language Identification
Andreas Scherbakov | Liam Whittle | Ritesh Kumar | Siddharth Singh | Matthew Coleman | Ekaterina Vylomova

The paper presents Anlirika’s submission to the SIGTYP 2021 Shared Task on Robust Spoken Language Identification. The task aims at building a robust system that generalizes well across different domains and speakers. The training data is limited to a single domain, with predominantly a single speaker per language, while the validation and test samples are derived from diverse datasets and multiple speakers. We experiment with a neural system that combines dense, convolutional, and recurrent layers designed to generalize better and to obtain speaker-invariant representations. We demonstrate that the task in its constrained form (without using external data or augmenting the training set with samples from the validation set) is still challenging. Our best system, trained on data augmented with validation samples, achieves 29.9% accuracy on the test data.