Mohammed Attia


2019

pdf bib
POS Tagging for Improving Code-Switching Identification in ArabicPOS Tagging for Improving Code-Switching Identification in Arabic
Mohammed Attia | Younes Samih | Ali Elkahky | Hamdy Mubarak | Ahmed Abdelali | Kareem Darwish
Proceedings of the Fourth Arabic Natural Language Processing Workshop

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language. This paper explores the possibility of utilizing this pattern in improving code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the POS signal in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9 % on the development set and 1.0 % on the test set. We also show that in intra-sentential code-switching, the selection of lexical items is constrained by POS categories, where function words tend to come more often from the dialectal language while the majority of content words come from the standard language.

pdf bib
QC-GO Submission for MADAR Shared Task : Arabic Fine-Grained Dialect IdentificationQC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification
Younes Samih | Hamdy Mubarak | Ahmed Abdelali | Mohammed Attia | Mohamed Eldesouki | Kareem Darwish
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification). In our participation in both subtasks, we explored a number of approaches and system combinations to obtain the best performance for both tasks. These include deep neural nets and heuristics. Since individual approaches suffer from various shortcomings, the combination of different approaches was able to fill some of these gaps. Our system achieves F1-Scores of 66.1 % and 67.0 % on the development sets for Subtasks 1 and 2 respectively.

2018

pdf bib
GHHT at CALCS 2018 : Named Entity Recognition for Dialectal Arabic Using Neural NetworksGHHT at CALCS 2018: Named Entity Recognition for Dialectal Arabic Using Neural Networks
Mohammed Attia | Younes Samih | Wolfgang Maier
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

This paper describes our system submission to the CALCS 2018 shared task on named entity recognition on code-switched data for the language variant pair of Modern Standard Arabic and Egyptian dialectal Arabic. We build a a Deep Neural Network that combines word and character-based representations in convolutional and recurrent networks with a CRF layer. The model is augmented with stacked layers of enriched information such pre-trained embeddings, Brown clusters and named entity gazetteers. Our system is ranked second among those participating in the shared task achieving an FB1 average of 70.09 %.

2017

pdf bib
Learning from Relatives : Unified Dialectal Arabic SegmentationArabic Segmentation
Younes Samih | Mohamed Eldesouki | Mohammed Attia | Kareem Darwish | Ahmed Abdelali | Hamdy Mubarak | Laura Kallmeyer
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Arabic dialects do not just share a common koin, but there are shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness is contingent with geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.

pdf bib
CoNLL 2017 Shared Task : Multilingual Parsing from Raw Text to Universal DependenciesCoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

pdf bib
A Neural Architecture for Dialectal Arabic SegmentationArabic Segmentation
Younes Samih | Mohammed Attia | Mohamed Eldesouki | Ahmed Abdelali | Hamdy Mubarak | Laura Kallmeyer | Kareem Darwish
Proceedings of the Third Arabic Natural Language Processing Workshop

The automated processing of Arabic Dialects is challenging due to the lack of spelling standards and to the scarcity of annotated data and resources in general. Segmentation of words into its constituent parts is an important processing building block. In this paper, we show how a segmenter can be trained using only 350 annotated tweets using neural networks without any normalization or use of lexical features or lexical resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that rely on additional resources.