Wassim El-Hajj


2019

pdf bib
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani
Proceedings of the Fourth Arabic Natural Language Processing Workshop

pdf bib
Assessing Arabic Weblog Credibility via Deep Co-learningArabic Weblog Credibility via Deep Co-learning
Chadi Helwe | Shady Elbassuoni | Ayman Al Zaatari | Wassim El-Hajj
Proceedings of the Fourth Arabic Natural Language Processing Workshop

Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, and each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its prediction is used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines including fully-supervised deep learning models as well as ensemble models.

2017

pdf bib
Proceedings of the Third Arabic Natural Language Processing Workshop
Nizar Habash | Mona Diab | Kareem Darwish | Wassim El-Hajj | Hend Al-Khalifa | Houda Bouamor | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani
Proceedings of the Third Arabic Natural Language Processing Workshop

pdf bib
CAT : Credibility Analysis of Arabic Content on TwitterCAT: Credibility Analysis of Arabic Content on Twitter
Rim El Ballouli | Wassim El-Hajj | Ahmad Ghandour | Shady Elbassuoni | Hazem Hajj | Khaled Shaban
Proceedings of the Third Arabic Natural Language Processing Workshop

Data generated on Twitter has become a rich source for various data mining tasks. Those data analysis tasks that are dependent on the tweet semantics, such as sentiment analysis, emotion mining, and rumor detection among others, suffer considerably if the tweet is not credible, not real, or spam. In this paper, we perform an extensive analysis on credibility of Arabic content on Twitter. We also build a classification model (CAT) to automatically predict the credibility of a given Arabic tweet. Of particular originality is the inclusion of features extracted directly or indirectly from the author’s profile and timeline. To train and test CAT, we annotated for credibility a data set of 9,000 Arabic tweets that are topic independent. CAT achieved consistent improvements in predicting the credibility of the tweets when compared to several baselines and when compared to the state-of-the-art approach with an improvement of 21 % in weighted average F-measure. We also conducted experiments to highlight the importance of the user-based features as opposed to the content-based features. We conclude our work with a feature reduction experiment that highlights the best indicative features of credibility.

pdf bib
A Characterization Study of Arabic Twitter Data with a Benchmarking for State-of-the-Art Opinion Mining ModelsArabic Twitter Data with a Benchmarking for State-of-the-Art Opinion Mining Models
Ramy Baly | Gilbert Badaro | Georges El-Khoury | Rawan Moukalled | Rita Aoun | Hazem Hajj | Wassim El-Hajj | Nizar Habash | Khaled Shaban
Proceedings of the Third Arabic Natural Language Processing Workshop

Opinion mining in Arabic is a challenging task given the rich morphology of the language. The task becomes more challenging when it is applied to Twitter data, which contains additional sources of noise, such as the use of unstandardized dialectal variations, the nonconformation to grammatical rules, the use of Arabizi and code-switching, and the use of non-text objects such as images and URLs to express opinion. In this paper, we perform an analytical study to observe how such linguistic phenomena vary across different Arab regions. This study of Arabic Twitter characterization aims at providing better understanding of Arabic Tweets, and fostering advanced research on the topic. Furthermore, we explore the performance of the two schools of machine learning on Arabic Twitter, namely the feature engineering approach and the deep learning approach. We consider models that have achieved state-of-the-art performance for opinion mining in English. Results highlight the advantages of using deep learning-based models, and confirm the importance of using morphological abstractions to address Arabic’s complex morphology.

pdf bib
OMAM at SemEval-2017 Task 4 : Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based ModelOMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model
Ramy Baly | Gilbert Badaro | Ali Hamdi | Rawan Moukalled | Rita Aoun | Georges El-Khoury | Ahmad Al Sallab | Hazem Hajj | Nizar Habash | Khaled Shaban | Wassim El-Hajj
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language. It becomes more challenging when applied to Twitter data that comes with additional sources of noise including dialects, misspellings, grammatical mistakes, code switching and the use of non-textual objects to express sentiments. This paper describes the OMAM systems that we developed as part of SemEval-2017 task 4. We evaluate English state-of-the-art methods on Arabic tweets for subtask A. As for the remaining subtasks, we introduce a topic-based approach that accounts for topic specificities by predicting topics or domains of upcoming tweets, and then using this information to predict their sentiment. Results indicate that applying the English state-of-the-art method to Arabic has achieved solid results without significant enhancements. Furthermore, the topic-based method ranked 1st in subtasks C and E, and 2nd in subtask D.