Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Alexandra Balahur, Saif M. Mohammad, Veronique Hoste, Roman Klinger (Editors)


Anthology ID:
W18-62
Month:
October
Year:
2018
Address:
Brussels, Belgium
Venues:
EMNLP | WASSA | WS
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W18-62
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/W18-62.pdf

pdf bib
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Saif M. Mohammad | Veronique Hoste | Roman Klinger

pdf bib
Identifying Affective Events and the Reasons for their Polarity
Ellen Riloff

Many events have a positive or negative impact on our lives (e.g., I bought a house is typically good news, but My house burned down is bad news). Recognizing events that have affective polarity is essential for narrative text understanding, conversational dialogue, and applications such as summarization and sarcasm detection. We will discuss our recent work on identifying affective events and categorizing them based on the underlying reasons for their affective polarity. First, we will describe a weakly supervised learning method to induce a large set of affective events from a text corpus by optimizing for semantic consistency. Second, we will present models to classify affective events based on Human Need Categories, which often explain people’s motivations and desires. Our best results use a co-training model that consists of event expression and event context classifiers and exploits both labeled and unlabeled texts. We will conclude with a discussion of interesting directions for future work in this area.

pdf bib
Deep contextualized word representations for detecting sarcasm and irony
Suzana Ilić | Edison Marrese-Taylor | Jorge Balazs | Yutaka Matsuo

Predicting context-dependent and non-literal utterances like sarcastic and ironic expressions still remains a challenging task in NLP, as it goes beyond linguistic patterns, encompassing common sense and shared knowledge as crucial components. To capture complex morpho-syntactic features that can usually serve as indicators for irony or sarcasm across dynamic contexts, we propose a model that uses character-level vector representations of words, based on ELMo. We test our model on 7 different datasets derived from 3 different data sources, providing state-of-the-art performance in 6 of them, and otherwise offering competitive results.

pdf bib
Implicit Subjective and Sentimental Usages in Multi-sense Word Embeddings
Yuqi Sun | Haoyue Shi | Junfeng Hu

In multi-sense word embeddings, contextual variations in corpus may cause a univocal word to be embedded into different sense vectors. Shi et al. (2016) show that this kind of pseudo multi-senses can be eliminated by linear transformations. In this paper, we show that pseudo multi-senses may come from a uniform and meaningful phenomenon such as subjective and sentimental usage, though they are seemingly redundant. In this paper, we present an unsupervised algorithm to find a linear transformation which can minimize the transformed distance of a group of sense pairs. The major shrinking direction of this transformation is found to be related with subjective shift. Therefore, we can not only eliminate pseudo multi-senses in multisense embeddings, but also identify these subjective senses and tag the subjective and sentimental usage of words in the corpus automatically.pseudo multi-senses can be eliminated by linear transformations. In this paper, we show that pseudo multi-senses may come from a uniform and meaningful phenomenon such as subjective and sentimental usage, though they are seemingly redundant. In this paper, we present an unsupervised algorithm to find a linear transformation which can minimize the transformed distance of a group of sense pairs. The major shrinking direction of this transformation is found to be related with subjective shift. Therefore, we can not only eliminate pseudo multi-senses in multisense embeddings, but also identify these subjective senses and tag the subjective and sentimental usage of words in the corpus automatically.

pdf bib
Amobee at IEST 2018 : Transfer Learning from Language ModelsAmobee at IEST 2018: Transfer Learning from Language Models
Alon Rozental | Daniel Fleischer | Zohar Kelrich

This paper describes the system developed at Amobee for the WASSA 2018 implicit emotions shared task (IEST). The goal of this task was to predict the emotion expressed by missing words in tweets without an explicit mention of those words. We developed an ensemble system consisting of language models together with LSTM-based networks containing a CNN attention mechanism. Our approach represents a novel use of language modelsspecifically trained on a large Twitter datasetto predict and classify emotions. Our system reached 1st place with a macro F1 score of 0.7145.

pdf bib
IIIDYT at IEST 2018 : Implicit Emotion Classification With Deep Contextualized Word RepresentationsIIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations
Jorge Balazs | Edison Marrese-Taylor | Yutaka Matsuo

In this paper we describe our system designed for the WASSA 2018 Implicit Emotion Shared Task (IEST), which obtained 2nd place out of 30 teams with a test macro F1 score of 0.710. The system is composed of a single pre-trained ELMo layer for encoding words, a Bidirectional Long-Short Memory Network BiLSTM for enriching word representations with context, a max-pooling operation for creating sentence representations from them, and a Dense Layer for projecting the sentence representations into label space. Our official submission was obtained by ensembling 6 of these models initialized with different random seeds. The code for replicating this paper is available at.https://github.com/jabalazs/implicit_emotion.

pdf bib
Not Just Depressed : Bipolar Disorder Prediction on RedditReddit
Ivan Sekulic | Matej Gjurković | Jan Šnajder

Bipolar disorder, an illness characterized by manic and depressive episodes, affects more than 60 million people worldwide. We present a preliminary study on bipolar disorder prediction from user-generated text on Reddit, which relies on users’ self-reported labels. Our benchmark classifiers for bipolar disorder prediction outperform the baselines and reach accuracy and F1-scores of above 86 %. Feature analysis shows interesting differences in language use between users with bipolar disorders and the control group, including differences in the use of emotion-expressive words.

pdf bib
Topic-Specific Sentiment Analysis Can Help Identify Political Ideology
Sumit Bhatia | Deepak P

Ideological leanings of an individual can often be gauged by the sentiment one expresses about different issues. We propose a simple framework that represents a political ideology as a distribution of sentiment polarities towards a set of topics. This representation can then be used to detect ideological leanings of documents (speeches, news articles, etc.) based on the sentiments expressed towards different topics. Experiments performed using a widely used dataset show the promise of our proposed approach that achieves comparable performance to other methods despite being much simpler and more interpretable.

pdf bib
Saying no but meaning yes : negation and sentiment analysis in BasqueBasque
Jon Alkorta | Koldo Gojenola | Mikel Iruskieta

In this work, we have analyzed the effects of negation on the semantic orientation in Basque. The analysis shows that negation markers can strengthen, weaken or have no effect on sentiment orientation of a word or a group of words. Using the Constraint Grammar formalism, we have designed and evaluated a set of linguistic rules to formalize these three phenomena. The results show that two phenomena, strengthening and no change, have been identified accurately and the third one, weakening, with acceptable results.

pdf bib
Leveraging Writing Systems Change for Deep Learning Based Chinese Emotion AnalysisChinese Emotion Analysis
Rong Xiang | Yunfei Long | Qin Lu | Dan Xiong | I-Hsuan Chen

Social media text written in Chinese communities contains mixed scripts including major text written in Chinese, an ideograph-based writing system, and some minor text using Latin letters, an alphabet-based writing system. This phenomenon is called writing systems changes (WSCs). Past studies have shown that WSCs can be used to express emotions, particularly where the social and political environment is more conservative. However, because WSCs can break the syntax of the major text, it poses more challenges in Natural Language Processing (NLP) tasks like emotion classification. In this work, we present a novel deep learning based method to include WSCs as an effective feature for emotion analysis. The method first identifies all WSCs points. Then representation of the major text is learned through an LSTM model whereas the minor text is learned by a separate CNN model. Emotions in the minor text are further highlighted through an attention mechanism before emotion classification. Performance evaluation shows that incorporating WSCs features using deep learning models can improve performance measured by F1-scores compared to the state-of-the-art model.

pdf bib
Ternary Twitter Sentiment Classification with Distant Supervision and Sentiment-Specific Word EmbeddingsTwitter Sentiment Classification with Distant Supervision and Sentiment-Specific Word Embeddings
Mats Byrkjeland | Frederik Gørvell de Lichtenberg | Björn Gambäck

The paper proposes the Ternary Sentiment Embedding Model, a new model for creating sentiment embeddings based on the Hybrid Ranking Model of Tang et al. (2016), but trained on ternary-labeled data instead of binary-labeled, utilizing sentiment embeddings from datasets made with different distant supervision methods. The model is used as part of a complete Twitter Sentiment Analysis system and empirically compared to existing systems, showing that it outperforms Hybrid Ranking and that the quality of the distant-supervised dataset has a great impact on the quality of the produced sentiment embeddings.

pdf bib
Linking News Sentiment to Microblogs : A Distributional Semantics Approach to Enhance Microblog Sentiment Classification
Tobias Daudert | Paul Buitelaar

Social media’s popularity in society and research is gaining momentum and simultaneously increasing the importance of short textual content such as microblogs. Microblogs are affected by many factors including the news media, therefore, we exploit sentiments conveyed from news to detect and classify sentiment in microblogs. Given that texts can deal with the same entity but might not be vastly related when it comes to sentiment, it becomes necessary to introduce further measures ensuring the relatedness of texts while leveraging the contained sentiments. This paper describes ongoing research introducing distributional semantics to improve the exploitation of news-contained sentiment to enhance microblog sentiment classification.

pdf bib
Aspect Based Sentiment Analysis into the Wild
Caroline Brun | Vassilina Nikoulina

In this paper, we test state-of-the-art Aspect Based Sentiment Analysis (ABSA) systems trained on a widely used dataset on actual data. We created a new manually annotated dataset of user generated data from the same domain as the training dataset, but from other sources and analyse the differences between the new and the standard ABSA dataset. We then analyse the results in performance of different versions of the same system on both datasets. We also propose light adaptation methods to increase system robustness.

pdf bib
Self-Attention : A Better Building Block for Sentiment Analysis Neural Network Classifiers
Artaches Ambartsoumian | Fred Popowich

Sentiment Analysis has seen much progress in the past two decades. For the past few years, neural network approaches, primarily RNNs and CNNs, have been the most successful for this task. Recently, a new category of neural networks, self-attention networks (SANs), have been created which utilizes the attention mechanism as the basic building block. Self-attention networks have been shown to be effective for sequence modeling tasks, while having no recurrence or convolutions. In this work we explore the effectiveness of the SANs for sentiment analysis. We demonstrate that SANs are superior in performance to their RNN and CNN counterparts by comparing their classification accuracy on six datasets as well as their model characteristics such as training speed and memory consumption. Finally, we explore the effects of various SAN modifications such as multi-head attention as well as two methods of incorporating sequence position information into SANs.

pdf bib
Dual Memory Network Model for Biased Product Review Classification
Yunfei Long | Mingyu Ma | Qin Lu | Rong Xiang | Chu-Ren Huang

In sentiment analysis (SA) of product reviews, both user and product information are proven to be useful. Current tasks handle user profile and product information in a unified model which may not be able to learn salient features of users and products effectively. In this work, we propose a dual user and product memory network (DUPMN) model to learn user profiles and product reviews using separate memory networks. Then, the two representations are used jointly for sentiment prediction. The use of separate models aims to capture user profiles and product information more effectively. Compared to state-of-the-art unified prediction models, the evaluations on three benchmark datasets, IMDB, Yelp13, and Yelp14, show that our dual learning model gives performance gain of 0.6 %, 1.2 %, and 0.9 %, respectively. The improvements are also deemed very significant measured by p-values.p-values.

pdf bib
Measuring Issue Ownership using Word Embeddings
Amaru Cuba Gyllensten | Magnus Sahlgren

Sentiment and topic analysis are common methods used for social media monitoring. Essentially, these methods answers questions such as, what is being talked about, regarding X, and what do people feel, regarding X. In this paper, we investigate another venue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the kind how similar is source A to issue owner P, when talking about issue X, and as such can be measured using word / document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues.X”, and “what do people feel, regarding X”. In this paper, we investigate another venue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the kind “how similar is source A to issue owner P, when talking about issue X”, and as such can be measured using word/document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues.

pdf bib
Sentiment Expression Boundaries in Sentiment Polarity Classification
Rasoul Kaljahi | Jennifer Foster

We investigate the effect of using sentiment expression boundaries in predicting sentiment polarity in aspect-level sentiment analysis. We manually annotate a freely available English sentiment polarity dataset with these boundaries and carry out a series of experiments which demonstrate that high quality sentiment expressions can boost the performance of polarity classification. Our experiments with neural architectures also show that CNN networks outperform LSTMs on this task and dataset.

pdf bib
Exploring and Learning Suicidal Ideation Connotations on Social Media with Deep Learning
Ramit Sawhney | Prachi Manchanda | Puneet Mathur | Rajiv Shah | Raj Singh

The increasing suicide rates amongst youth and its high correlation with suicidal ideation expression on social media warrants a deeper investigation into models for the detection of suicidal intent in text such as tweets to enable prevention. However, the complexity of the natural language constructs makes this task very challenging. Deep Learning architectures such as LSTMs, CNNs, and RNNs show promise in sentence level classification problems. This work investigates the ability of deep learning architectures to build an accurate and robust model for suicidal ideation detection and compares their performance with standard baselines in text classification problems. The experimental results reveal the merit in C-LSTM based models as compared to other deep learning and machine learning based classification models for suicidal ideation detection.

pdf bib
NLP at IEST 2018 : BiLSTM-Attention and LSTM-Attention via Soft Voting in Emotion ClassificationNLP at IEST 2018: BiLSTM-Attention and LSTM-Attention via Soft Voting in Emotion Classification
Qimin Zhou | Hao Wu

This paper describes our method that competed at WASSA2018 Implicit Emotion Shared Task. The goal of this task is to classify the emotions of excluded words in tweets into six different classes : sad, joy, disgust, surprise, anger and fear. For this, we examine a BiLSTM architecture with attention mechanism (BiLSTM-Attention) and a LSTM architecture with attention mechanism (LSTM-Attention), and try different dropout rates based on these two models. We then exploit an ensemble of these methods to give the final prediction which improves the model performance significantly compared with the baseline model. The proposed method achieves 7th position out of 30 teams and outperforms the baseline method by 12.5 % in terms of macro F1.Implicit Emotion Shared Task. The goal of this task is to classify the emotions of excluded words in tweets into six different classes: sad, joy, disgust, surprise, anger and fear. For this, we examine a BiLSTM architecture with attention mechanism (BiLSTM-Attention) and a LSTM architecture with attention mechanism (LSTM-Attention), and try different dropout rates based on these two models. We then exploit an ensemble of these methods to give the final prediction which improves the model performance significantly compared with the baseline model. The proposed method achieves 7th position out of 30 teams and outperforms the baseline method by 12.5% in terms of macro F1.

pdf bib
SINAI at IEST 2018 : Neural Encoding of Emotional External Knowledge for Emotion ClassificationSINAI at IEST 2018: Neural Encoding of Emotional External Knowledge for Emotion Classification
Flor Miriam Plaza-del-Arco | Eugenio Martínez-Cámara | Maite Martin | L. Alfonso Ureña- López

In this paper, we describe our participation in WASSA 2018 Implicit Emotion Shared Task (IEST 2018). We claim that the use of emotional external knowledge may enhance the performance and the capacity of generalization of an emotion classification system based on neural networks. Accordingly, we submitted four deep learning systems grounded in a sequence encoding layer. They mainly differ in the feature vector space and the recurrent neural network used in the sequence encoding layer. The official results show that the systems that used emotional external knowledge have a higher capacity of generalization, hence our claim holds.

pdf bib
EmoNLP at IEST 2018 : An Ensemble of Deep Learning Models and Gradient Boosting Regression Tree for Implicit Emotion Prediction in TweetsEmoNLP at IEST 2018: An Ensemble of Deep Learning Models and Gradient Boosting Regression Tree for Implicit Emotion Prediction in Tweets
Man Liu

This paper describes our system submitted to IEST 2018, a shared task (Klinger et al., 2018) to predict the emotion types. Six emotion types are involved : anger, joy, fear, surprise, disgust and sad. We perform three different approaches : feed forward neural network (FFNN), convolutional BLSTM (ConBLSTM) and Gradient Boosting Regression Tree Method (GBM). Word embeddings used in convolutional BLSTM are pre-trained on 470 million tweets which are filtered using the emotional words and emojis. In addition, broad sets of features (i.e. syntactic features, lexicon features, cluster features) are adopted to train GBM and FFNN. The three approaches are finally ensembled by the weighted average of predicted probabilities of each emotion label.

pdf bib
HGSGNLP at IEST 2018 : An Ensemble of Machine Learning and Deep Neural Architectures for Implicit Emotion Classification in TweetsHGSGNLP at IEST 2018: An Ensemble of Machine Learning and Deep Neural Architectures for Implicit Emotion Classification in Tweets
Wenting Wang

This paper describes our system designed for the WASSA-2018 Implicit Emotion Shared Task (IEST). The task is to predict the emotion category expressed in a tweet by removing the terms angry, afraid, happy, sad, surprised, disgusted and their synonyms. Our final submission is an ensemble of one supervised learning model and three deep neural network based models, where each model approaches the problem from essentially different directions. Our system achieves the macro F1 score of 65.8 %, which is a 5.9 % performance improvement over the baseline and is ranked 12 out of 30 participating teams.angry, afraid, happy, sad, surprised, disgusted and their synonyms. Our final submission is an ensemble of one supervised learning model and three deep neural network based models, where each model approaches the problem from essentially different directions. Our system achieves the macro F1 score of 65.8%, which is a 5.9% performance improvement over the baseline and is ranked 12 out of 30 participating teams.

pdf bib
UWB at IEST 2018 : Emotion Prediction in Tweets with Bidirectional Long Short-Term Memory Neural NetworkUWB at IEST 2018: Emotion Prediction in Tweets with Bidirectional Long Short-Term Memory Neural Network
Pavel Přibáň | Jiří Martínek

This paper describes our system created for the WASSA 2018 Implicit Emotion Shared Task. The goal of this task is to predict the emotion of a given tweet, from which a certain emotion word is removed. The removed word can be sad, happy, disgusted, angry, afraid or a synonym of one of them. Our proposed system is based on deep-learning methods. We use Bidirectional Long Short-Term Memory (BiLSTM) with word embeddings as an input. Pre-trained DeepMoji model and pre-trained emoji2vec emoji embeddings are also used as additional inputs. Our System achieves 0.657 macro F1 score and our rank is 13th out of 30.sad, happy, disgusted, angry, afraid or a synonym of one of them. Our proposed system is based on deep-learning methods. We use Bidirectional Long Short-Term Memory (BiLSTM) with word embeddings as an input. Pre-trained DeepMoji model and pre-trained emoji2vec emoji embeddings are also used as additional inputs. Our System achieves 0.657 macro F1 score and our rank is 13th out of 30.

pdf bib
EmotiKLUE at IEST 2018 : Topic-Informed Classification of Implicit EmotionsEmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions
Thomas Proisl | Philipp Heinrich | Besim Kabashi | Stefan Evert

EmotiKLUE is a submission to the Implicit Emotion Shared Task. It is a deep learning system that combines independent representations of the left and right contexts of the emotion word with the topic distribution of an LDA topic model. EmotiKLUE achieves a macro average Fscore of 67.13 %, significantly outperforming the baseline produced by a simple ML classifier. Further enhancements after the evaluation period lead to an improved Fscore of 68.10 %.F₁score of 67.13%, significantly outperforming the baseline produced by a simple ML classifier. Further enhancements after the evaluation period lead to an improved F₁score of 68.10%.

pdf bib
Disney at IEST 2018 : Predicting Emotions using an EnsembleIEST 2018: Predicting Emotions using an Ensemble
Wojciech Witon | Pierre Colombo | Ashutosh Modi | Mubbasir Kapadia

This paper describes our participating system in the WASSA 2018 shared task on emotion prediction. The task focuses on implicit emotion prediction in a tweet. In this task, keywords corresponding to the six emotion labels used (anger, fear, disgust, joy, sad, and surprise) have been removed from the tweet text, making emotion prediction implicit and the task challenging. We propose a model based on an ensemble of classifiers for prediction. Each classifier uses a sequence of Convolutional Neural Network (CNN) architecture blocks and uses ELMo (Embeddings from Language Model) as an input. Our system achieves a 66.2 % F1 score on the test set. The best performing system in the shared task has reported a 71.4 % F1 score.

pdf bib
Fast Approach to Build an Automatic Sentiment Annotator for Legal Domain using Transfer Learning
Viraj Salaka | Menuka Warushavithana | Nisansa de Silva | Amal Shehan Perera | Gathika Ratnayaka | Thejan Rupasinghe

This study proposes a novel way of identifying the sentiment of the phrases used in the legal domain. The added complexity of the language used in law, and the inability of the existing systems to accurately predict the sentiments of words in law are the main motivations behind this study. This is a transfer learning approach which can be used for other domain adaptation tasks as well. The proposed methodology achieves an improvement of over 6 % compared to the source model’s accuracy in the legal domain.

pdf bib
What Makes You Stressed? Finding Reasons From Tweets
Reshmi Gopalakrishna Pillai | Mike Thelwall | Constantin Orasan

Detecting stress from social media gives a non-intrusive and inexpensive alternative to traditional tools such as questionnaires or physiological sensors for monitoring mental state of individuals. This paper introduces a novel framework for finding reasons for stress from tweets, analyzing multiple categories for the first time. Three word-vector based methods are evaluated on collections of tweets about politics or airlines and are found to be more accurate than standard machine learning algorithms.

pdf bib
EmojiGAN : learning emojis distributions with a generative modelEmojiGAN: learning emojis distributions with a generative model
Bogdan Mazoure | Thang Doan | Saibal Ray

Generative models have recently experienced a surge in popularity due to the development of more efficient training algorithms and increasing computational power. Models such as adversarial generative networks (GANs) have been successfully used in various areas such as computer vision, medical imaging, style transfer and natural language generation. Adversarial nets were recently shown to yield results in the image-to-text task, where given a set of images, one has to provide their corresponding text description. In this paper, we take a similar approach and propose a image-to-emoji architecture, which is trained on data from social networks and can be used to score a given picture using ideograms. We show empirical results of our algorithm on data obtained from the most influential Instagram accounts.

pdf bib
Homonym Detection For Humor Recognition In Short Text
Sven van den Beukel | Lora Aroyo

In this paper, automatic homophone- and homograph detection are suggested as new useful features for humor recognition systems. The system combines style-features from previous studies on humor recognition in short text with ambiguity-based features. The performance of two potentially useful homograph detection methods is evaluated using crowdsourced annotations as ground truth. Adding homophones and homographs as features to the classifier results in a small but significant improvement over the style-features alone. For the task of humor recognition, recall appears to be a more important quality measure than precision. Although the system was designed for humor recognition in oneliners, it also performs well at the classification of longer humorous texts.

pdf bib
Emo2Vec : Learning Generalized Emotion Representation by Multi-task TrainingEmo2Vec: Learning Generalized Emotion Representation by Multi-task Training
Peng Xu | Andrea Madotto | Chien-Sheng Wu | Ji Ho Park | Pascale Fung

In this paper, we propose Emo2Vec which encodes emotional semantics into vectors. We train Emo2Vec by multi-task learning six different emotion-related tasks, including emotion / sentiment analysis, sarcasm classification, stress detection, abusive language classification, insult detection, and personality recognition. Our evaluation of Emo2Vec shows that it outperforms existing affect-related representations, such as Sentiment-Specific Word Embedding and DeepMoji embeddings with much smaller training corpora. When concatenated with GloVe, Emo2Vec achieves competitive performances to state-of-the-art results on several tasks using a simple logistic regression classifier.

pdf bib
Super Characters : A Conversion from Sentiment Classification to Image Classification
Baohua Sun | Lin Yang | Patrick Dong | Wenhan Zhang | Jason Dong | Charles Young

We propose a method named Super Characters for sentiment classification. This method converts the sentiment classification problem into image classification problem by projecting texts into images and then applying CNN models for classification. Text features are extracted automatically from the generated Super Characters images, hence there is no need of any explicit step of embedding the words or characters into numerical vector representations. Experimental results on large social media corpus show that the Super Characters method consistently outperforms other methods for sentiment classification and topic classification tasks on ten large social media datasets of millions of contents in four different languages, including Chinese, Japanese, Korean and English.

pdf bib
Learning Comment Controversy Prediction in Web Discussions Using Incidentally Supervised Multi-Task CNNsCNNs
Nils Rethmeier | Marc Hübner | Leonhard Hennig

Comments on web news contain controversies that manifest as inter-group agreement-conflicts. Tracking such rapidly evolving controversy could ease conflict resolution or journalist-user interaction. However, this presupposes controversy online-prediction that scales to diverse domains using incidental supervision signals instead of manual labeling. To more deeply interpret comment-controversy model decisions we frame prediction as binary classification and evaluate baselines and multi-task CNNs that use an auxiliary news-genre-encoder. Finally, we use ablation and interpretability methods to determine the impacts of topic, discourse and sentiment indicators, contextual vs. global word influence, as well as genre-keywords vs. per-genre-controversy keywords to find that the models learn plausible controversy features using only incidentally supervised signals.rapidly evolving controversy could ease conflict resolution or journalist-user interaction. However, this presupposes controversy online-prediction that scales to diverse domains using incidental supervision signals instead of manual labeling. To more deeply interpret comment-controversy model decisions we frame prediction as binary classification and evaluate baselines and multi-task CNNs that use an auxiliary news-genre-encoder. Finally, we use ablation and interpretability methods to determine the impacts of topic, discourse and sentiment indicators, contextual vs. global word influence, as well as genre-keywords vs. per-genre-controversy keywords – to find that the models learn plausible controversy features using only incidentally supervised signals.

pdf bib
Predicting Adolescents’ Educational Track from Chat Messages on Dutch Social MediaDutch Social Media
Lisa Hilte | Walter Daelemans | Reinhild Vandekerckhove

We aim to predict Flemish adolescents’ educational track based on their Dutch social media writing. We distinguish between the three main types of Belgian secondary education : General (theory-oriented), Vocational (practice-oriented), and Technical Secondary Education (hybrid). The best results are obtained with a Naive Bayes model, i.e. an F-score of 0.68 (std. dev. 0.05) in 10-fold cross-validation experiments on the training data and an F-score of 0.60 on unseen data. Many of the most informative features are character n-grams containing specific occurrences of chatspeak phenomena such as emoticons. While the detection of the most theory- and practice-oriented educational tracks seems to be a relatively easy task, the hybrid Technical level appears to be much harder to capture based on online writing style, as expected.

pdf bib
Arabizi sentiment analysis based on transliteration and automatic corpus annotationArabizi sentiment analysis based on transliteration and automatic corpus annotation
Imane Guellil | Ahsan Adeel | Faical Azouaou | Fodil Benali | Ala-eddine Hachani | Amir Hussain

Arabizi is a form of writing Arabic text which relies on Latin letters, numerals and punctuation rather than Arabic letters. In the literature, the difficulties associated with Arabizi sentiment analysis have been underestimated, principally due to the complexity of Arabizi. In this paper, we present an approach to automatically classify sentiments of Arabizi messages into positives or negatives. In the proposed approach, Arabizi messages are first transliterated into Arabic. Afterwards, we automatically classify the sentiment of the transliterated corpus using an automatically annotated corpus. For corpus validation, shallow machine learning algorithms such as Support Vectors Machine (SVM) and Naive Bays (NB) are used. Simulations results demonstrate the outperformance of NB algorithm over all others. The highest achieved F1-score is up to 78 % and 76 % for manually and automatically transliterated dataset respectively. Ongoing work is aimed at improving the transliterator module and annotated sentiment dataset.