Orphee De Clercq

Also published as: Orphée De Clercq


2021

Event Prominence Extraction Combining a Knowledge-Based Syntactic Parser and a BERT Classifier for Dutch
Thierry Desot | Orphee De Clercq | Veronique Hoste
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

A core task in information extraction is event detection, which identifies event triggers in sentences and typically classifies them into event types. In this study, an event is considered the unit for measuring diversity and similarity in news articles within the framework of a news recommendation system. Current typology-based event detection approaches fail to handle the variety of events expressed in real-world situations. To overcome this, we aim to perform event salience classification and explore whether a transformer model is capable of classifying news information into less and more general prominence classes. After comparing the performance of a Support Vector Machine (SVM) baseline and our transformer-based classifier on several event span formats, we conceived multi-word event spans as syntactic clauses. Those are fed into our prominence classifier, which is fine-tuned on pre-trained Dutch BERT word embeddings. On top of that, we outperform a pipeline combining a Conditional Random Field (CRF) approach to event-trigger word detection with the BERT-based classifier. To the best of our knowledge, we present the first event extraction approach that combines an expert-based syntactic parser with a transformer-based classifier for Dutch.
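As an illustration of the final classification step described above, the sketch below fine-tunes a pre-trained Dutch BERT model to label clause-level event spans with prominence classes. This is a minimal sketch, not the authors' implementation: the BERTje checkpoint name is real, but the prominence label set, the example clauses and all hyperparameters are assumptions made for illustration only.

```python
# Minimal sketch: fine-tune a Dutch BERT model to classify syntactic clauses
# into prominence classes. Labels, data and hyperparameters are illustrative.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

labels = ["main_event", "background", "minor"]   # hypothetical prominence classes
clauses = ["De regering kondigt nieuwe maatregelen aan",
           "zoals eerder deze week werd gemeld"]
gold = [0, 1]

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "GroNLP/bert-base-dutch-cased", num_labels=len(labels))

# tokenize the clause-level event spans
ds = Dataset.from_dict({"text": clauses, "label": gold})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                 padding="max_length", max_length=64),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="prominence-bertje",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds,
)
trainer.train()
```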

Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Orphee De Clercq | Alexandra Balahur | Joao Sedoc | Valentin Barriere | Shabnam Tafreshi | Sven Buechel | Veronique Hoste
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Emotional RobBERT and Insensitive BERTje: Combining Transformers and Affect Lexica for Dutch Emotion Detection
Luna De Bruyne | Orphee De Clercq | Veronique Hoste
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In a first step towards improving Dutch emotion detection, we try to combine the Dutch transformer models BERTje and RobBERT with lexicon-based methods. We propose two architectures: one in which lexicon information is directly injected into the transformer model, and a meta-learning approach in which predictions from transformers are combined with lexicon features. The models are tested on 1,000 Dutch tweets and 1,000 captions from TV shows which have been manually annotated with emotion categories and dimensions. We find that RobBERT clearly outperforms BERTje, but that directly adding lexicon information to transformers does not improve performance. In the meta-learning approach, lexicon information does have a positive effect on BERTje, but not on RobBERT. This suggests that more emotional information is already contained within the latter language model.
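The meta-learning variant described above can be pictured with a short sketch: the transformer's predicted emotion probabilities are concatenated with lexicon-based features and fed to a second-stage classifier. The feature values, the number of emotion categories and the choice of logistic regression as meta-learner are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the meta-learning idea: stack transformer probabilities
# with lexicon features and train a second-stage classifier on top.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stage 1: per-instance emotion probabilities from a fine-tuned RobBERT/BERTje
# model (placeholder values for two instances and five emotion categories).
transformer_probs = np.array([[0.70, 0.10, 0.05, 0.10, 0.05],
                              [0.05, 0.60, 0.15, 0.10, 0.10]])

# Lexicon features, e.g. averaged valence/arousal/dominance scores per instance.
lexicon_feats = np.array([[0.8, 0.4, 0.6],
                          [0.2, 0.7, 0.3]])

X_meta = np.hstack([transformer_probs, lexicon_feats])
y = np.array([0, 1])  # gold emotion categories

meta_clf = LogisticRegression(max_iter=1000).fit(X_meta, y)
print(meta_clf.predict(X_meta))
```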

2020

It’s absolutely divine! Can fine-grained sentiment analysis benefit from coreference resolution?
Orphee De Clercq | Veronique Hoste
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

While it has been claimed that anaphora or coreference resolution plays an important role in opinion mining, it is not clear to what extent coreference resolution actually boosts performance, if at all. In this paper, we investigate the potential added value of coreference resolution for the aspect-based sentiment analysis of restaurant reviews in two languages, English and Dutch. We focus on the task of aspect category classification and investigate whether resolving implicit aspect mentions through coreference information prior to classification is beneficial. Because coreference resolution is not a solved task in NLP, we rely on both automatically derived and gold-standard coreference relations, allowing us to investigate the true upper bound. By training a classifier on a combination of lexical and semantic features, we show that resolving the coreferential relations prior to classification is beneficial in a joint optimization setup. However, this is only the case when relying on gold-standard relations, and the effect is more pronounced for English than for Dutch. When validating the optimal models, however, we found that only the Dutch pipeline achieves satisfactory performance on a held-out test set, and does so regardless of whether coreference information was included.
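The coreference pre-processing step can be illustrated with a small sketch in which pronominal mentions are replaced by their antecedents (taken from gold-standard or automatically derived chains) before aspect category classification. The data structures and the toy example are assumptions made for illustration, not the paper's actual representation.

```python
# Minimal sketch: make implicit aspect mentions explicit by substituting
# pronouns with their coreferential antecedents before classification.
def resolve_mentions(tokens, chains):
    """chains: list of (pronoun_index, antecedent_tokens) pairs."""
    resolved = list(tokens)
    for idx, antecedent in chains:
        resolved[idx] = " ".join(antecedent)
    return resolved

review = ["The", "dessert", "was", "divine", ";", "it", "was", "absolutely", "huge", "."]
coref_chain = [(5, ["the", "dessert"])]   # "it" -> "the dessert"

print(" ".join(resolve_mentions(review, coref_chain)))
# The dessert was divine ; the dessert was absolutely huge .
```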

An Emotional Mess! Deciding on a Framework for Building a Dutch Emotion-Annotated Corpus
Luna De Bruyne | Orphee De Clercq | Veronique Hoste
Proceedings of the 12th Language Resources and Evaluation Conference

Given the myriad of existing emotion models, with the categorical versus dimensional opposition as the most important dividing line, building an emotion-annotated corpus requires well thought-out strategies concerning framework choice. In our work on automatic emotion detection in Dutch texts, we investigate this problem by means of two case studies. We find that the labels joy, love, anger, sadness and fear are well suited to annotate texts coming from various domains and topics, but that the connotation of the labels strongly depends on the origin of the texts. Moreover, it seems that information is lost when an emotional state is forced into a limited set of categories, indicating that a bi-representational format is desirable when creating an emotion corpus.

2019

Benefits of Data Augmentation for NMT-based Text Normalization of User-Generated Content
Claudia Matos Veliz | Orphee De Clercq | Veronique Hoste
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

One of the most persistent characteristics of written user-generated content (UGC) is the use of non-standard words. This characteristic makes UGC considerably harder to process and analyze automatically. Text normalization is the task of transforming lexical variants to their canonical forms and is often used as a pre-processing step for conventional NLP tasks in order to overcome the performance drop that NLP systems experience when applied to UGC. In this work, we follow a Neural Machine Translation approach to text normalization. To train such an encoder-decoder model, large parallel training corpora of sentence pairs are required. However, obtaining large data sets of UGC and its normalized version is not trivial, especially for languages other than English. In this paper, we explore how to overcome this data bottleneck for Dutch, a low-resource language. We start off with a small publicly available parallel Dutch data set comprising three UGC genres and compare two different approaches. The first is to manually normalize and add training data, a costly and time-consuming task. The second is a set of data augmentation techniques which increase data size by converting existing resources into synthesized non-standard forms. Our results reveal that, while the different approaches yield similar results regarding the normalization issues in the test set, they also introduce a large number of over-normalizations.
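To make the augmentation idea concrete, the sketch below turns canonical Dutch sentences into synthetic non-standard variants, yielding extra (noisy, canonical) training pairs for the encoder-decoder model. The specific noising rules (abbreviation substitution, occasional vowel deletion) are assumptions for illustration and not the paper's actual augmentation techniques.

```python
# Minimal sketch: synthesize noisy UGC-like variants of canonical sentences
# to enlarge the parallel training data for NMT-based normalization.
import random

ABBREVIATIONS = {"niet": "ni", "dat": "da", "je": "ge", "even": "ff"}

def synthesize_noisy(sentence, p_drop_vowel=0.1):
    noisy_tokens = []
    for token in sentence.lower().split():
        token = ABBREVIATIONS.get(token, token)
        # occasionally drop a vowel to mimic informal spelling
        if random.random() < p_drop_vowel and len(token) > 3:
            token = token.replace("e", "", 1)
        noisy_tokens.append(token)
    return " ".join(noisy_tokens)

canonical = "Dat is echt niet leuk"
pair = (synthesize_noisy(canonical), canonical)   # synthetic (source, target) pair
print(pair)
```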

Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Roman Klinger | Veronique Hoste | Carlo Strapparava | Orphee De Clercq
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2018

LT3 at SemEval-2018 Task 1: A classifier chain to detect emotions in tweets
Luna De Bruyne | Orphée De Clercq | Véronique Hoste
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper presents an emotion classification system for English tweets, submitted for the SemEval shared task on Affect in Tweets, subtask 5: Detecting Emotions. The system combines lexicon, n-gram, style, syntactic and semantic features. For this multi-class multi-label problem, we created a classifier chain. This is an ensemble of eleven binary classifiers, one for each possible emotion category, where each model gets the predictions of the preceding models as additional features. The predicted labels are combined to obtain a multi-label representation of the predictions. Our system was ranked eleventh among thirty-five participating teams, with a Jaccard accuracy of 52.0% and macro- and micro-averaged F1-scores of 49.3% and 64.0%, respectively.
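The classifier chain described above maps directly onto scikit-learn's ClassifierChain; the sketch below shows the idea with a plain TF-IDF representation, a LinearSVC base model and three toy emotion labels instead of the full feature set and eleven categories used in the paper.

```python
# Minimal sketch of a classifier chain: one binary classifier per emotion
# label, each receiving the predictions of the preceding links as features.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import ClassifierChain
from sklearn.svm import LinearSVC

tweets = ["I am so happy and grateful today!",
          "This is outrageous, I cannot believe it.",
          "I am scared this will happen again."]
# multi-label gold standard: columns = toy emotion categories (joy, anger, fear)
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

X = TfidfVectorizer().fit_transform(tweets)

chain = ClassifierChain(LinearSVC(), order=None, random_state=0).fit(X, Y)
print(chain.predict(X))   # multi-label predictions, one row per tweet
```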

2017

Towards an integrated pipeline for aspect-based sentiment analysis in various domains
Orphée De Clercq | Els Lefever | Gilles Jacobs | Tijl Carpels | Véronique Hoste
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper presents an integrated ABSA pipeline for Dutch that has been developed and tested on qualitative user feedback coming from three domains: retail, banking and human resources. The latter two domains provide service-oriented data, which had not been investigated before in ABSA. By performing in-domain and cross-domain experiments, we investigated the validity of our approach. We show promising results for the three ABSA subtasks: aspect term extraction, aspect category classification and aspect polarity classification.
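A minimal sketch of how the three subtasks can be chained into a single pipeline is given below. The placeholder functions stand in for the paper's feature-based supervised models, and the aspect inventory and example sentence are invented for illustration.

```python
# Minimal sketch of an integrated ABSA pipeline: aspect term extraction,
# aspect category classification, then aspect polarity classification.
def extract_aspect_terms(sentence):
    # placeholder: in practice a sequence labeller marks aspect term spans
    return [tok for tok in sentence.split() if tok.lower() in {"service", "loket"}]

def classify_category(aspect_term):
    # placeholder mapping; in practice a supervised classifier
    return "SERVICE#GENERAL"

def classify_polarity(sentence, aspect_term):
    # placeholder rule; in practice a supervised classifier
    return "negative" if "traag" in sentence.lower() else "positive"

def absa_pipeline(sentence):
    return [(term, classify_category(term), classify_polarity(sentence, term))
            for term in extract_aspect_terms(sentence)]

print(absa_pipeline("De service aan het loket was bijzonder traag"))
```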