Journées d'Etudes sur la Parole / Traitement Automatique de la Langue Naturelle / Rencontres des Etudiants Chercheurs en Informatique et Traitement Automatique des Langues (2018)


Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN
Pascale Sébillot | Vincent Claveau

Modeling infant segmentation of two morphologically diverse languages
Georgia-Rengina Loukatou | Sabine Stoll | Damian Blasi | Alejandrina Cristia

A rich literature explores unsupervised segmentation algorithms infants could use to parse their input, mainly focusing on English, an analytic language where word, morpheme, and syllable boundaries often coincide. Synthetic languages, where words are multi-morphemic, may present unique difficulties for segmentation. Our study tests corpora of two languages selected to differ in the extent of complexity of their morphological structure, Chintang and Japanese. We use three conceptually diverse word segmentation algorithms and we evaluate them on both word- and morpheme-level representations. As predicted, results for the simpler Japanese are better than those for the more complex Chintang. However, the difference is small compared to the effect of the algorithm (with the lexical algorithm outperforming sub-lexical ones) and the level (scores were lower when evaluating on words versus morphemes). There are also important interactions between language, model, and evaluation level, which ought to be considered in future work.
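
The effect of the evaluation level can be illustrated with a small sketch. The abstract does not say which metric was used, so the token F-score below is an assumption (a common choice in segmentation studies), and the function names and the toy English example are hypothetical. The same predicted segmentation is scored once against a word-level gold standard and once against a morpheme-level one.

```python
def spans(tokens):
    """Turn a list of tokens into a set of (start, end) character spans."""
    out, pos = set(), 0
    for tok in tokens:
        out.add((pos, pos + len(tok)))
        pos += len(tok)
    return out


def token_f_score(predicted, gold):
    """Token F-score: a predicted token counts as correct only if both of
    its boundaries match a gold token. Assumed metric, not the paper's."""
    pred_spans, gold_spans = spans(predicted), spans(gold)
    correct = len(pred_spans & gold_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy utterance "thedogsbark" (hypothetical data, not from the corpora).
predicted = ["the", "dog", "s", "bark"]
gold_words = ["the", "dogs", "bark"]
gold_morphemes = ["the", "dog", "s", "bark"]

word_level = token_f_score(predicted, gold_words)
morpheme_level = token_f_score(predicted, gold_morphemes)
```

In this toy case an algorithm that splits off the suffix is penalized at the word level but not at the morpheme level, which is the kind of interaction between model and evaluation level the abstract refers to.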

Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Elvys Linhares Pontes | Stéphane Huet | Andréa Carneiro Linhares | Juan-Manuel Torres-Moreno

Semantic Textual Similarity (STS) is the basis of many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences. It uses a convolutional network to capture the local context of words and an LSTM to capture the global context of sentences. This combination of networks helps to preserve the relevant information in sentences and improves the calculation of the similarity between them. Our model achieves good results and is competitive with the best state-of-the-art systems.

Predicting failure of a mediated conversation in the context of asymetric role dialogues
Romain Carbou | Delphine Charlet | Géraldine Damnati | Frédéric Landragin | Jean Léon Bouraoui

In a human-to-human conversation between a user and their interlocutor in an assistance center, we assume a context in which the conclusion of the dialogue can be characterized as a success or a failure, either explicitly annotated or deduced. The study examines several approaches expected to influence a predictive classification model of failures. On the one hand, we take into account the asymmetry of the speakers’ roles when modelling the lexical distribution. On the other hand, we determine whether the part of the lexicon most closely related to the customer assistance domain studied here changes the quality of the prediction. We will eventually assess the prospects for generalization to morphologically comparable corpora.

A comparative study of word embeddings and other features for lexical complexity detection in French
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen

Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this language.

JeuxDeLiens: Word Embeddings and Path-Based Similarity for Entity Linking using the French JeuxDeMots Lexical Semantic Network
Julien Plu | Kevin Cousot | Mathieu Lafourcade | Raphaël Troncy | Giuseppe Rizzo

Entity linking systems typically rely on encyclopedic knowledge bases such as DBpedia or Freebase. In this paper, we instead use a French lexical-semantic network named JeuxDeMots to jointly type and link entities. Our approach combines word embeddings and a path-based similarity, yielding encouraging results on a set of documents from the French newspaper Le Monde.

Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT
Pascale Sébillot | Vincent Claveau

Analysis of Inferences in Chinese for Opinion Mining
Liyun Yan

Opinion mining is an essential activity for economic monitoring, made easier by social networks and ad hoc forums. The analysis generally relies on sentiment lexicons. Nevertheless, some opinions are expressed through inferences. In this paper, we propose a classification of the inferences used in Chinese tourist comments, for an opinion mining task, based on three levels of analysis (semantic realization, modality of realization and production mode). We show the value of analyzing the distinct types of inferences to identify the polarity of opinions expressed in corpora. We also present some results based on word embeddings.

Automatic image annotation: the case of deforestation
Duy Huynh | Nathalie Neptune

This paper presents the state of the art in methods for the automatic annotation of Earth observation images for deforestation detection. We discuss the various challenges the field covers and outline the future research we are considering.


Adapted Sentiment Similarity Seed Words For French Tweets’ Polarity Classification
Amal Htait

We present our contribution to DEFT 2018 task 2, Global polarity: determining the overall polarity (Positive, Negative, Neutral or MixPosNeg) of French-language tweets about public transport. Our system is based on a list of sentiment seed words adapted to French public transport tweets. These seed words are extracted from DEFT’s annotated training dataset, and the sentiment relations between seed words and other terms are captured by the cosine measure between their word embedding representations, using a French word embedding model of 683k words. Our semi-supervised system achieved an F1-measure of 0.64.
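
The seed-word idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's system: the two-dimensional toy vectors, the seed lists, and the scoring rule (mean cosine similarity to positive seeds minus mean cosine similarity to negative seeds) are all hypothetical; the abstract only states that cosine similarity between word embeddings relates seed words to other terms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def polarity_score(word, embeddings, pos_seeds, neg_seeds):
    """Assumed scoring rule: mean similarity to positive seed words minus
    mean similarity to negative seed words. Positive means 'leans positive'."""
    vec = embeddings[word]
    pos = [cosine(vec, embeddings[s]) for s in pos_seeds if s in embeddings]
    neg = [cosine(vec, embeddings[s]) for s in neg_seeds if s in embeddings]
    return mean(pos) - mean(neg)


# Toy 2-d embeddings (hypothetical; a real model would have 683k words).
TOY_EMBEDDINGS = {
    "ponctuel": [0.9, 0.1],
    "confortable": [0.8, 0.2],
    "retard": [0.1, 0.9],
    "panne": [0.2, 0.8],
    "rapide": [0.85, 0.15],
    "annulé": [0.15, 0.85],
}
POS_SEEDS = ["ponctuel", "confortable"]  # hypothetical positive seed words
NEG_SEEDS = ["retard", "panne"]          # hypothetical negative seed words

score_rapide = polarity_score("rapide", TOY_EMBEDDINGS, POS_SEEDS, NEG_SEEDS)
score_annule = polarity_score("annulé", TOY_EMBEDDINGS, POS_SEEDS, NEG_SEEDS)
```

A tweet-level label would then need an aggregation step over the tweet's terms, which the abstract does not describe and which is left out here.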