Journées d'Etudes sur la Parole / Traitement Automatique de la Langue Naturelle / Rencontres des Etudiants Chercheurs en Informatique et Traitement Automatique des Langues (2019)



up

bib (full) Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

pdf bib
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts
Emmanuel Morin | Sophie Rosset | Pierre Zweigenbaum

pdf bib
Multilingual and Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Marlène Coulomb-Gully | Abhishek Kumar

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection from a multilingual perspective. We focus in particular on hateful messages towards two different targets (immigrants and women) in English tweets, as well as sexist messages in both English and French. Several models have been developed ranging from feature-engineering approaches to neural ones. Our experiments show very encouraging results on both languages.

up

bib (full) Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume III : RECITAL

pdf bib
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume III : RECITAL
Emmanuel Morin | Sophie Rosset | Pierre Zweigenbaum

pdf bib
Automatic summarization of medical conversations, a review
Jessica López Espejel

ion et pour l’analyse du dialogue. Nous dcrivons aussi les utilisation du Traitement Automatique des Langues dans le domaine mdical. A BSTRACT Conversational analysis plays an important role in the development of simulation devices for the training of health professionals (doctors, nurses). Our goal is to develop an original automatic synthesis method for medical conversations between a patient and a healthcare professional, based on recent advances in summarization using convolutional and recurrent neural networks. The proposed method must be adapted to the specific problems related to the synthesis of dialogues. This article presents a review of the different methods for extractive and abstractive summarization, and for dialogue analysis. We also describe the use of Natural Language Processing in the medical field.



up

bib (full) Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Terminologie et Intelligence Artificielle (atelier TALN-RECITAL \& IC)

pdf bib
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Terminologie et Intelligence Artificielle (atelier TALN-RECITAL \& IC)
Emmanuel Morin | Sophie Rosset | Pierre Zweigenbaum

pdf bib
Terminology systematization for Cybersecurity domain in Italian LanguageItalian Language
Claudia Lanza | Béatrice Daille

This paper aims at presenting the first steps to improve the quality of the first draft of an Italian thesaurus for Cybersecurity terminology that has been realized for a specific project activity in collaboration with CybersecurityLab at Informatics and Telematics Institute (IIT) of the National Council of Research (CNR) in Italy. In particular, the paper will focus, first, on the terminological knowledge base built to retrieve the most representative candidate terms of Cybersecurity domain in Italian language, giving examples of the main gold standard repositories that have been used to build this semantic tool. Attention will be then given to the methodology and software employed to configure a system of NLP rules to get the desired semantic results and to proceed with the enhancement of the candidate terms selection which are meant to be inserted in the controlled vocabulary.

pdf bib
Terminology-based Text Embedding for Computing Document Similarities on Technical Content
Hamid Mirisaee | Eric Gaussier | Cedric Lagnier | Agnes Guerraz

We propose in this paper a new, hybrid document embedding approach in order to address the problem of document similarities with respect to the technical content. To do so, we employ a state-of-the-art graph techniques to first extract the keyphrases (composite keywords) of documents and, then, use them to score the sentences. Using the ranked sentences, we propose two approaches to embed documents and show their performances with respect to two baselines. With domain expert annotations, we illustrate that the proposed methods can find more relevant documents and outperform the baselines up to 27 % in terms of NDCG.

pdf bib
Entropic characterisation of termino-conceptual structure : A preliminary study
Kyo Kageura | Long-Huei Chen

Terms represent concepts, which consist of conceptual characteristics. In actual concept-term formation, which is done by researchers, the process is in reverse : conceptual elements / characteristics are consolidated to form concepts, which are represented by terms. As concepts do not exist on the fly, what we may call termino-conceptual system provides scaffolding in this process. Terminologists, both in practice and in research, do not only collect and list terms but also analyse, describe and define terms and systematise terminologies. To carry out these tasks, terminologists must refer to conceptual systems, to the extent that they contribute to systematising terminologies ; terminologists thus also deal with the sphere of termino-conceptual system. In this paper, we consolidate the status of termino-conceptual sphere and propose a way to characterise the structure of termino-conceptual system by using entropy. The entropic characterisation of English terminologies of six domain, i.e. agriculture, botany, chemistry, computer science, physics and psychology are presented.