Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

Hannu Toivonen, Michele Boggia (Editors)

Anthology ID:
EACL | Hackashop
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Hannu Toivonen | Michele Boggia

pdf bib
Adversarial Training for News Stance Detection : Leveraging Signals from a Multi-Genre Corpus.
Costanza Conforti | Jakob Berndt | Marco Basaldella | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier

Cross-target generalization constitutes an important issue for news Stance Detection (SD). In this short paper, we investigate adversarial cross-genre SD, where knowledge from annotated user-generated data is leveraged to improve news SD on targets unseen during training. We implement a BERT-based adversarial network and show experimental performance improvements over a set of strong baselines. Given the abundance of user-generated data, which are considerably less expensive to retrieve and annotate than news articles, this constitutes a promising research direction.

pdf bib
Aligning Estonian and Russian news industry keywords with the help of subtitle translations and an environmental thesaurusEstonian and Russian news industry keywords with the help of subtitle translations and an environmental thesaurus
Andraž Repar | Andrej Shumakov

This paper presents the implementation of a bilingual term alignment approach developed by Repar et al. (2019) to a dataset of unaligned Estonian and Russian keywords which were manually assigned by journalists to describe the article topic. We started by separating the dataset into Estonian and Russian tags based on whether they are written in the Latin or Cyrillic script. Then we selected the available language-specific resources necessary for the alignment system to work. Despite the domains of the language-specific resources (subtitles and environment) not matching the domain of the dataset (news articles), we were able to achieve respectable results with manual evaluation indicating that almost 3/4 of the aligned keyword pairs are at least partial matches.

pdf bib
Comment Section Personalization : Algorithmic, Interface, and Interaction Design
Yixue Wang

Comment sections allow users to share their personal experiences, discuss and form different opinions, and build communities out of organic conversations. However, many comment sections present chronological ranking to all users. In this paper, I discuss personalization approaches in comment sections based on different objectives for newsrooms and researchers to consider. I propose algorithmic and interface designs when personalizing the presentation of comments based on different objectives including relevance, diversity, and education / background information. I further explain how transparency, user control, and comment type diversity could help users most benefit from the personalized interacting experience.

pdf bib
EMBEDDIA Tools, Datasets and Challenges : Resources and Hackathon ContributionsEMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions
Senja Pollak | Marko Robnik-Šikonja | Matthew Purver | Michele Boggia | Ravi Shekhar | Marko Pranjić | Salla Salmela | Ivar Krustok | Tarmo Paju | Carl-Gustav Linden | Leo Leppänen | Elaine Zosa | Matej Ulčar | Linda Freienthal | Silver Traat | Luis Adrián Cabrera-Diego | Matej Martinc | Nada Lavrač | Blaž Škrlj | Martin Žnidaršič | Andraž Pelicon | Boshko Koloski | Vid Podpečan | Janez Kranjc | Shane Sheehan | Emanuela Boros | Jose G. Moreno | Antoine Doucet | Hannu Toivonen

This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program. The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021. The hackathon had six participating teams who addressed different challenges, either from the list of proposed challenges or their own news-industry-related tasks. This paper goes beyond the scope of the hackathon, as it brings together in a coherent and compact form most of the resources developed, collected and released by the EMBEDDIA project. Moreover, it constitutes a handy source for news media industry and researchers in the fields of Natural Language Processing and Social Science.

pdf bib
A COVID-19 news coverage mood map of EuropeCOVID-19 news coverage mood map of Europe
Frankie Robertson | Jarkko Lagus | Kaisla Kajava

We present a COVID-19 news dashboard which visualizes sentiment in pandemic news coverage in different languages across Europe. The dashboard shows analyses for positive / neutral / negative sentiment and moral sentiment for news articles across countries and languages. First we extract news articles from news-crawl. Then we use a pre-trained multilingual BERT model for sentiment analysis of news article headlines and a dictionary and word vectors -based method for moral sentiment analysis of news articles. The resulting dashboard gives a unified overview of news events on COVID-19 news overall sentiment, and the region and language of publication from the period starting from the beginning of January 2020 to the end of January 2021.

pdf bib
Interesting cross-border news discovery using cross-lingual article linking and document similarity
Boshko Koloski | Elaine Zosa | Timen Stepišnik-Perdih | Blaž Škrlj | Tarmo Paju | Senja Pollak

Team Name : team-8 Embeddia Tool : Cross-Lingual Document Retrieval Zosa et al. Dataset : Estonian and Latvian news datasets abstract : Contemporary news media face increasing amounts of available data that can be of use when prioritizing, selecting and discovering new news. In this work we propose a methodology for retrieving interesting articles in a cross-border news discovery setting. More specifically, we explore how a set of seed documents in Estonian can be projected in Latvian document space and serve as a basis for discovery of novel interesting pieces of Latvian news that would interest Estonian readers. The proposed methodology was evaluated by Estonian journalist who confirmed that in the best setting, from top 10 retrieved Latvian documents, half of them represent news that are potentially interesting to be taken by the Estonian media house and presented to Estonian readers.