Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)

Rami Aly, Christos Christodoulopoulos, Oana Cocarascu, Zhijiang Guo, Arpit Mittal, Michael Schlichtkrull, James Thorne, Andreas Vlachos (Editors)


Anthology ID: 2021.fever-1
Month: November
Year: 2021
Address: Dominican Republic
Venues: EMNLP | FEVER
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2021.fever-1

The Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS) Shared Task
Rami Aly | Zhijiang Guo | Michael Sejr Schlichtkrull | James Thorne | Andreas Vlachos | Christos Christodoulopoulos | Oana Cocarascu | Arpit Mittal

The Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS) shared task asks participating systems to determine whether human-authored claims are Supported or Refuted based on evidence retrieved from Wikipedia (or NotEnoughInfo if the claim cannot be verified). Compared to the FEVER 2018 shared task, the main challenge is the addition of structured data (tables and lists) as a source of evidence. The claims in the FEVEROUS dataset can be verified using only structured evidence, only unstructured evidence, or a mixture of both. Submissions are evaluated using the FEVEROUS score, which combines label accuracy and evidence retrieval. Unlike FEVER 2018, FEVEROUS requires partial evidence to be returned for NotEnoughInfo claims, and the claims are longer and thus more complex. The shared task received 13 entries, six of which were able to beat the baseline system. The winning team was Bust a move!, achieving a FEVEROUS score of 27% (+9% compared to the baseline). In this paper we describe the shared task, present the full results and highlight commonalities and innovations among the participating systems.
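
As a rough illustration of how a score can combine label accuracy and evidence retrieval, here is a minimal sketch. It assumes that an instance counts as correct only when the predicted label matches the gold label and the retrieved evidence covers at least one complete gold evidence set. The function names, dictionary keys and evidence ID strings are hypothetical, and the official scorer may differ in its details.

```python
from typing import Dict, Iterable, List, Set


def instance_correct(pred_label: str, gold_label: str,
                     pred_evidence: Iterable[str],
                     gold_evidence_sets: List[Set[str]]) -> bool:
    """True if the label matches and some gold evidence set is fully retrieved."""
    if pred_label != gold_label:
        return False
    retrieved = set(pred_evidence)
    # Assumption: covering one complete gold evidence set is enough for credit.
    return any(gold <= retrieved for gold in gold_evidence_sets)


def feverous_style_score(predictions: List[Dict], references: List[Dict]) -> float:
    """Fraction of instances judged correct under the assumption above."""
    if not references:
        return 0.0
    hits = sum(
        instance_correct(p["label"], r["label"], p["evidence"], r["evidence_sets"])
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```

Under this reading, a system that predicts the right label but retrieves only part of every gold evidence set receives no credit for that claim, so the combined score can never exceed plain label accuracy.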

FaBULOUS: Fact-checking Based on Understanding of Language Over Unstructured and Structured information
Mostafa Bouziane | Hugo Perrin | Amine Sadeq | Thanh Nguyen | Aurélien Cluzeau | Julien Mardas

As part of the FEVEROUS shared task, we developed a robust and finely tuned architecture to handle joint retrieval and entailment on text data as well as structured data like tables. We proposed two training schemes to tackle the hurdles inherent to multi-hop multi-modal datasets. The first one allows robust retrieval of full evidence sets, while the second one enables entailment to take full advantage of noisy evidence inputs. In addition, our work has revealed important insights and potential avenues of research for future improvement on this kind of dataset. In preliminary evaluation on the FEVEROUS shared task test set, our system achieves a FEVEROUS score of 0.271, with 0.4258 evidence recall and 0.5607 entailment accuracy.

Team Papelo at FEVEROUS: Multi-hop Evidence Pursuit
Christopher Malon

We develop a system for the FEVEROUS fact extraction and verification task that ranks an initial set of potential evidence and then pursues missing evidence in subsequent hops by trying to generate it, with a next hop prediction module whose output is matched against page elements in a predicted article. Seeking evidence with the next hop prediction module continues to improve FEVEROUS score for up to seven hops. Label classification is trained on possibly incomplete extracted evidence chains, utilizing hints that facilitate numerical comparison. The system achieves .281 FEVEROUS score and .658 label accuracy on the development set, and finishes in second place with .259 FEVEROUS score and .576 label accuracy on the test set.

Stance Detection in German News Articles
Laura Mascarell | Tatyana Ruzsics | Christian Schneebeli | Philippe Schlattner | Luca Campanella | Severin Klingler | Cristina Kadar

The widespread use of the Internet and the rapid dissemination of information pose the challenge of identifying the veracity of its content. Stance detection, which is the task of predicting the position of a text in regard to a specific target (e.g. claim or debate question), has been used to determine the veracity of information in tasks such as rumor classification and fake news detection. While most of the work and available datasets for stance detection address short text snippets extracted from textual dialogues, social media platforms, or news headlines with a strong focus on the English language, there is a lack of resources targeting long texts in other languages. Our contribution in this paper is twofold. First, we present a German dataset of debate questions and news articles that is manually annotated for stance and emotion detection. Second, we leverage the dataset to tackle the supervised task of classifying the stance of a news article with regard to a debate question and provide baseline models as a reference for future work on stance detection in German news articles.

FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German
Justus Mattern | Yu Qiao | Elma Kerz | Daniel Wiechmann | Markus Strohmaier

As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an ‘infodemic’, a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resources are virtually unavailable for German, leaving research for the German language lagging significantly behind. In this paper, we introduce the new benchmark dataset FANG-COVID, consisting of 28,056 real and 13,186 fake German news articles related to the COVID-19 pandemic as well as data on their propagation on Twitter. Furthermore, we propose an explainable textual- and social context-based model for fake news detection, compare its performance to black-box models and perform feature ablation to assess the relative importance of human-interpretable features in distinguishing fake news from authentic news.

Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning
Aalok Sathe | Joonsuk Park

Automatic fact-checking is crucial for recognizing misinformation spreading on the internet. Most existing fact-checkers break down the process into several subtasks, one of which determines candidate evidence sentences that can potentially support or refute the claim to be verified; typically, evidence sentences with gold-standard labels are needed for this. In a more realistic setting, however, such sentence-level annotations are not available. In this paper, we tackle the natural language inference (NLI) subtask (given a document and a sentence claim, determine whether the document supports or refutes the claim) using only document-level annotations. Using fine-tuned BERT and multiple instance learning, we achieve 81.9% accuracy, significantly outperforming the existing results on the WikiFactCheck-English dataset.

Neural Re-rankers for Evidence Retrieval in the FEVEROUS Task
Mohammed Saeed | Giulio Alfarano | Khai Nguyen | Duc Pham | Raphael Troncy | Paolo Papotti

Computational fact-checking has gained a lot of traction in the machine learning and natural language processing communities. A plethora of solutions have been developed, but methods which leverage both structured and unstructured information to detect misinformation are of particular relevance. In this paper, we tackle the FEVEROUS (Fact Extraction and VERification Over Unstructured and Structured information) challenge, which consists of an open source baseline system together with a benchmark dataset containing 87,026 verified claims. We extend this baseline model by improving the evidence retrieval module, yielding the best evidence F1 score among the competitors in the challenge leaderboard while obtaining an overall FEVEROUS score of 0.20 (5th best ranked system).

A Fact Checking and Verification System for FEVEROUS Using a Zero-Shot Learning Approach
Orkun Temiz | Özgün Ozan Kılıç | Arif Ozan Kızıldağ | Tuğba Taşkaya Temizel

In this paper, we propose a novel fact checking and verification system to check claims against Wikipedia content. Our system retrieves relevant Wikipedia pages using Anserini, uses a BERT-large-cased question answering model to select correct evidence, and verifies claims using an XLNET natural language inference model by comparing them with the evidence. Table cell evidence is obtained by looking for entity-matching cell values and by using the TAPAS table question answering model. The pipeline utilizes the zero-shot capabilities of existing models, and all the models used in the pipeline require no additional training. Our system got a FEVEROUS score of 0.06 and a label accuracy of 0.39 in the FEVEROUS challenge.