CheckThat! Lab at CLEF 2023

Task 1: Check-Worthiness in Multimodal and Multigenre Content

Definition

The aim of this task is to determine whether a claim in a tweet is worth fact-checking. Typical approaches to making that decision rely either on the judgments of professional fact-checkers or on human annotators who answer several auxiliary questions, such as “does it contain a verifiable factual claim?” and “is it harmful?”, before deciding on the final check-worthiness label.

This year we offer two kinds of data, which translate to two subtasks:

  • Subtask 1A (Multimodal): Tweets that include both a text snippet and an image have to be assessed for check-worthiness.
  • Subtask 1B (Multigenre): A text snippet alone —from a tweet or a debate/speech transcription— has to be assessed for check-worthiness.

Subtask 1A is offered in Arabic and English, whereas Subtask 1B is offered in Arabic, English and Spanish.

Datasets

Subtask 1A (Multimodal):

Each instance is composed of the text and the image associated with a tweet.

Subtask 1B (Multigenre):

Each instance is composed of text only, which could come from a tweet, the transcription of a debate, or the transcription of a speech.

Evaluation

This is a binary classification task. The official evaluation metric is the F1 score over the positive class.
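As a minimal sketch of how the official score can be computed, the snippet below uses scikit-learn's f1_score restricted to the positive class. The "Yes"/"No" label values are an assumption for illustration only; the exact label encoding and the official scorer are available in the task repository.

    # Minimal sketch: F1 over the positive class (official metric).
    # The "Yes"/"No" label values are an assumption; see the task repository
    # for the exact label encoding and the official scorer.
    from sklearn.metrics import f1_score

    gold = ["Yes", "No", "Yes", "No", "Yes"]   # gold check-worthiness labels
    pred = ["Yes", "No", "No", "No", "Yes"]    # system predictions

    score = f1_score(gold, pred, pos_label="Yes")
    print(f"F1 (positive class): {score:.3f}")  # 0.800 on this toy example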

Submission

Scorers, Format Checkers, and Baseline Scripts

All scripts can be found on GitLab in the CheckThat! Lab Task 1 repository.

Submission guidelines

  • Make sure that you create a single account for your team, and submit runs through that account only.
  • The last file submitted to the leaderboard will be considered as the final submission.
  • The file with your predictions should be called subtask1[A,B]_[lang].tsv, where A or B refers to the specific subtask and lang can be arabic, english, or spanish. Make sure to use .tsv as the file extension; otherwise, you will get an error on the leaderboard. For Subtask 1A, there are two languages (Arabic and English); for Subtask 1B, there are three (Arabic, English, and Spanish). For instance, a submission file for Subtask 1B Arabic should be named subtask1B_arabic.tsv.
  • You have to zip the .tsv file, e.g., zip subtask1B_arabic.zip subtask1B_arabic.tsv, and submit it through the Codalab page (see the sketch after this list).
  • You have to include the team name and a description of your method with each submission. Your team name must EXACTLY match the one used during the CLEF registration.
  • You are allowed a maximum of 200 submissions per day for each subtask.
  • We will keep the leaderboard private until the end of the submission period; hence, results will not be available upon submission. All results will be released after the evaluation period.
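As a rough sketch of the file preparation step, the snippet below writes predictions to subtask1B_arabic.tsv and wraps it in the required zip archive. The two-column id/label layout is an assumption for illustration; the authoritative column format is given by the format checkers in the task repository.

    # Sketch: prepare a submission archive for Subtask 1B Arabic.
    # The id/label column layout is an assumption; follow the format checkers
    # in the task repository for the exact columns expected on Codalab.
    import csv
    import zipfile

    predictions = [("1001", "Yes"), ("1002", "No")]  # (instance id, predicted label)

    tsv_name = "subtask1B_arabic.tsv"
    with open(tsv_name, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(predictions)

    # Equivalent to: zip subtask1B_arabic.zip subtask1B_arabic.tsv
    with zipfile.ZipFile("subtask1B_arabic.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tsv_name)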

Submission Site

Task 1: Codalab

Contact

Please join the Slack channel for any queries: https://join.slack.com/t/newworkspace-qgd1635/shared_invite/zt-1k9lnm0ys-aMeEmUtGY0tfW5HbkXHPJg

Slack Channel (task1-check-worthiness): https://ct-23-participants.slack.com

Or send an email to: clef-factcheck@googlegroups.com

Leaderboard

All baselines are random systems.
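For reference, a random baseline of this kind simply draws a label for every instance at random. The sketch below illustrates the idea under the assumption of a binary "Yes"/"No" labelling; it is not the exact baseline script from the repository.

    # Illustrative random baseline: assign "Yes"/"No" uniformly at random.
    # This shows the idea behind the official random baselines; it is not
    # the exact script shipped in the task repository.
    import random

    def random_baseline(instance_ids, seed=42):
        rng = random.Random(seed)
        return {i: rng.choice(["Yes", "No"]) for i in instance_ids}

    print(random_baseline(["1001", "1002", "1003"]))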

1A Arabic

Rank Team F1
1 CSECU-DSG 0.399
2 marvinpeng 0.312
3 Z-Index 0.301
- TeamX 0.300
5 Baseline 0.299

* Submissions without position were submitted after the deadline.

1A English

Rank Team F1
1 Fraunhofer SIT 0.712
2 ZHAW-CAI 0.708
- ES-VRAI 0.704
3 marvinpeng 0.697
- TeamX 0.671
4 CSECU-DSG 0.628
5 Z-Index 0.495
6 Baseline 0.474

* Submissions without position were submitted after the deadline.

1B Arabic

Rank Team F1
1 ES-VRAI 0.809
2 Accenture 0.733
3 Z-Index 0.710
4 CSECU-DSG 0.662
5 DSHacker 0.633
6 Baseline 0.625
- FakeDTML 0.530

* Submissions without position were submitted after the deadline.

1B English

Rank Team F1
1 OpenFact 0.898
2 Fraunhofer SIT 0.878
3 Accenture 0.860
4 NLPIR-UNED 0.851
5 ES-VRAI 0.843
6 Z-Index 0.838
7 CSECU-DSG 0.834
8 FakeDTML 0.833
9 DSHacker 0.819
10 Pikachu 0.767
- UGPLN y SINAI 0.757
11 Baseline 0.462

* Submissions without position were submitted after the deadline.

1B Spanish

Rank Team F1
1 DSHacker 0.641
2 ES-VRAI 0.627
3 CSECU-DSG 0.599
4 NLPIR-UNED 0.589
5 Accenture 0.509
6 Z-Index 0.496
- FakeDTML 0.440
7 Baseline 0.172

* Submissions without position were submitted after the deadline.

Organizers

  • Firoj Alam, Qatar Computing Research Institute, HBKU
  • Alberto Barrón-Cedeño, Università di Bologna, Italy
  • Gullal S. Cheema, TIB – Leibniz Information Centre for Science and Technology
  • Sherzod Hakimov, University of Potsdam
  • Maram Hasanain, Qatar Computing Research Institute, HBKU
  • Chengkai Li, The University of Texas at Arlington
  • Rubén Miguez, Newtral, Spain
  • Hamdy Mubarak, Qatar Computing Research Institute, HBKU
  • Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence
  • Gautam Kishore Shahi, University of Duisburg-Essen
  • Wajdi Zaghouani, HBKU