Task 1: Check-Worthiness Estimation
Definition
The aim of this task is to determine whether a claim in a tweet or transcription is worth fact-checking. Typical approaches to making this decision rely either on the judgments of professional fact-checkers or on human annotators who answer several auxiliary questions, such as “does it contain a verifiable factual claim?” and “is it harmful?”, before deciding on the final check-worthiness label (https://aclanthology.org/2021.findings-emnlp.56.pdf). For example, a sentence stating a concrete statistic contains a verifiable factual claim, whereas a purely personal opinion does not.
This year, we are offering multi-genre data: tweets and transcriptions should be judged based solely on their text. The task is offered in Arabic, English, and Dutch.
Datasets
Data for all languages are available in the CheckThat! Lab Task 1 repository.
Each instance consists of text only, which may come from a tweet, the transcription of a debate, or the transcription of a speech.
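As a minimal illustration of working with such data, the snippet below loads a TSV split with pandas; the file name and the column names (`sentence_id`, `text`, `class_label`) are assumptions for illustration, so verify them against the released files.

```python
import pandas as pd

# File name and column names (sentence_id, text, class_label) are
# assumptions for illustration -- check the released data files.
train = pd.read_csv("train_english.tsv", sep="\t")

# Each instance is text only, regardless of genre
# (tweet, debate transcription, or speech transcription).
print(train[["sentence_id", "text", "class_label"]].head())
```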
Evaluation
This is a binary classification task. The official evaluation metric is the F1 score over the positive (check-worthy) class.
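For reference, the sketch below computes this metric with scikit-learn; using the string `"Yes"` as the positive (check-worthy) label is an assumption for illustration and may differ from the official scorer.

```python
from sklearn.metrics import f1_score

# Gold and predicted labels; "Yes" marking the positive
# (check-worthy) class is an assumption for illustration.
y_true = ["Yes", "No", "Yes", "Yes", "No"]
y_pred = ["Yes", "No", "No", "Yes", "No"]

# Official metric: F1 computed over the positive class only.
score = f1_score(y_true, y_pred, pos_label="Yes")
print(f"F1 (positive class): {score:.3f}")
```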
Submission
All scripts can be found on GitLab in the CheckThat! Lab Task 1 repository
Submission guidelines
- Make sure that you create one account per team and submit runs through that account only.
- The last file submitted to the leaderboard will be considered the final submission.
- The output file must be named task1_lang.tsv, with a .tsv extension (e.g., task1_arabic.tsv); otherwise, you will get an error on the leaderboard. Three languages are possible (Arabic, English, and Dutch).
- You have to zip the TSV file (e.g., zip task1_arabic.zip task1_arabic.tsv) and submit it through the Codalab page; a sketch of these steps follows this list.
- You must submit your team name and a method description with each submission. Your team name must EXACTLY match the one used during CLEF registration.
- You are allowed a maximum of 200 submissions per day.
- We will keep the leaderboard private until the end of the submission period; hence, results will not be visible upon submission. All results will be released after the evaluation period.
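As a rough sketch of these mechanics (not the official submission script), the snippet below writes a run file following the required task1_lang.tsv naming scheme and zips it for upload. The three-column layout (`id`, `class_label`, `run_id`) and the label values are assumptions for illustration; check the task repository for the exact expected format.

```python
import csv
import zipfile

# Hypothetical predictions as (id, label) pairs; the column layout
# (id, class_label, run_id) is an assumption -- consult the task
# repository for the exact expected format.
predictions = [("1", "Yes"), ("2", "No"), ("3", "Yes")]

# The file name must follow the task1_lang.tsv convention.
with open("task1_english.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for sample_id, label in predictions:
        # "team_model1" is a hypothetical run identifier.
        writer.writerow([sample_id, label, "team_model1"])

# Zip the TSV for upload to the Codalab page,
# mirroring: zip task1_english.zip task1_english.tsv
with zipfile.ZipFile("task1_english.zip", "w") as zf:
    zf.write("task1_english.tsv")
```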
Submission Site
Task 1: Codalab
Leaderboard
All baselines are random systems.
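For context, a random baseline of this kind can be as simple as the sketch below; the uniform choice over two labels is an assumption about how such baselines work, not the organizers' exact implementation.

```python
import random

random.seed(42)  # fixed seed for reproducibility

def random_baseline(n_instances, labels=("Yes", "No")):
    """Predict a uniformly random label for each instance."""
    return [random.choice(labels) for _ in range(n_instances)]

print(random_baseline(5))
```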
Task 1 Arabic
| Rank | Team | F1 |
|------|------|-----|
| 1 | visty | 0.569 |
| 2 | teamopenfact | 0.557 |
| 3 | DSHacker | 0.538 |
| 4 | TurQUaz | 0.533 |
| 5 | SemanticCUETSync | 0.532 |
| 6 | mjmanas54 | 0.531 |
| 7 | Fired_from_NLP | 0.530 |
| 8 | Madussree | 0.530 |
| 9 | pandas | 0.520 |
| 10 | hybrinfox | 0.519 |
| 11 | Mirela | 0.478 |
| 12 | DataBees | 0.460 |
| 13 | Baseline | 0.418 |
| 14 | JUNLP | 0.212 |
Task 1 Dutch
| Rank | Team | F1 |
|------|------|-----|
| 1 | TurQUaz | 0.732 |
| 2 | DSHacker | 0.730 |
| 3 | visty | 0.718 |
| 4 | Mirela | 0.650 |
| 5 | Zamoranesis | 0.601 |
| 6 | FC_RUG | 0.594 |
| 7 | teamopenfact | 0.590 |
| 8 | hybrinfox | 0.589 |
| 9 | mjmanas54 | 0.577 |
| 10 | DataBees | 0.563 |
| 11 | JUNLP | 0.550 |
| 12 | Fired_from_NLP | 0.543 |
| 13 | Madussree | 0.482 |
| 14 | Baseline | 0.438 |
| 15 | pandas | 0.308 |
| 16 | SemanticCUETSync | 0.218 |
Task 1 English
| Rank | Team | F1 |
|------|------|-----|
| 1 | FactFinders | 0.802 |
| 2 | teamopenfact | 0.796 |
| 3 | innavogel | 0.780 |
| 4 | mjmanas54 | 0.778 |
| 5 | ZHAW_Students | 0.771 |
| 6 | SemanticCUETSync | 0.763 |
| 7 | SINAI | 0.761 |
| 8 | DSHacker | 0.760 |
| 9 | visty | 0.753 |
| 10 | Fired_from_NLP | 0.745 |
| 11 | TurQUaz | 0.718 |
| 12 | hybrinfox | 0.711 |
| 13 | SSN-NLP | 0.706 |
| 14 | sz06571 | 0.696 |
| 15 | NapierNLP | 0.675 |
| 16 | Mirela | 0.658 |
| 17 | Kushal_Chandani | 0.658 |
| 18 | DataBees | 0.619 |
| 19 | Trio_Titans | 0.600 |
| 20 | Madussree | 0.583 |
| 21 | pandas | 0.579 |
| 22 | JUNLP | 0.541 |
| 23 | mariuxi | 0.517 |
| 24 | grig95 | 0.497 |
| 25 | CLaC-2 | 0.494 |
| 26 | Aqua_Wave | 0.339 |
| 27 | Baseline | 0.307 |
Organizers
- Firoj Alam, Qatar Computing Research Institute, HBKU, Qatar
- Maram Hasanain, Qatar Computing Research Institute, HBKU, Qatar
- Alberto Barrón-Cedeño, Università di Bologna, Italy
- Reem Suwaileh, HBKU, Qatar
- Chengkai Li, The University of Texas at Arlington, USA
- Rubén Miguez, Newtral, Spain
- Wajdi Zaghouani, HBKU, Qatar
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Sanne Weering, University of Groningen, Netherlands
- Tommaso Caselli, University of Groningen, Netherlands