Task 3: Generating Full Fact-Checking Articles
Definition
This task introduces a new, final step into the CLEF CheckThat! lab pipeline, which aims to automate the writing of fact-checking articles. Given a claim, its veracity label, and the set of evidence documents consulted while fact-checking the claim, participants must generate a full fact-checking article.
Datasets
We reuse the WatClaimCheck dataset for the training and validation sets. To download and use a copy of the WatClaimCheck dataset, follow the instructions available here. The dataset also provides scraped content from the premise articles / evidence webpages, where available.
The test set as well as the test set with ground truth articles are available on Google Drive here.
Expected Output
For each datapoint, we expect output in the following format:
{
  "id": 42,
  "factchecking_article": "<corresponding generated article for \"OpIndia claimed Greta Thunberg's real name is Ghazala bhat -- by Faisal Al Qasimi, Carolina Monteiro\">"
}
Furthermore, the evaluation script expects inline citations wherever necessary in the following format:
"This is a sample sentence making a claim grounded in some evidence (source: https://www.example1.com). This is another sample sentence making a claim grounded in multiple evidence (source: https://www.example2.com; https://www.example3.com)."
If citations are not present (or are not present in this format), the evaluation script will penalise your submissions. Please reach out in case your submissions have any issues. We will be available for any rectifications to ensure fair evaluation for everyone.
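As a minimal sketch of this citation format, the snippet below parses the "(source: URL)" and "(source: URL; URL)" spans from generated text. The regex and helper name are our own illustration, not the official evaluation script's implementation:

```python
import re

# Illustrative pattern for "(source: URL)" spans with one or more URLs
# separated by semicolons. The official scorer's exact pattern is not
# published; this is only a sketch of the format described above.
CITATION_RE = re.compile(
    r"\(source:\s*(https?://[^\s;)]+(?:;\s*https?://[^\s;)]+)*)\)"
)

def extract_citations(text: str) -> list[list[str]]:
    """Return the list of URL groups cited in each (source: ...) span."""
    return [
        [url.strip() for url in group.split(";")]
        for group in CITATION_RE.findall(text)
    ]

sample = (
    "This is a sample sentence making a claim grounded in some evidence "
    "(source: https://www.example1.com). This is another sample sentence "
    "making a claim grounded in multiple evidence "
    "(source: https://www.example2.com; https://www.example3.com)."
)
```

Running `extract_citations(sample)` yields one URL group per citation span, which is a quick way to confirm your articles will be parseable before submitting.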
Evaluation
The evaluation scripts are available here.
We will use the mean of the following metrics to assess the generated text:
(i) entailment score, a reference-based metric that measures whether the generated text is entailed by the reference;
(ii) citation recall and (iii) citation precision, as proposed by Gao et al. (2023);
and (iv) evidence coverage, which computes the proportion of input evidence that is correctly cited in the generated text.
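The evidence-coverage idea in (iv) can be sketched as follows: the fraction of the input evidence URLs that appear in at least one inline citation of the generated article. The function name is hypothetical, and the official scorer may normalise or match URLs differently:

```python
def evidence_coverage(generated: str, evidence_urls: list[str]) -> float:
    """Illustrative sketch: proportion of input evidence URLs cited
    somewhere in the generated article (exact-substring matching here;
    the official scorer may differ)."""
    if not evidence_urls:
        return 0.0
    cited = sum(1 for url in evidence_urls if url in generated)
    return cited / len(evidence_urls)

article = "The claim is false (source: https://www.example1.com)."
# Citing 1 of 2 provided evidence URLs gives a coverage of 0.5.
```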
Submission
The scorer, format checker, and baseline scripts have been released and are available here. Please follow the instructions in the README file for the setup.
Submission Site
UPDATE (24 April 2026): The test phase has started. Submission site: https://www.codabench.org/competitions/15892/.
UPDATE (10 May 2026): The test phase has ended. Please find the leaderboard below.
Submission Guidelines
Test phase submissions on CodaBench: https://www.codabench.org/competitions/15892/
- You may make an unlimited number of submissions during the test phase; however, only the last valid submission per team will be evaluated for the final leaderboard.
- We have provided starter code (see the Files tab on CodaBench) to help generate the predictions. Generate your predictions in the expected output format, as explained in the Dataset tab, as a JSON file named prediction.json. Create a ZIP containing only prediction.json and submit this ZIP for scoring.
- We urge participants to run the format checker on their submissions to ensure they are valid. If you run into any unforeseen issues, please contact dhruv.sahnan@mbzuai.ac.ae. The input and output formats are explained in the Dataset tab.
- Additionally, CodaBench's submission portal will verify that your submission follows the expected output format. If it fails this check, the submission will be discarded.
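The packaging steps above can be sketched as follows. The sample datapoint is a placeholder, and a top-level JSON list is assumed here; the format checker is authoritative on the exact container:

```python
import json
import zipfile

# Placeholder predictions in the expected per-datapoint format
# (assuming a top-level JSON list; verify with the format checker).
predictions = [
    {
        "id": 42,
        "factchecking_article": (
            "Generated article text with inline citations "
            "(source: https://www.example1.com)."
        ),
    },
]

# Write prediction.json, then zip only that file, as the guidelines require.
with open("prediction.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)

with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("prediction.json")
```

The resulting submission.zip contains a single prediction.json entry at the archive root, which is the layout the scoring portal expects.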
Leaderboard
UPDATE (24 April 2026): The test phase has started. Submission site: https://www.codabench.org/competitions/15892/. The leaderboard is hidden at the moment, and will be made public at the end of the test phase.
UPDATE (10 May 2026): The test phase has ended. The leaderboard for all the teams that participated is presented below:
| Team Name | Mean Entailment Score | Mean Citation Precision | Mean Citation Recall | Mean Evidence Coverage | Mean Score |
|---|---|---|---|---|---|
| 🥇 Outsider | 0.278 | 0.671 | 0.671 | 0.563 | 0.546 |
| 🥈 UTS | 0.341 | 0.299 | 0.299 | 0.996 | 0.484 |
| 🥉 DS GT CheckThat | 0.295 | 0.240 | 0.276 | 0.902 | 0.428 |
| Facty | 0.229 | 0.231 | 0.254 | 0.995 | 0.427 |
| UCPH RMT | 0.364 | 0.144 | 0.174 | 0.993 | 0.419 |
| Ro | 0.321 | 0.463 | 0.463 | 0.343 | 0.398 |
| Sourceminds | 0.245 | 0.337 | 0.339 | 0.394 | 0.329 |
| CheckMate | 0.221 | 0.350 | 0.350 | 0.391 | 0.328 |
| KNU Fact | 0.227 | 0.287 | 0.295 | 0.440 | 0.312 |
| Baseline | 0.298 | 0.223 | 0.240 | 0.329 | 0.272 |
| Shtian (not registered in CLEF) | 0.279 | 0.194 | 0.198 | 0.210 | 0.220 |
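The Mean Score column is the arithmetic mean of the four metric columns. As a sanity check, the winning row reproduces its reported mean (to rounding):

```python
# Leaderboard values for the top-ranked team, taken from the table above.
outsider = {
    "entailment": 0.278,
    "citation_precision": 0.671,
    "citation_recall": 0.671,
    "evidence_coverage": 0.563,
}

# Mean of the four metrics; matches the reported 0.546 to three decimals.
mean_score = sum(outsider.values()) / len(outsider)
```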
UPDATE (12 May 2026): Additional late submissions or submissions outside of the CodaBench competition:
| Team Name | Mean Entailment Score | Mean Citation Precision | Mean Citation Recall | Mean Evidence Coverage | Mean Score |
|---|---|---|---|---|---|
| Team AKSH | 0.329 | 0.301 | 0.318 | 0.564 | 0.378 |
Note:
- Team names are as provided by the participants in the CLEF registration form, not taken from the CodaBench submissions. If, as a participant, you do not see your team listed on this leaderboard, or you believe there is a mistake, please reach out at dhruv.sahnan@mbzuai.ac.ae.
- Evaluation results for each team have also been made public and are available on the CheckThat! Lab GitLab repository here.
Organizers
- Dhruv Sahnan, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Tanmoy Chakraborty, Indian Institute of Technology - Delhi, India
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Contact dhruv.sahnan@mbzuai.ac.ae for any questions.