Definition
This task adds a new, final step to the CLEF CheckThat! lab pipeline: automating the writing of fact-checking articles. Given a claim, its veracity label, and the set of evidence documents consulted while fact-checking the claim, systems must generate a full fact-checking article.
Datasets
We reuse the WatClaimCheck dataset as the training and validation sets. In this dataset, each datapoint has the following format:
{
  "metadata": {
    "claimant": "Faisal Al Qasimi, Carolina Monteiro",
    "claim": "OpIndia claimed Greta Thunberg's real name is Ghazala bhat",
    "claim_date": "2016-06-20",
    "review_date": "2021-02-06",
    "id": 42,
    "premise_articles": {
      "https://web.archive.org/web/20210206135409/https://twitter.com/omar_quraishi/status/1357926247414845441": "42_1.json",
      "https://web.archive.org/web/20210206083718/https://twitter.com/runcaralisarun/status/1357714907249086465": "42_2.json",
      "https://www.facebook.com/search/photos/?q=opindia%20greta%20ghazala": "42_3.json",
      "https://twitter.com/UnSubtleDesi/status/1357723484491718659": "42_4.json"
    }
  },
  "label": {
    "reviewer_name": "Alt News",
    "reviewer_site": "altnews.in",
    "review_url": "https://www.altnews.in/morphed-opindia-screengrab-claims-greta-thunbergs-real-name-is-ghazala-bhat/",
    "rating": 0,
    "original_rating": "false",
    "id": 1,
    "review_article": "42.json"
  }
}
To download and use a copy of the WatClaimCheck dataset, follow the instructions available here. The dataset also provides scraped content from the premise articles / evidence webpages where available.
We will release our test sets, along with content from the evidence webpages where available, in a similar format soon!
Expected Output
For each datapoint, we expect the output in the following format:
{
  "id": 42,
  "factchecking_article": "<corresponding generated article for \"OpIndia claimed Greta Thunberg's real name is Ghazala bhat -- by Faisal Al Qasimi, Carolina Monteiro\">"
}
Furthermore, the evaluation script expects inline citations wherever necessary in the following format:
This is a sample sentence making a claim grounded in some evidence (source: https://www.example1.com). This is another sample sentence making a claim grounded in multiple evidence (source: https://www.example2.com; https://www.example3.com).
If citations are not present (or do not follow this format), the evaluation script will penalise your submission. Please reach out if your submission has any issues; we will be available for rectifications to ensure fair evaluation for everyone.
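A minimal sketch of a checker for this citation format, using a regular expression for the "(source: ...)" pattern shown above, is given below. The official format checker and scorer remain authoritative; this is only a quick self-check before submitting.

```python
import re

# Matches "(source: URL)" and "(source: URL1; URL2)" citations,
# per the format illustrated in the task description.
CITATION_RE = re.compile(
    r"\(source:\s*(https?://[^\s;)]+(?:;\s*https?://[^\s;)]+)*)\)"
)

def extract_citations(text):
    """Return all cited URLs found in a generated article, in order."""
    urls = []
    for match in CITATION_RE.finditer(text):
        urls.extend(u.strip() for u in match.group(1).split(";"))
    return urls
```

Running `extract_citations` over your generated articles lets you spot sentences whose citations would not be recognised by a pattern of this shape.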
Evaluation
The evaluation scripts are available here.
We will use the mean of the following metrics to assess the generated text:
(i) entailment score, a reference-based metric that measures whether the generated text is entailed by the reference article;
(ii) citation recall and (iii) citation precision, as proposed by Gao et al. (2023);
and (iv) evidence coverage, which computes the proportion of the input evidence that is correctly cited in the generated text.
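As an illustration of metric (iv), one plausible reading of evidence coverage is sketched below: the fraction of input evidence URLs that appear in the article's citations. The released evaluation scripts are authoritative and may additionally verify that each citation actually supports the citing sentence.

```python
def evidence_coverage(generated_text, evidence_urls):
    """Simplified sketch: fraction of input evidence URLs cited in the text.

    This only checks that each URL occurs somewhere in the article; the
    official scorer may apply stricter per-sentence checks.
    """
    if not evidence_urls:
        return 0.0
    cited = sum(1 for url in evidence_urls if url in generated_text)
    return cited / len(evidence_urls)
```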
As a reminder, the evaluation script expects inline citations in the format shown in the Expected Output section; submissions with missing or mis-formatted citations will be penalised.
Submission
The scorer, format checker, and baseline scripts have been released and are available here. Please follow the instructions in the README file for setup; the same scripts will also be used for the test phase.
Submission Site
UPDATE (24 April 2026): The test phase has started. Submission site: https://www.codabench.org/competitions/15892/
Submission Guidelines
Test phase submissions on CodaBench: https://www.codabench.org/competitions/15892/
- You may make an unlimited number of submissions during the test phase. However, only the last valid submission will be evaluated and used to rank each team on the final leaderboard.
- We have provided starter code (see the Files tab on CodaBench) to help generate the predictions. Generate your predictions in the expected output format, as explained in the Dataset tab, as a JSON file named prediction.json. Create a ZIP containing only prediction.json, and submit this ZIP for scoring.
- We urge participants to run the format checker on their submissions to ensure they are valid. If you run into any unforeseen issues, please contact dhruv.sahnan@mbzuai.ac.ae. The input and output formats are explained in the Dataset tab.
- Additionally, CodaBench's submission portal will verify that the format of your submission matches the expected output. If this check fails, the submission will be discarded.
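The packaging steps above can be sketched as follows (the file names follow the guidelines; the function and its parameters are illustrative, not part of the official tooling):

```python
import json
import zipfile
from pathlib import Path

def package_submission(prediction_path="prediction.json", zip_path="submission.zip"):
    """Validate that prediction.json parses as JSON, then zip it for upload."""
    with open(prediction_path, encoding="utf-8") as f:
        json.load(f)  # raises ValueError if the file is not valid JSON
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The archive must contain only prediction.json, at the top level.
        zf.write(prediction_path, arcname=Path(prediction_path).name)
    return zip_path
```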
Leaderboard
UPDATE (24 April 2026): The test phase has started. Submission site: https://www.codabench.org/competitions/15892/. The leaderboard is hidden at the moment, and will be made public at the end of the test phase.
Organizers
- Dhruv Sahnan, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Tanmoy Chakraborty, Indian Institute of Technology - Delhi, India
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Contact dhruv.sahnan@mbzuai.ac.ae for any questions.