Task 3: Generating Full Fact-Checking Articles

Definition

This task adds a new, final stage to the CLEF CheckThat! lab pipeline: automating the writing of fact-checking articles. Given a claim, its veracity, and a set of evidence documents consulted when fact-checking the claim, the goal is to generate a full fact-checking article.

Datasets

We reuse the WatClaimCheck dataset for the training and validation sets. Each datapoint in this dataset has the following format:

{
    "metadata": {
      "claimant": "Faisal Al Qasimi, Carolina Monteiro",
      "claim": "OpIndia claimed Greta Thunberg's real name is Ghazala bhat",
      "claim_date": "2016-06-20",
      "review_date": "2021-02-06",
      "id": 42,
      "premise_articles": {
        "https://web.archive.org/web/20210206135409/https://twitter.com/omar_quraishi/status/1357926247414845441": "42_1.json",
        "https://web.archive.org/web/20210206083718/https://twitter.com/runcaralisarun/status/1357714907249086465": "42_2.json",
        "https://www.facebook.com/search/photos/?q=opindia%20greta%20ghazala": "42_3.json",
        "https://twitter.com/UnSubtleDesi/status/1357723484491718659": "42_4.json"
      }
    },
    "label": {
      "reviewer_name": "Alt News",
      "reviewer_site": "altnews.in",
      "review_url": "https://www.altnews.in/morphed-opindia-screengrab-claims-greta-thunbergs-real-name-is-ghazala-bhat/",
      "rating": 0,
      "original_rating": "false",
      "id": 1,
      "review_article": "42.json"
    }
}
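As a rough illustration of how a datapoint in this format maps onto the task inputs, the sketch below parses a shortened copy of the example above and collects the claim, its veracity, the evidence file names, and the reference article file name. The helper name `to_task_input` is hypothetical, and using `original_rating` as the veracity input is an assumption; the dataset's own instructions remain the authority on how the files are laid out on disk.

```python
import json

# A shortened copy of the datapoint shown above: values are copied from the
# example, and only two of the four premise articles are kept for brevity.
RECORD_JSON = """
{
  "metadata": {
    "claim": "OpIndia claimed Greta Thunberg's real name is Ghazala bhat",
    "id": 42,
    "premise_articles": {
      "https://www.facebook.com/search/photos/?q=opindia%20greta%20ghazala": "42_3.json",
      "https://twitter.com/UnSubtleDesi/status/1357723484491718659": "42_4.json"
    }
  },
  "label": {
    "rating": 0,
    "original_rating": "false",
    "review_article": "42.json"
  }
}
"""


def to_task_input(record):
    """Collect claim, veracity, evidence files, and the reference article file.

    Hypothetical helper: treating `original_rating` as the veracity input is
    an assumption, not something the task description specifies.
    """
    meta, label = record["metadata"], record["label"]
    return {
        "claim": meta["claim"],
        "veracity": label["original_rating"],
        "evidence_files": sorted(meta["premise_articles"].values()),
        "reference_file": label["review_article"],
    }


task_input = to_task_input(json.loads(RECORD_JSON))
print(task_input)
```

The `premise_articles` mapping goes from evidence URL to the file holding its scraped content, so the file names are what a loader would need to read the evidence text.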

To download and use a copy of the WatClaimCheck dataset, follow the instructions available here. The dataset also provides scraped content from the premise articles / evidence webpages where available.

We will release our test sets soon, in the same format, along with content from the evidence webpages where available.

Evaluation

We will assess the generated text using the mean of the following metrics: (i) entailment score, a reference-based metric that measures whether the generated text is entailed by the reference; (ii) citation correctness, which verifies whether text attributed to a citation is entailed by the corresponding evidence; and (iii) citation completeness, which computes the proportion of input evidence that is correctly cited in the generated text.
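The aggregation can be sketched as follows. This is not the official scorer, which has not been released. It assumes each metric lies in [0, 1], that the mean is unweighted, and that the entailment and citation correctness scores are produced elsewhere (for example, by an entailment model). The function names and the two placeholder scores are illustrative only.

```python
from statistics import mean


def citation_completeness(input_evidence, correctly_cited):
    """Proportion of input evidence documents that are correctly cited."""
    input_evidence = set(input_evidence)
    if not input_evidence:
        return 0.0  # assumption: no evidence means nothing can be cited
    return len(input_evidence & set(correctly_cited)) / len(input_evidence)


def overall_score(entailment, citation_correctness, completeness):
    """Unweighted mean of the three metrics (assumed to share a [0, 1] scale)."""
    return mean([entailment, citation_correctness, completeness])


# Four evidence files were provided; two of them are correctly cited.
completeness = citation_completeness(
    ["42_1.json", "42_2.json", "42_3.json", "42_4.json"],
    ["42_1.json", "42_3.json"],
)

# 0.8 and 0.6 are made-up placeholder values for the two entailment-based metrics.
score = overall_score(0.8, 0.6, completeness)
print(completeness, round(score, 3))
```

Under these assumptions, a generated article that cites only half of the provided evidence correctly is capped at 0.5 on the completeness component, regardless of how well it matches the reference.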

Submission

Scorer, Format Checker, and Baseline Scripts

TBA

Submission Site

TBA

Submission Guidelines

TBA

Leaderboard

TBA

Organizers

Dhruv Sahnan, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Tanmoy Chakraborty, Indian Institute of Technology - Delhi, India
Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates

Contact

Contact dhruv.sahnan@mbzuai.ac.ae for any questions.

CheckThat! Lab at CLEF 2026
