CheckThat! Lab at CLEF 2024

Task 5: Rumor Verification using Evidence from Authorities

Definition

Given a rumor expressed in a tweet and a set of authorities for that rumor (one or more authority Twitter accounts, each represented by a list of tweets from its timeline during the period surrounding the rumor), the system should retrieve up to 5 evidence tweets from those timelines and determine whether the rumor is supported (true), refuted (false), or unverifiable (when the given tweets contain insufficient evidence to verify it) according to that evidence. The task is offered in both Arabic and English.
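For illustration, here is a minimal sketch of what one instance and a system prediction might look like. All field names, IDs, and label spellings below are hypothetical; the authoritative input and output formats are defined in the task repository.

```python
# Hypothetical shapes for a single Task 5 instance; field names are
# illustrative only -- the official format lives in the task repository.
instance = {
    "rumor_id": "r-001",                      # made-up identifier
    "rumor_tweet": "Vitamin X cures disease Y.",
    "authority_timelines": {                  # authority account -> timeline tweets
        "@HealthAuthority": [
            {"tweet_id": "t-17", "text": "There is no evidence that vitamin X cures disease Y."},
            {"tweet_id": "t-18", "text": "Our weekly statistics report is out."},
        ],
    },
}

prediction = {
    "rumor_id": "r-001",
    "label": "REFUTED",                       # SUPPORTED / REFUTED / UNVERIFIABLE
    "evidence": ["t-17"],                     # up to 5 ranked tweet IDs; empty if UNVERIFIABLE
}
```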

Datasets

The training dataset is available here.

Evaluation

The official evaluation measure for evidence retrieval is Mean Average Precision (MAP). Systems get no credit if they retrieve any tweets for an unverifiable rumor. We will also report Recall@5 (R@5).
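To make the retrieval scoring concrete, here is a minimal sketch of per-rumor average precision under our reading of the rule above. The handling of unverifiable rumors (zero credit for any retrieval, and the assumption that an empty run for them earns full credit) is our interpretation; the official scorer in the task repository is authoritative.

```python
def average_precision(retrieved, relevant, k=5):
    """AP@k for one rumor: `retrieved` is a ranked list of tweet IDs,
    `relevant` is the set of gold evidence tweet IDs."""
    if not relevant:
        # Unverifiable rumor: retrieving anything forfeits credit.
        # Assumption: an empty run earns full credit here.
        return 0.0 if retrieved else 1.0
    hits, score = 0, 0.0
    for rank, tweet_id in enumerate(retrieved[:k], start=1):
        if tweet_id in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def mean_average_precision(runs, gold, k=5):
    """MAP over all rumors; both args map rumor_id -> evidence tweet IDs."""
    return sum(average_precision(runs.get(r, []), set(gold[r]), k) for r in gold) / len(gold)
```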

We use Macro-F1 to evaluate the classification of the rumors. Additionally, we will report a Strict Macro-F1, in which a rumor's label is counted as correct only if at least one retrieved authority evidence tweet is correct.
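A sketch of how the strict variant might be computed follows. The sentinel trick and the treatment of unverifiable rumors (which have no gold evidence to match) are assumptions on our part, not the official scorer.

```python
from sklearn.metrics import f1_score

def strict_macro_f1(gold_labels, pred_labels, gold_evidence, pred_evidence):
    """Strict Macro-F1: a predicted label only counts as correct when at
    least one retrieved tweet is in the gold evidence set (our reading)."""
    strict = []
    for label, rel, ret in zip(pred_labels, gold_evidence, pred_evidence):
        if rel and not set(ret) & set(rel):
            strict.append("__NO_EVIDENCE__")  # sentinel that never matches gold
        else:
            # Assumption: unverifiable rumors (empty gold evidence)
            # keep the plain label check.
            strict.append(label)
    # Restrict averaging to the true label set so the sentinel adds no class.
    return f1_score(gold_labels, strict, labels=sorted(set(gold_labels)), average="macro")
```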

Submission

Scorers, Format Checkers, and Baseline Scripts

All scripts can be found on GitLab in the CheckThat! Lab Task 5 repository.

Submission guidelines

To submit your runs (the output of your system), please use this form for each run. Your runs must be in the output format specified in the task repository. The submission form closes on May 6th.

Each team can submit up to 3 runs per language, where:

  • For each run per language, you must explicitly indicate whether it is primary or secondary.
  • Exactly one run per language must be primary; it will be used for the main comparison to other systems.
  • For each run, you must explicitly indicate whether external data was used for training.
  • At most one run per language may use external data.

Leaderboard

In all four tables below, only primary runs are assigned a rank.

Evidence Retrieval (Arabic)

Rank | Team         | Run Priority | Run ID                  | MAP   | R@5
1    | bigIR*       | primary      | bigIR-MLA-Ar            | 0.618 | 0.673
-    | IAI Group    | secondary1   | IAI-Arabic-Crossencoder | 0.586 | 0.601
2    | IAI Group    | primary      | IAI-Arabic-COLBERT      | 0.564 | 0.581
-    | bigIR*       | secondary1   | bigIR-KGAT-Ar           | 0.560 | 0.625
-    | (Baseline)   | -            | -                       | 0.345 | 0.423
3    | SCUoL        | primary      | SCUoL-1-Verification    | 0.023 | 0.044

* Submissions include task organisers

Evidence Retrieval (English)

Rank | Team         | Run Priority | Run ID                                                  | MAP   | R@5
-    | IAI Group    | secondary1   | IAI-English-Crossencoder                                | 0.628 | 0.676
1    | bigIR*       | primary      | bigIR-MLA-En                                            | 0.604 | 0.677
2    | Axolotl      | primary      | run_rr=llama_sp=llama_rewrite=3_boundary=0,4_hashtagW=1 | 0.566 | 0.617
3    | DEFAULT      | primary      | DEFAULT-Colbert1                                        | 0.559 | 0.634
4    | IAI Group    | primary      | IAI-English-COLBERT                                     | 0.557 | 0.590
5    | AuthEv-LKolb | primary      | AuthEv-LKolb-oai                                        | 0.549 | 0.587
-    | bigIR*       | secondary1   | bigIR-KGAT-En                                           | 0.537 | 0.618
-    | AuthEv-LKolb | secondary2   | AuthEv-LKolb-terrier-oai-preprocessing                  | 0.524 | 0.563
-    | AuthEv-LKolb | secondary1   | AuthEv-LKolb-oai-extdata                                | 0.510 | 0.619
-    | Axolotl      | secondary1   | run_rr=dl_sp=llama_rewrite=0_boundary=0,2_hashtagW=1    | 0.489 | 0.545
-    | Axolotl      | secondary2   | run_rr=none_sp=dl_rewrite=0_boundary=0,1_hashtagW=1     | 0.489 | 0.545
-    | (Baseline)   | -            | -                                                       | 0.335 | 0.445

* Submissions include task organisers

Verification (Arabic)

Rank | Team         | Run Priority | Run ID                  | Macro-F1 | Strict Macro-F1
1    | IAI Group    | primary      | IAI-Arabic-COLBERT      | 0.600    | 0.581
-    | IAI Group    | secondary1   | IAI-Arabic-Crossencoder | 0.460    | 0.433
2    | bigIR*       | primary      | bigIR-MLA-Ar            | 0.368    | 0.300
3    | SCUoL        | primary      | SCUoL-1-Verification    | 0.355    | -
-    | (Baseline)   | -            | -                       | 0.347    | 0.347
-    | bigIR*       | secondary1   | bigIR-KGAT-Ar           | 0.258    | 0.258

* Submissions include task organisers

Verification (English)

Rank | Team         | Run Priority | Run ID                                                          | Macro-F1 | Strict Macro-F1
-    | AuthEv-LKolb | secondary1   | AuthEv-LKolb-oai-extdata                                        | 0.895    | 0.876
1    | AuthEv-LKolb | primary      | AuthEv-LKolb-oai                                                | 0.879    | 0.861
-    | AuthEv-LKolb | secondary2   | AuthEv-LKolb-terrier-oai-preprocessing                          | 0.831    | 0.831
2    | Axolotl      | primary      | Axolotl-run_rr=llama_sp=llama_rewrite=3_boundary=0,4_hashtagW=1 | 0.687    | 0.687
-    | Axolotl      | secondary1   | Axolotl-run_rr=dl_sp=llama_rewrite=0_boundary=0,2_hashtagW=1    | 0.630    | 0.570
-    | Axolotl      | secondary2   | Axolotl-run_rr=none_sp=dl_rewrite=0_boundary=0,1_hashtagW=1     | 0.574    | 0.492
-    | (Baseline)   | -            | -                                                               | 0.495    | 0.495
3    | DEFAULT      | primary      | DEFAULT-Colbert1                                                | 0.482    | 0.454
-    | IAI Group    | secondary1   | IAI-English-Crossencoder                                        | 0.459    | 0.444
4    | bigIR*       | primary      | bigIR-MLA-En                                                    | 0.458    | 0.428
5    | IAI Group    | primary      | IAI-English-COLBERT                                             | 0.373    | 0.373
-    | bigIR*       | secondary1   | bigIR-KGAT-En                                                   | 0.368    | 0.357

* Submissions include task organisers

Organisers

  • Tamer Elsayed, Qatar University, Qatar
  • Fatima Haouari, Qatar University, Qatar
  • Reem Suwaileh, HBKU, Qatar