Task 5: Rumor Verification using Evidence from Authorities
Definition
Given a rumor expressed in a tweet and a set of authorities (one or more authority Twitter accounts) for that rumor, represented by a list of tweets from their timelines during the period surrounding the rumor, the system should (i) retrieve up to 5 evidence tweets from those timelines, and (ii) determine, according to that evidence, whether the rumor is supported (true), refuted (false), or unverifiable (the given tweets do not contain enough evidence to verify it). The task is offered in both Arabic and English.
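To make the task input and output concrete, the sketch below shows the shape of one instance and a corresponding system prediction. All field names, IDs, and label strings here are illustrative assumptions; the authoritative formats are those specified in the task repository.

```python
# A minimal sketch of one task instance and a system prediction.
# Field names, IDs, and label strings are illustrative assumptions;
# the official input/output formats are defined in the task repository.

rumor_instance = {
    "rumor_id": "rumor_001",                      # hypothetical ID
    "rumor_tweet": "Minister X resigned today.",  # the rumor to verify
    "authorities": ["@ministry_account"],         # authority account(s)
    "timeline": [                                 # authority tweets around the rumor
        {"tweet_id": "111", "text": "Press conference at 5 pm."},
        {"tweet_id": "112", "text": "Minister X has not resigned."},
    ],
}

prediction = {
    "rumor_id": "rumor_001",
    "evidence": ["112"],   # ranked evidence tweet IDs, at most 5
    "label": "refuted",    # one of: supported / refuted / unverifiable
}
```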
Datasets
The training dataset is available here.
Evaluation
The official evaluation measure for evidence retrieval is Mean Average Precision (MAP). Systems receive no credit for an unverifiable rumor if they retrieve any tweets for it. We also report Recall@5.
We use Macro-F1 to evaluate rumor classification. Additionally, we report a Strict Macro-F1, under which a rumor's label is counted as correct only if at least one retrieved authority evidence tweet is correct.
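To make these measures concrete, here is a minimal scorer sketch. It assumes gold evidence is given as a set of tweet IDs per rumor (empty for unverifiable rumors) and grants full retrieval credit on an unverifiable rumor only when nothing is retrieved; that convention, like the function names, is an assumption of this sketch, not a description of the official scorer.

```python
def average_precision(retrieved, relevant):
    """AP of a ranked list of tweet IDs against a gold evidence set."""
    hits, total = 0, 0.0
    for rank, tweet_id in enumerate(retrieved, start=1):
        if tweet_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def retrieval_scores(runs, gold, k=5):
    """Compute MAP and Recall@k.

    runs: rumor_id -> ranked list of retrieved tweet IDs (up to k used)
    gold: rumor_id -> set of gold evidence IDs (empty if unverifiable)
    """
    aps, recalls = [], []
    for rumor_id, relevant in gold.items():
        retrieved = runs.get(rumor_id, [])[:k]
        if not relevant:
            # Unverifiable rumor: no credit if anything is retrieved.
            # Full credit for an empty list is an assumption of this sketch.
            score = 0.0 if retrieved else 1.0
            aps.append(score)
            recalls.append(score)
        else:
            aps.append(average_precision(retrieved, relevant))
            recalls.append(len(set(retrieved) & relevant) / len(relevant))
    return sum(aps) / len(aps), sum(recalls) / len(recalls)


def strict_label_correct(pred_label, gold_label, retrieved, relevant):
    """Strict Macro-F1 counts a predicted label only when at least one
    retrieved evidence tweet is correct; how the official scorer treats
    unverifiable rumors (empty `relevant`) is not specified here."""
    return pred_label == gold_label and bool(set(retrieved) & relevant)
```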
Submission
All scripts can be found on GitLab in the CheckThat! Lab Task 5 repository.
Submission guidelines
To submit your runs (the output of your system), please fill in this form once per run. Runs must follow the output format specified in the task repository. The submission form closes on May 6th.
Each team can submit up to 3 runs per language, subject to the following (a small validation sketch follows this list):
- For each run per language, you must explicitly indicate whether it is primary or secondary.
- Exactly one run per language must be primary; it will be used for the main comparison with other systems.
- For each run, you must explicitly indicate whether external data was used for training.
- At most one run per language may use external data.
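The hypothetical helper below encodes these constraints as assertions; it is not part of the official tooling, and the run-dictionary keys are assumptions of this sketch.

```python
def validate_runs(runs_for_language):
    """Check one team's runs for a single language against the rules above.

    Each run is a dict with hypothetical keys:
      'priority'           -- 'primary' or 'secondary'
      'uses_external_data' -- bool
    """
    assert len(runs_for_language) <= 3, "at most 3 runs per language"
    primaries = [r for r in runs_for_language if r["priority"] == "primary"]
    assert len(primaries) == 1, "exactly one run must be primary"
    external = [r for r in runs_for_language if r["uses_external_data"]]
    assert len(external) <= 1, "at most one run may use external data"
```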
Leaderboard
In all four tables, only primary runs are assigned a rank.
Evidence Retrieval (Arabic)
| Rank | Team | Run Priority | Run ID | MAP | R@5 |
|------|------|--------------|--------|-----|-----|
| 1 | bigIR* | primary | bigIR-MLA-Ar | 0.618 | 0.673 |
|   | IAI Group | secondary1 | IAI-Arabic-Crossencoder | 0.586 | 0.601 |
| 2 | IAI Group | primary | IAI-Arabic-COLBERT | 0.564 | 0.581 |
|   | bigIR* | secondary1 | bigIR-KGAT-Ar | 0.560 | 0.625 |
|   | (Baseline) |  |  | 0.345 | 0.423 |
| 3 | SCUoL | primary | SCUoL-1-Verification | 0.023 | 0.044 |

\* Submissions include task organisers.
Evidence Retrieval (English)
| Rank | Team | Run Priority | Run ID | MAP | R@5 |
|------|------|--------------|--------|-----|-----|
|   | IAI Group | secondary1 | IAI-English-Crossencoder | 0.628 | 0.676 |
| 1 | bigIR* | primary | bigIR-MLA-En | 0.604 | 0.677 |
| 2 | Axolotl | primary | run_rr=llama_sp=llama_rewrite=3_boundary=0,4_hashtagW=1 | 0.566 | 0.617 |
| 3 | DEFAULT | primary | DEFAULT-Colbert1 | 0.559 | 0.634 |
| 4 | IAI Group | primary | IAI-English-COLBERT | 0.557 | 0.590 |
| 5 | AuthEv-LKolb | primary | AuthEv-LKolb-oai | 0.549 | 0.587 |
|   | bigIR* | secondary1 | bigIR-KGAT-En | 0.537 | 0.618 |
|   | AuthEv-LKolb | secondary2 | AuthEv-LKolb-terrier-oai-preprocessing | 0.524 | 0.563 |
|   | AuthEv-LKolb | secondary1 | AuthEv-LKolb-oai-extdata | 0.510 | 0.619 |
|   | Axolotl | secondary1 | run_rr=dl_sp=llama_rewrite=0_boundary=0,2_hashtagW=1 | 0.489 | 0.545 |
|   | Axolotl | secondary2 | run_rr=none_sp=dl_rewrite=0_boundary=0,1_hashtagW=1 | 0.489 | 0.545 |
|   | (Baseline) |  |  | 0.335 | 0.445 |

\* Submissions include task organisers.
Verification (Arabic)
| Rank | Team | Run Priority | Run ID | Macro-F1 | Strict Macro-F1 |
|------|------|--------------|--------|----------|-----------------|
| 1 | IAI Group | primary | IAI-Arabic-COLBERT | 0.600 | 0.581 |
|   | IAI Group | secondary1 | IAI-Arabic-Crossencoder | 0.460 | 0.433 |
| 2 | bigIR* | primary | bigIR-MLA-Ar | 0.368 | 0.300 |
| 3 | SCUoL | primary | SCUoL-1-Verification | 0.355 | - |
|   | (Baseline) |  |  | 0.347 | 0.347 |
|   | bigIR* | secondary1 | bigIR-KGAT-Ar | 0.258 | 0.258 |

\* Submissions include task organisers.
Verification (English)
| Rank | Team | Run Priority | Run ID | Macro-F1 | Strict Macro-F1 |
|------|------|--------------|--------|----------|-----------------|
|   | AuthEv-LKolb | secondary1 | AuthEv-LKolb-oai-extdata | 0.895 | 0.876 |
| 1 | AuthEv-LKolb | primary | AuthEv-LKolb-oai | 0.879 | 0.861 |
|   | AuthEv-LKolb | secondary2 | AuthEv-LKolb-terrier-oai-preprocessing | 0.831 | 0.831 |
| 2 | Axolotl | primary | Axolotl-run_rr=llama_sp=llama_rewrite=3_boundary=0,4_hashtagW=1 | 0.687 | 0.687 |
|   | Axolotl | secondary1 | Axolotl-run_rr=dl_sp=llama_rewrite=0_boundary=0,2_hashtagW=1 | 0.630 | 0.570 |
|   | Axolotl | secondary2 | Axolotl-run_rr=none_sp=dl_rewrite=0_boundary=0,1_hashtagW=1 | 0.574 | 0.492 |
|   | (Baseline) |  |  | 0.495 | 0.495 |
| 3 | DEFAULT | primary | DEFAULT-Colbert1 | 0.482 | 0.454 |
|   | IAI Group | secondary1 | IAI-English-Crossencoder | 0.459 | 0.444 |
| 4 | bigIR* | primary | bigIR-MLA-En | 0.458 | 0.428 |
| 5 | IAI Group | primary | IAI-English-COLBERT | 0.373 | 0.373 |
|   | bigIR* | secondary1 | bigIR-KGAT-En | 0.368 | 0.357 |

\* Submissions include task organisers.
Organisers
- Tamer Elsayed, Qatar University, Qatar
- Fatima Haouari, Qatar University, Qatar
- Reem Suwaileh, HBKU, Qatar