8 topics, 100K+ annotations.

MultiPIT

Multi-Topic Paraphrases in Twitter


ABOUT

MultiPIT is the largest Twitter-based paraphrase corpus to date. It contains four parts: MultiPITcrowd, MultiPITexpert, MultiPITAuto, and MultiPITNMR. MultiPITcrowd is a collection of crowdsourced annotations under a loosely defined notion of paraphrase. MultiPITexpert is a collection of expert annotations under a strictly defined notion of paraphrase. MultiPITAuto is a collection of paraphrase pairs automatically identified from recent Twitter data. MultiPITNMR is the first multi-reference test set for paraphrase generation.


DATA (available now)

MultiPITcrowd
100K+ crowdsourced annotations
MultiPITexpert
5K+ expert annotations
MultiPITAuto
500K+ automatic annotations
MultiPITNMR
200 × 8 expert annotations
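A multi-reference test set such as MultiPITNMR matters because n-gram metrics can credit a generated paraphrase for matching any of several valid rewrites. The sketch below illustrates this with BLEU-style clipped n-gram precision, where each candidate n-gram count is clipped to its maximum count in any single reference. It assumes "200 × 8" means 200 source sentences with 8 references each, and it is a generic illustration of the metric, not the project's evaluation code (which is listed as coming soon).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """BLEU-style clipped n-gram precision against several references.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram occurs in any single reference, so extra references can
    only raise (never lower) the score.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Highest count of each n-gram across all references.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical example: one candidate paraphrase scored against two references.
candidate = "the cat sat on the mat".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(candidate, refs, 1))      # both references
print(modified_precision(candidate, refs[1:], 1))  # second reference only
```

With both references the repeated "the" is fully credited, so unigram precision is higher than with the second reference alone.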

CODE (coming soon...)

Acknowledgement: This material is based in part on research sponsored by IARPA via the BETTER program (contract 19051600004).

Leaderboard
Rank | Model | Date | Precision | Recall | Accuracy | F1
Rank | Model | Date | Precision | Recall | Accuracy | F1
Rank | Model | Date | BERT-iBLEU | Self-BLEU | BERT-Score | BLEU
Rank | Metric | Referenceless | Fluency Correlation | Semantic Similarity Correlation | Diversity Correlation | Overall Correlation
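The first two leaderboards report Precision, Recall, Accuracy, and F1. As a minimal sketch, assuming binary gold and predicted labels where 1 means "paraphrase" and 0 means "not a paraphrase", these follow the standard definitions below. The function name is hypothetical, and this is not the leaderboard's official scoring script.

```python
def classification_metrics(gold, pred):
    """Precision, recall, accuracy and F1 for binary labels (1 = paraphrase)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical labels for five sentence pairs.
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Precision, recall, and F1 are computed with respect to the paraphrase class only, while accuracy counts agreement on both classes.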