XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models. It covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax and semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data.
The tasks included in XTREME cover a range of standard paradigms in NLP, including sentence classification, structured prediction, sentence retrieval and question answering.
In order for models to be successful on the XTREME benchmark, they must learn representations that generalize across many tasks and languages. Each of the tasks covers a subset of the 40 languages included in XTREME. The languages were selected among the top 100 languages with the most Wikipedia articles to maximize language diversity, task coverage, and availability of training data.
More at blogpost
Paper: https://arxiv.org/abs/2003.11080.pdf
GitHub: https://github.com/google-research/xtreme/
#nlp #evaluation #benchmark
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models. It covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax and semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data.
The tasks included in XTREME cover a range of standard paradigms in NLP, including sentence classification, structured prediction, sentence retrieval and question answering.
In order for models to be successful on the XTREME benchmark, they must learn representations that generalize across many tasks and languages. Each of the tasks covers a subset of the 40 languages included in XTREME. The languages were selected among the top 100 languages with the most Wikipedia articles to maximize language diversity, task coverage, and availability of training data.
More at blogpost
Paper: https://arxiv.org/abs/2003.11080.pdf
GitHub: https://github.com/google-research/xtreme/
#nlp #evaluation #benchmark