- ARPA Armenian Paraphrase Corpus
  Training and test datasets for sentential paraphrase detection, as well as BERT-based models, for the Armenian language.
- Armenian summary dataset
  The armsummary dataset from Hugging Face.
- Armenian Wikipedia (hywiki) XML dumps
  Dumps of the Armenian Wikipedia provided by the Wikimedia Foundation. Available as gzipped XML files.
- Armenian language dataset from CC-100, monolingual Datasets from Web Crawl Data
  Armenian language dataset extracted from the CC-100 research dataset. Description from website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...