- 
    
      Eastern Armenian National Corpus Subcorpus: a subcorpus of EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography).
- 
    
      Eastern Armenian National Corpus Electronic Library: the corpus contains 4,547,379 words from 104 books by 12 authors.
- 
    
      ARPA Armenian Paraphrase Corpus: training and test datasets for sentential paraphrase detection, as well as BERT-based models for the Armenian language.
- 
    
      Armenian summary dataset: the armsummary dataset from Hugging Face.
- 
    
      pioNER - named entity annotated datasets: the pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license.
- 
    
      Armenian language dataset from CC-100, monolingual Datasets from Web Crawl Data: Armenian language dataset extracted from the CC-100 research dataset. Description from the website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...