-
Eastern Armenian National Corpus Subcorpus
Subcorpus of EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography). -
Eastern Armenian National Corpus Electronic Library
The corpus contains 4,547,379 words from 104 books by 12 authors. -
ARPA Armenian Paraphrase Corpus
Training and test datasets for sentential paraphrase detection, along with BERT-based models for the Armenian language. -
Armenian summary dataset
The armsummary dataset, hosted on Hugging Face. -
pioNER - named entity annotated datasets
The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license. -
Armenian language dataset from CC-100: Monolingual Datasets from Web Crawl Data
Armenian language dataset extracted from the CC-100 research dataset. Description from the website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...