Armenian Language - Groups - Data Catalog Armenia

Eastern Armenian National Corpus Subcorpus

Subcorpus of the EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography).

HTML

Eastern Armenian National Corpus Electronic Library

The corpus contains 4,547,379 words from 104 books by 12 authors.

HTML

ARPA Armenian Paraphrase Corpus

Training and test datasets for sentential paraphrase detection, as well as BERT-based models, for the Armenian language.

HTML
CSV

Armenian summary dataset

The armsummary dataset from Hugging Face.

HTML
CSV

pioNER - named entity annotated datasets

The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license.

HTML
conll03
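
For readers unfamiliar with the conll03 resource format, the following is a minimal Python sketch of reading CoNLL-2003-style data. It assumes one token per line, whitespace-separated columns with the token first and the entity tag last, and blank lines between sentences; the exact column layout of the pioNER files is not stated here, so verify it against the actual files. The function name `read_conll` and the sample sentences are illustrative only and are not taken from the corpus.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs.

    Assumption: the token is the first column and the tag is the last
    column; blank lines and -DOCSTART- lines separate sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence boundary: flush the tokens collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample in the assumed two-column layout (not corpus data).
sample = """Երևանը B-LOC
Հայաստանի B-LOC
մայրաքաղաքն O
է O

Թումանյանը B-PER
գրող O
է O
"""

parsed = read_conll(sample.splitlines())
for sentence in parsed:
    print(sentence)
```

To read a downloaded file instead of the inline sample, pass an open file handle (opened with `encoding="utf-8"`) to `read_conll`.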

Armenian language dataset from CC-100: Monolingual Datasets from Web Crawl Data

Armenian language dataset extracted from the CC-100 research dataset. Description from the website: "This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus..."

HTML
TXT