Dataset - Data Catalog Armenia

Eastern Armenian National Corpus Subcorpus

Subcorpus of EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography).
- HTML
Eastern Armenian National Corpus Electronic Library

The corpus contains 4,547,379 words from 104 books by 12 authors.
- HTML
ARPA Armenian Paraphrase Corpus

Training and test datasets for sentential paraphrase detection, as well as BERT-based models, for the Armenian language.
- HTML
- CSV
Armenian summary dataset

armsummary dataset from Hugging Face
- HTML
- CSV
pioNER - named entity annotated datasets

The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license.
- HTML
- conll03
Armenian language dataset from CC-100: Monolingual Datasets from Web Crawl Data

Armenian language dataset extracted from the CC-100 research dataset. Description from the website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...
- HTML
- TXT

You can also access this registry using the API (see API Docs).
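As a rough illustration of programmatic access, the sketch below assumes the registry exposes a CKAN-style `package_search` action that returns JSON with `result.count` and `result.results`. That is an assumption, not something this page confirms, and `BASE_URL` is a placeholder, so check the API Docs for the real endpoint. The helper names `build_search_url`, `summarize`, and `search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder API root (assumption): replace with the value from the API Docs.
BASE_URL = "https://example.org/api/3/action"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    return f"{base_url}/package_search?{urlencode({'q': query, 'rows': rows})}"


def summarize(response):
    """Return (count, [(title, [formats])]) from a CKAN-style search response."""
    result = response["result"]
    datasets = [
        (pkg.get("title", ""),
         [res.get("format", "") for res in pkg.get("resources", [])])
        for pkg in result["results"]
    ]
    return result["count"], datasets


def search(query, rows=10, base_url=BASE_URL):
    """Fetch and summarize matching datasets; needs network access."""
    with urlopen(build_search_url(query, rows, base_url)) as resp:
        return summarize(json.load(resp))
```

If the assumption holds, `search("armenian")` would return the number of matching datasets together with each title and its resource formats (for example HTML, CSV, or TXT), mirroring the listing on this page.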

6 datasets found