Armenian Language - Groups - Data Catalog Armenia

Armenian handwritten text recognition

Handwritten texts in Armenian with labeled words and a trained neural network differentiating word boundaries.

yolo

Sentiment and Emotion Armenian Lexicons

Data for Armenian BERT models. All datasets and code can be accessed on the ArmenianNLP GitHub page.

TXT
CSV

Eastern Armenian National Corpus Subcorpus

Subcorpus of EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography).

HTML

Armenia-related Books in the HathiTrust Digital Library

Catalogue of all texts available in the libraries cooperating with HathiTrust. The search keywords are 'Armenia' and 'Armenian'; the database contains links to each book page and...

CSV

Index of Digitized Armenian Manuscripts

The Index of Armenian Manuscripts lists Armenian manuscripts digitized and available in full access in digital libraries. It compiles the main metadata available in the catalogs...

CSV

All Unicode Armenian Fonts

Free and non-commercial Armenian fonts

TTF

Armenian legislation database from ARLIS

Armenian legislation database extracted from the ARLIS website (arlis.am) with all metadata and texts of Armenian laws and other legal documents. The dataset is relatively big,...

JSONL
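
Since the description calls this dataset relatively big, reading it one line at a time avoids loading the whole file into memory. The following is a minimal sketch in Python, assuming the usual JSONL convention of one JSON object per line; the field names in the sample (`title`, `text`) are hypothetical and not taken from the actual ARLIS export.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


# Hypothetical sample data; the real field names may differ.
sample = io.StringIO(
    '{"title": "Law A", "text": "first document"}\n'
    '\n'
    '{"title": "Law B", "text": "second document"}\n'
)
records = list(iter_jsonl(sample))
```

For the real file, the same function could be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded JSONL file.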

Eastern Armenian National Corpus Electronic Library

The corpus contains 4,547,379 words from 104 books by 12 authors.

HTML

National Library of Armenia repository REST API

REST API of the DSpace installation that hosts the National Library of Armenia repository.

REST API

ARPA Armenian Paraphrase Corpus

Training and test datasets for sentential paraphrase detection, as well as BERT-based models for the Armenian language.

HTML
CSV

Armenian summary dataset

armsummary dataset from Hugging Face

HTML
CSV

pioNER - named entity annotated datasets

The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license.

HTML
conll03
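
The conll03 tag suggests the CoNLL-2003 layout: one token per line, whitespace-separated columns with the entity tag last, and a blank line between sentences. The following is a minimal parsing sketch in Python under that assumption; the exact column layout of the pioNER files is not stated here, and the sample tokens and tags are made up for illustration.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def parse_conll(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        # A blank line or a -DOCSTART- marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the entity tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample, not taken from the pioNER corpus.
sample = "Երևան B-LOC\nէ O\n\nԱրամ B-PER\n"
parsed = parse_conll(sample)
```

Taking the first and last columns keeps the sketch usable whether or not the files carry intermediate columns such as part-of-speech tags.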

Armenian Wikipedia (hywiki) XML dumps

Dumps of the Armenian Wikipedia provided by the Wikimedia Foundation. Available as gzipped XML files.

XML

Armenian language dataset from CC-100: Monolingual Datasets from Web Crawl Data

Armenian language dataset extracted from the CC-100 research dataset. Description from the website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...

HTML
TXT

14 datasets found