-
Armenian handwritten text recognition
Handwritten texts in Armenian with labeled words and a trained neural network differentiating word boundaries. -
Sentiment and Emotion Armenian Lexicons
Data for Armenian BERT models. All datasets and code can be accessed on the ArmenianNLP GitHub page -
Eastern Armenian National Corpus Subcorpus
Subcorpus of EANC with various filters (authors, titles, genres, prose/poetry, original/translated, classical/new orthography) -
Armenia-related Books in the HathiTrust Digital Library
Catalogue of all texts available in the libraries cooperating with HathiTrust. The search keywords are 'Armenia' and 'Armenian'; the database contains links to each book page and... -
Index of Digitized Armenian Manuscripts
The Index of Armenian Manuscripts lists Armenian manuscripts that have been digitized and are fully accessible in digital libraries. It compiles the main metadata available in the catalogs... -
All Unicode Armenian Fonts
Free and non-commercial Armenian fonts -
Armenian legislation database from ARLIS
Armenian legislation database extracted from the ARLIS website (arlis.am) with all metadata and texts of Armenian laws and other legal documents. The dataset is relatively big,... -
Eastern Armenian National Corpus Electronic Library
The corpus contains 4,547,379 words from 104 books by 12 authors -
National Library of Armenia repository REST API
REST API of the DSpace installation of the National Library of Armenia repository. -
ARPA Armenian Paraphrase Corpus
Training and test datasets for sentential paraphrase detection, as well as BERT-based models, for the Armenian language. -
Armenian summary dataset
armsummary dataset from Hugging Face -
pioNER - named entity annotated datasets
The pioNER corpus provides gold-standard and automatically generated named-entity datasets for the Armenian language. Published under the Apache 2.0 license -
Armenian Wikipedia (hywiki) XML dumps
Dumps of the Armenian Wikipedia provided by the Wikimedia Foundation. Available as gzipped XML files -
Armenian language dataset from CC-100, monolingual Datasets from Web Crawl Data
Armenian language dataset extracted from the CC-100 research dataset. Description from the website: This corpus is an attempt to recreate the dataset used for training XLM-R. This corpus...
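
For the Armenian Wikipedia (hywiki) XML dumps listed above, a minimal sketch of how a gzipped dump might be read without loading it fully into memory is given here. It assumes the standard MediaWiki export layout, in which each article is a `page` element containing a `title` and a `text` element. The function names `local_name` and `iter_pages` and the file name in the usage note are hypothetical, chosen only for illustration; nothing here is taken from the dataset's own tooling.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path):
    """Yield (title, text) pairs from a gzipped MediaWiki XML export.

    Assumption: articles are stored as <page> elements that contain
    <title> and <text> descendants, as in the MediaWiki export format.
    """
    with gzip.open(path, "rb") as handle:
        # iterparse streams the file, so the whole dump is never held in memory
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = None, ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # release the finished page to keep memory bounded
```

A call such as `for title, text in iter_pages("hywiki-dump.xml.gz"): ...` (hypothetical file name) would then walk the articles one at a time. If a given dump file turns out to be bzip2-compressed instead, `bz2.open` from the standard library can be substituted for `gzip.open`.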