Datasets
Mada-French Parallel Corpus 1.0
License: NOODL-1.0
Locale: mxu
Task: TTS
Format: TSV
Size: 122.37 KB
Javanese TTS of Banyumasan Dialect
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: WEBM, TSV
Size: 559.08 MB
Finnish Public Domain 20th Century Literature Text Corpus
License: CC0-1.0
Locale: fi, sv
Task: NLP
Format: TXT
Size: 205.76 MB
Thorsten-Voice-44kHz-Full
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, PARQUET
Size: 7.99 GB
Thorsten-Voice Dataset 2023.09 Hessisch
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 255.96 MB
Thorsten-Voice Dataset 2022.10
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 1.30 GB
Thorsten-Voice Dataset 2021.06 Emotional
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 380.80 MB
Daily Expressions in Highland Puebla Nahuatl
License: CC-BY-SA-4.0
Locale: azz
Task: NLP
Format: TSV
Size: 22.00 KB
Cuentos en Mam leídos en voz alta
License: CC-BY-SA-4.0
Locale: mam
Task: ASR
Format: MP3, TSV
Size: 110.28 MB
Cuentos en Kʼicheʼ leídos en voz alta
License: CC-BY-SA-4.0
Locale: quc
Task: ASR
Format: MP3, TSV
Size: 152.62 MB
CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes
License: CC-BY-NC-SA-4.0
Locale: cy
Task: NLP
Format: TXT, TSV
Size: 147.89 MB
Finance Sentences - North American Spanish
License: CC0-1.0
Locale: es-US
Task: NLP
Format: TSV, JSON
Size: 18.35 MB
Thorsten-Voice Dataset 2021.02
License: CC0-1.0
Locale: de-DE
Task: TTS
Format: WAV, CSV
Size: 2.55 GB
Persian VOA Corpus 2003-2008
License: Unlicense
Locale: fa
Task: NLP
Format: TXT
Size: 17.16 MB
Lingala-TTS-Dataset
License: NOODL-1.0
Locale: lin
Task: TTS
Format: WAV, TSV
Size: 962.04 MB
Polish Public Domain 20th Century Literature Text Corpus
License: CC0-1.0
Locale: pl
Task: NLP
Format: TXT
Size: 10.86 MB
Dolgan Folklore Text Corpus
License: CC0-1.0
Locale: dlg
Task: NLP
Format: TXT
Size: 57.16 KB
GeoLogicQA: An LLM Benchmark for Logical Reasoning in Georgian
License: CC-BY-NC-SA-4.0
Locale: ka
Task: LLM
Format: JSON
Size: 15.14 KB
Bojonegoro Javanese TTS
License: CC-BY-SA-4.0
Locale: jav
Task: TTS
Format: TAR.GZ, WEBM
Size: 469.50 MB
ATLAS Cross-Lingual Transfer Matrix
License: Apache-2.0
Locale: en-US
Task: NLP
Format: CSV
Size: 2.36 KB
Zacatlán Tepetzintla Nahuatl ASR Dataset
License: CC-BY-ND-4.0
Locale: nhi
Task: ASR
Format: FLAC, TSV
Size: 789.98 MB
Kyrgyz Folklore Text Corpus
License: CC0-1.0
Locale: ky
Task: NLP
Format: TXT
Size: 1.28 MB
Finweb-Edu-Chinese-v2.2
License: Apache-2.0
Locale: zh
Task: LLM
Format: PARQUET
Size: 624.68 MB
Manggarai Language for NLP
License: CC-BY-NC-SA-4.0
Locale: mqy
Task: TTS
Format: WEBM, TSV
Size: 287.61 MB