Datasets
Common Voice Spontaneous Speech 3.0 - Croatian
License: CC0-1.0
Locale: hr
Task: ASR
Format: MP3
Size: 285.11 KB
Common Voice Spontaneous Speech 3.0 - Danish
License: CC0-1.0
Locale: da
Task: ASR
Format: MP3
Size: 61.80 KB
Common Voice Spontaneous Speech 3.0 - Ruuli
License: CC0-1.0
Locale: ruc
Task: ASR
Format: MP3
Size: 365.95 MB
Common Voice Spontaneous Speech 3.0 - Irish
License: CC0-1.0
Locale: ga-IE
Task: ASR
Format: MP3
Size: 3.14 MB
Istorima
License: CC BY-NC-ND 4.0
Locale: gr-GR
Task: NLP
Format: PARQUET
Size: 416.02 MB
UP - DSP - Philippine Languages Database (UP-DSP-PLD)
License: CC-BY-NC-4.0
Locale: phi
Task: ASR
Format: WAV, LOG
Size: 45.63 GB
Urdu Multi-Speaker TTS Dataset
License: CC-BY-NC-4.0
Locale: urd
Task: TTS
Format: WEBM, TSV
Size: 514.54 MB
BECO Brahui Literature Corpus
License: CC-BY-NC-SA-4.0
Locale: brh
Task: NLP
Format: TXT
Size: 1.19 MB
Malayalam Time-Aligned Speech Corpus
License: CC-BY-NC-4.0
Locale: mal
Task: ASR
Format: WAV, SRT
Size: 1.50 GB
ddd-kenya-somali-68hrs-asr-part3
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 1.33 GB
ddd-kenya-somali-68hrs-asr-part2
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 8.07 GB
ddd-kenya-somali-68hrs-asr-part1
License: CC-BY-4.0
Locale: som
Task: ASR
Format: WAV, TSV
Size: 7.68 GB
TODa: Tamazight Open Dataset
License: CC-BY-4.0
Locale: zgh
Task: NLP
Format: CSV
Size: 3.27 MB
TTS Balinese Language
License: CC-BY-SA-4.0
Locale: ban
Task: TTS
Format: WEBM, TSV
Size: 301.05 MB
Kokoro Speech Dataset
License: LibriVox
Locale: ja
Task: TTS
Format: FLAC
Size: 3.98 GB
Sundanese TTS
License: CC-BY-SA-4.0
Locale: sun
Task: TTS
Format: WEBM, TSV
Size: 298.10 MB
Bangor Miami Spanish-English Corpus
License: GPL-3.0
Locale: es-US, en-US
Task: ASR
Format: MP3, CHA, TSV
Size: 1.12 GB
Elkhani Hazargi Literature Corpus
License: CC-BY-NC-4.0
Locale: haz
Task: NLP
Format: TXT
Size: 2.46 MB
Dari Literature Corpus by Anjuman e Adabi Nayestan
License: CC-BY-NC-4.0
Locale: prs
Task: NLP
Format: TXT
Size: 12.67 MB
IBT Torwali Wordlist
License: CC-BY-SA-4.0
Locale: trw
Task: NLP
Format: CSV
Size: 312.87 KB
Bangor Siarad Welsh-English Corpus
License: GPL-3.0
Locale: cym
Task: ASR
Format: MP3, CHA, TSV
Size: 2.13 GB
Bangor Patagonia Welsh-Spanish Corpus
License: GPL-3.0
Locale: cym, spa
Task: ASR
Format: MP3, CHA, TSV
Size: 988.02 MB
Saraiki-English Parallel Corpus
License: CC-BY-NC-4.0
Locale: mul
Task: MT
Format: CSV
Size: 1.92 MB
Jhoke Publisher Multan’s Saraiki Newspaper Corpus
License: CC-BY-NC-4.0
Locale: skr
Task: NLP
Format: TXT
Size: 2.30 MB