Mada Narratives
License: NOODL-1.0
Steward: Institute of African Digital Humanities
Task: NLP
Release Date: 1/5/2026
Format: TXT
Size: 65.04 KB
Description
This dataset contains 17 transcribed oral narratives in Mada (mxu), a Chadic language of the Afro-Asiatic family spoken in Cameroon. The texts, derived from audio recordings of oral literature, reflect natural spoken discourse. The dataset can be used for language modelling, text analysis and other natural language processing (NLP) tasks.
Specifics
Licensing
Nwulite Obodo Open Data Licence 1.0 (NOODL-1.0)
https://licensingafricandatasets.com/nwulite-obodo-license
Considerations
Restrictions/Special Constraints
- For research and scientific use only.
- You agree that you will not re-host or re-share this dataset.
Forbidden Usage
You agree not to use the data for any of the following without the explicit permission of the legal owner of the dataset: generative AI; reproduction; duplication; modification; augmentation; copying; distribution; transmission; display; sale; transfer; publication; or the creation of derivative works.
Processes
Intended Use
This dataset is intended for NLP tasks such as language modelling, text analysis, part-of-speech tagging, sentiment analysis and related tasks.
Metadata
Language
Maɗa (mxu) should not be confused with Mada (mda). The former is an Afro-Asiatic language spoken in Cameroon, while the latter is an Atlantic-Congo language spoken in Nigeria (see Ethnologue and Glottolog online). This dataset focuses on Mada (mxu), a Chadic language belonging to the Afro-Asiatic family, which is spoken in Cameroon's Far North Region, specifically in the Mayo-Sava Division and Tokombere Subdivision. It is believed that the Mada-speaking group formerly belonged to the Wandala (or Mandara) kingdom alongside a number of other groups, including the Wuzlam, Mayan, Melokwo, Zelgwa-Gemzek, Zulgo-Gemzek and Gudawa.
Variants
We were unable to find specific information on the sociolinguistic and dialectal situation of Maɗa while preparing this dataset for publication. According to Glottolog Online, Maɗa belongs to the Madam group, which also includes the Muyang and Wuzlam languages.
Alphabet
1. Vowels
a, e, i, o, u (occasionally long vowels through reduplication in discourse, not orthographically marked)
2. Consonants
b, c, d, f, g, h, j, k, l, m, n, p, r, s, t, v, w, y, z
3. Digraphs and Consonant Clusters
ch, ck, dz, gb, gw, kp, kw, mb, nd, ng, nj, nz, tl, vr
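For character-level NLP work, the inventory above can be turned into a simple grapheme segmenter. The sketch below is an assumption, not the stewards' tooling: it treats every listed digraph or cluster as one unit and uses greedy longest-match, and the names `segment_word` and `classify` are hypothetical. Letters that appear in the sample but are not in the inventory (for example é, ñ or ɗ) are passed through and labelled "other".

```python
# Inventories copied from the Alphabet section of this card.
VOWELS = set("aeiou")
CONSONANTS = set("bcdfghjklmnprstvwyz")
DIGRAPHS = {"ch", "ck", "dz", "gb", "gw", "kp", "kw",
            "mb", "nd", "ng", "nj", "nz", "tl", "vr"}


def segment_word(word: str) -> list[str]:
    """Split a word into units, preferring a listed two-letter digraph/cluster
    over two single letters (greedy longest match; an assumption)."""
    lowered = word.lower()
    units = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(lowered[i])
            i += 1
    return units


def classify(unit: str) -> str:
    """Label a unit using the card's inventory."""
    if unit in DIGRAPHS:
        return "digraph/cluster"
    if unit in VOWELS:
        return "vowel"
    if unit in CONSONANTS:
        return "consonant"
    # Accented vowels, ñ, ɗ, punctuation: not listed in the inventory.
    return "other"
```

For example, `segment_word("mbeez")` returns `["mb", "e", "e", "z"]`. Greedy matching is a design choice the card does not settle: `segment_word("ngwa")` yields `["ng", "w", "a"]`, whereas an n + gw analysis would need language-specific rules.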
Source
The texts in this dataset were created around the 1960s and 1970s. They are transcriptions of oral literary performances elicited by missionaries. It is unclear whether the performances were recorded on tape or transcribed on site by the collectors. The texts were later edited and revised by Hubert Nkoumou in the 2010s, while he was working as a consultant at the local Mada language academy.
Domain
The texts are narratives that deal with a variety of topics, such as procreation, marriage, household life and social life, as well as the supernatural.
Size
Total size is 65.04 KB.
Structure
This dataset comprises 18 texts, each of which tells a distinct story. The texts vary in length and total 23,260 tokens.
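The card does not say how the token total was computed. As an illustration only, the sketch below assumes whitespace tokenization with punctuation trimmed from token edges, so hyphenated forms such as "ala-va" stay whole; the function name `tokenize` is hypothetical, and its count is not guaranteed to reproduce the published figure.

```python
import string

# Assumption: characters trimmed from token edges. "…" and typographic quotes
# occur in the sample, so they are added to ASCII punctuation. Hyphens inside a
# token are kept because strip() only removes leading/trailing characters.
EDGE_PUNCT = string.punctuation + "…“”«»"


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization with edge punctuation removed."""
    tokens = []
    for chunk in text.split():
        token = chunk.strip(EDGE_PUNCT)
        if token:  # skip chunks that were punctuation only, e.g. ":" or "?"
            tokens.append(token)
    return tokens
```

Applied to the full text (read with `encoding="utf-8"` to preserve characters such as ɗ and é), `len(tokenize(text))` gives one possible token count; a different convention, such as splitting on internal commas in "ahpa,ahpa", would change the total.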
Sample
ohnzonzyof aka ala-va kal. aha la "kam-kam mohnzonzyof, aha wa awala henné té aha la : mbrena gokwa ane yo ?
A, nok kahpada fana jan gokwa, gata kuné wen gwal dya dwal oplo fegar nuke-da, ta hawayo ? "aza ahapa ahpa, ahpa,ahpa, ahpa…
Vad ma aha la… kwojana man embéde-va man meckwer, ana uro agaba a wala va. Afalaña wellé akehyé mahnzow
Aza-kabara gogom arav gana, efé erba nuke da, otad-otad, ahala-ba ahal eré va.
Tondodo fo (toholo fo) ergwat gwala ta hawey ?
Atala ata man turo. Afalaña toduro ma, tégyé kal aka sada ana tamcala-ra
Adara eké eré, ana dazan ambafana tam, uné-beré dazana ; awa tedex esetalana atal yana tura-va elgwa ten-ten cece
Tedde-léberé-ava ata wal gana mbeez teké musana a mévédenga
mbacaka noana néhé. Tenan ma, nawa gam ara sek gwala vada.
Vad ma wal néhé ocoluro zégla, okwar néhé ambada-ra, ana abaz mbeez nukeda ; mbeez unen-va ftek ellek ana ekéré jgwa.
