Tamasheq-French Parallel Corpus
Tamasheq audio from radio broadcasts in Niger, aligned to French text translations
About
This annotated audio collection comprises 19 hours of radio broadcasts in Tamasheq. The data were collected by Avignon University in the context of the SELMA EU project and the ANR project ON-TRAC, and are shared under the CC BY-NC-ND 3.0 license. The corpus was part of the low-resource speech translation track at IWSLT 2022 and 2023.
Downloading the data
Citing us
When using our dataset, please cite the following paper:
@inproceedings{zanon-boito-etal-2022-speech,
title = "Speech Resources in the {T}amasheq Language",
author = {Boito, Marcely Zanon and
Bougares, Fethi and
Barbier, Florentin and
Gahbiche, Souhir and
Barrault, Lo{\"i}c and
Rouvier, Mickael and
Est{\`e}ve, Yannick},
editor = "Calzolari, Nicoletta and
B{\'e}chet, Fr{\'e}d{\'e}ric and
Blache, Philippe and
Choukri, Khalid and
Cieri, Christopher and
Declerck, Thierry and
Goggi, Sara and
Isahara, Hitoshi and
Maegaard, Bente and
Mariani, Joseph and
Mazo, H{\'e}l{\`e}ne and
Odijk, Jan and
Piperidis, Stelios",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lrec-1.222/",
pages = "2066--2071",
}