Tamasheq-French Parallel Corpus

Tamasheq audio from radio broadcasts in Niger, aligned to French text translations

About

This annotated audio collection comprises 19 hours of radio broadcasts in Tamasheq. The data was collected by Avignon University in the context of the SELMA EU project and the ANR project ON-TRAC, and is shared under the CC BY-NC-ND 3.0 license. It was part of the IWSLT 2022 and 2023 low-resource speech translation tracks.

Downloading the data

  • Dataset: GitHub

Citing us

When using our dataset, please cite the following paper:

@inproceedings{zanon-boito-etal-2022-speech,
    title = "Speech Resources in the {T}amasheq Language",
    author = {Boito, Marcely Zanon  and
      Bougares, Fethi  and
      Barbier, Florentin  and
      Gahbiche, Souhir  and
      Barrault, Lo{\"i}c  and
      Rouvier, Mickael  and
      Est{\`e}ve, Yannick},
    editor = "Calzolari, Nicoletta  and
      B{\'e}chet, Fr{\'e}d{\'e}ric  and
      Blache, Philippe  and
      Choukri, Khalid  and
      Cieri, Christopher  and
      Declerck, Thierry  and
      Goggi, Sara  and
      Isahara, Hitoshi  and
      Maegaard, Bente  and
      Mariani, Joseph  and
      Mazo, H{\'e}l{\`e}ne  and
      Odijk, Jan  and
      Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.222/",
    pages = "2066--2071",
}