Griko-Italian Parallel Speech Corpus

330 utterances from a real language documentation setting

About

This very small parallel speech corpus pairs speech in Griko, an endangered language, with translations into Italian. It consists of 330 sentences and provides the following levels of information: speech, machine-extracted pseudo-phones, transcriptions, translations, and sentence alignment.

Downloading the data

  • Dataset: GitHub

Citing us

When using our dataset, please cite the following paper:

  @inproceedings{zanonboito18_sltu,
    title = {A Small Griko-Italian Speech Translation Corpus},
    author = {Marcely Zanon Boito and Antonios Anastasopoulos and Aline Villavicencio and Laurent Besacier and Marika Lekakou},
    year = {2018},
    booktitle = {6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
    pages = {36--41},
    doi = {10.21437/SLTU.2018-8},
  }