Publications

Publications in reverse chronological order, including international conferences, international workshops, journals, and local conferences and workshops.

2025

  1. EMNLP Findings preprint
    From Tower to Spire: Adding the speech modality to a translation specialist LLM
    Ambilduke, Kshitij, Peters, Ben, Sannigrahi, Sonal, Keshwani, Anil, Lam, Tsz Kin, Martins, Bruno, Martins, André FT, and Boito, Marcely Zanon
    arXiv preprint arXiv:2503.10620 2025
  2. IWSLT Track Winner
    NAVER LABS Europe Submission to the Instruction-following Track
    Lee, Beomseok, Boito, Marcely Zanon, Besacier, Laurent, and Calapodescu, Ioan
    In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT) 2025

2024

  1. mHuBERT-147: A Compact Multilingual HuBERT Model
    Boito, Marcely Zanon, Iyer, Vivek, Lagos, Nikolaos, Besacier, Laurent, and Calapodescu, Ioan
    In Interspeech 2024
  2. Multilingual Distilwhisper: Efficient Distillation of Multi-Task Speech Models Via Language-Specific Experts
    Ferraz, Thomas Palmeira, Boito, Marcely Zanon, Brun, Caroline, and Nikoulina, Vassilina
    In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
  3. LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech
    Parcollet, Titouan, Nguyen, Ha, Evain, Solène, Boito, Marcely Zanon, Pupier, Adrien, Mdhaffar, Salima, Le, Hang, Alisamir, Sina, Tomashenko, Natalia, Dinarelli, Marco, Zhang, Shucong, Allauzen, Alexandre, Coavoux, Maximin, Estève, Yannick, Rouvier, Mickael, Goulian, Jérôme, Lecouteux, Benjamin, Portet, François, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, and Besacier, Laurent
    Computer Speech & Language 2024

2023

  1. IWSLT Track Winner
    NAVER LABS Europe’s Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track
    Gow-Smith, Edward, Berard, Alexandre, Boito, Marcely Zanon, and Calapodescu, Ioan
    In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT) 2023

2022

  1. A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems
    Boito, Marcely Zanon, Besacier, Laurent, Tomashenko, Natalia, and Estève, Yannick
    In Interspeech 2022
  2. Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings
    Boito, Marcely Zanon, Yusuf, Bolaji, Ondel, Lucas, Villavicencio, Aline, and Besacier, Laurent
    In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages 2022
  3. Speech Resources in the Tamasheq Language
    Boito, Marcely Zanon, Bougares, Fethi, Barbier, Florentin, Gahbiche, Souhir, Barrault, Loïc, Rouvier, Mickael, and Estève, Yannick
    In Proceedings of the Thirteenth Language Resources and Evaluation Conference 2022
  4. IWSLT Track Winner
    ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks
    Boito, Marcely Zanon, Ortega, John, Riguidel, Hugo, Laurent, Antoine, Barrault, Loïc, Bougares, Fethi, Chaabani, Firas, Nguyen, Ha, Barbier, Florentin, Gahbiche, Souhir, and Estève, Yannick
    In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT) 2022
  5. Findings of the IWSLT 2022 Evaluation Campaign
    Anastasopoulos, Antonios, Barrault, Loïc, Bentivogli, Luisa, Boito, Marcely Zanon, Bojar, Ondřej, Cattoni, Roldano, Currey, Anna, Dinu, Georgiana, Duh, Kevin, Elbayad, Maha, Emmanuel, Clara, Estève, Yannick, Federico, Marcello, Federmann, Christian, Gahbiche, Souhir, Gong, Hongyu, Grundkiewicz, Roman, Haddow, Barry, Hsu, Benjamin, Javorský, Dávid, Kloudová, Věra, Lakew, Surafel, Ma, Xutai, Mathur, Prashant, McNamee, Paul, Murray, Kenton, Nădejde, Maria, Nakamura, Satoshi, Negri, Matteo, Niehues, Jan, Niu, Xing, Ortega, John, Pino, Juan, Salesky, Elizabeth, Shi, Jiatong, Sperber, Matthias, Stüker, Sebastian, Sudoh, Katsuhito, Turchi, Marco, Virkar, Yogesh, Waibel, Alexander, Wang, Changhan, and Watanabe, Shinji
    In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT) 2022
  6. Promises and Limitations of Self-supervised Learning for Automatic Speech Processing
    Maison, Lucas, Boito, Marcely Zanon, and Estève, Yannick
    In Actes de la 4ème Conference on Artificial Intelligence for Defense (CAID) 2022
  7. JEP French
    LeBenchmark, un référentiel d’évaluation pour le français oral
    Le, Hang, Alisamir, Sina, Dinarelli, Marco, Ringeval, Fabien, Evain, Solène, Nguyen, Ha, Boito, Marcely Zanon, Mdhaffar, Salima, Tong, Ziyi, Tomashenko, Natalia, and others
    In 34e Journées d’étude sur la parole 2022
  8. JEP French
    Modèles neuronaux pré-appris par auto-supervision sur des enregistrements de parole en français
    Evain, Solène, Nguyen, Ha, Le, Hang, Boito, Marcely Zanon, Mdhaffar, Salima, Alisamir, Sina, Tong, Ziyi, Tomashenko, Natalia, Dinarelli, Marco, Parcollet, Titouan, and others
    In 34e Journées d’étude sur la parole 2022

2021

  1. NeurIPS Benchmarks Track
    Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark
    Evain, Solène, Nguyen, Ha, Le, Hang, Boito, Marcely Zanon, Mdhaffar, Salima, Alisamir, Sina, Tong, Ziyi, Tomashenko, Natalia, Dinarelli, Marco, Parcollet, Titouan, Allauzen, Alexandre, Estève, Yannick, Lecouteux, Benjamin, Portet, François, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, and Besacier, Laurent
    In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021
  2. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
    Evain, Solène, Nguyen, Ha, Le, Hang, Boito, Marcely Zanon, Mdhaffar, Salima, Alisamir, Sina, Tong, Ziyi, Tomashenko, Natalia, Dinarelli, Marco, Parcollet, Titouan, Allauzen, Alexandre, Estève, Yannick, Lecouteux, Benjamin, Portet, François, Rossato, Solange, Ringeval, Fabien, Schwab, Didier, and Besacier, Laurent
    In Interspeech 2021

2020

  1. Investigating alignment interpretability for low-resource NMT
    Boito, Marcely Zanon, Villavicencio, Aline, and Besacier, Laurent
    Machine Translation 2020
  2. MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
    Boito, Marcely Zanon, Havard, William, Garnerin, Mahault, Le Ferrand, Éric, and Besacier, Laurent
    In Proceedings of the Twelfth Language Resources and Evaluation Conference 2020
  3. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation
    Boito, Marcely Zanon, Villavicencio, Aline, and Besacier, Laurent
    In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) 2020

2019

  1. Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings
    Boito, Marcely Zanon, Villavicencio, Aline, and Besacier, Laurent
    In Interspeech 2019
  2. How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages
    Boito, Marcely Zanon, Villavicencio, Aline, and Besacier, Laurent
    In Journées Scientifiques du Groupement de Recherche: Linguistique Informatique, Formelle et de Terrain (LIFT) 2019
  3. ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task
    Nguyen, Ha, Tomashenko, Natalia, Boito, Marcely Zanon, Caubrière, Antoine, Bougares, Fethi, Rouvier, Mickael, Besacier, Laurent, and Estève, Yannick
    In Proceedings of the 16th International Conference on Spoken Language Translation 2019

2018

  1. A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
    Godard, Pierre, Adda, Gilles, Adda-Decker, Martine, Benjumea, Juan, Besacier, Laurent, Cooper-Leavitt, Jamison, Kouarata, Guy-Noel, Lamel, Lori, Maynard, Hélène, Mueller, Markus, Rialland, Annie, Stueker, Sebastian, Yvon, François, and Boito, Marcely Zanon
    In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) 2018
  2. A Small Griko-Italian Speech Translation Corpus
    Boito, Marcely Zanon, Anastasopoulos, Antonios, Villavicencio, Aline, Besacier, Laurent, and Lekakou, Marika
    In 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018) 2018
  3. Unsupervised Word Segmentation from Speech with Attention
    Godard, Pierre, Boito, Marcely Zanon, Ondel, Lucas, Berard, Alexandre, Yvon, François, Villavicencio, Aline, and Besacier, Laurent
    In Interspeech 2018

2017

  1. Unwritten languages demand attention too! Word discovery with encoder-decoder models
    Boito, Marcely Zanon, Bérard, Alexandre, Villavicencio, Aline, and Besacier, Laurent
    In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2017

2014

  1. Size does not matter. Frequency does. A study of features for measuring lexical complexity
    Wilkens, Rodrigo, Vecchia, Alessandro Dalla, Boito, Marcely Zanon, Padró, Muntsa, and Villavicencio, Aline
    In Ibero-American Conference on Artificial Intelligence 2014
  2. ToRPorEsp Portuguese
    Uma análise do perfil de entropia das estruturas sintáticas do português
    Boito, Marcely Zanon, Hagemann, Luiza, Wilkens, Rodrigo, and Villavicencio, Aline
    In ToRPorEsp workshop, BDBComp (Biblioteca Digital Brasileira de Computação) 2014