mHuBERT-147

The mHuBERT-147 SSL model family

About

The mHuBERT-147 models are compact, efficient, multilingual self-supervised speech representation models. They were trained and evaluated at NAVER LABS Europe with funding from the UTTER EU project. They are shared under the CC-BY-NC-SA-4.0 license. For more information, check our publication here.
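
As an illustration, below is a minimal sketch of extracting frame-level speech representations with the Hugging Face transformers library. It assumes the checkpoints are available in transformers format under a repo id such as utter-project/mHuBERT-147; that id is an assumption here, so check the collection linked below for the exact checkpoint names.

    import numpy as np
    import torch
    from transformers import AutoFeatureExtractor, HubertModel

    # Hypothetical repo id; see the Hugging Face collection linked below.
    MODEL_ID = "utter-project/mHuBERT-147"

    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
    model = HubertModel.from_pretrained(MODEL_ID)
    model.eval()

    # HuBERT-style models expect 16 kHz mono audio; one second of
    # silence is used here as a stand-in for a real recording.
    waveform = np.zeros(16000, dtype=np.float32)
    inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # Frame-level representations: (batch, frames, hidden_size)
    print(outputs.last_hidden_state.shape)

For downstream probing, intermediate layer outputs can also be requested by passing output_hidden_states=True to the forward call.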

Link to resources

  • The mHuBERT-147 pre-trained models collection: Hugging Face
  • Intermediate checkpoints from the 3rd iteration: LINK (username: user / password: the license name mentioned above)
  • The training code: GitHub
  • Pre-processing and clustering scripts: GitHub
  • HUTTER, an mHuBERT-147 CommonVoice prototype: Hugging Face

Citing us

To cite us, please use the BibTeX entry below:
    @inproceedings{zanonboito24_interspeech,
      title     = {m{H}u{BERT}-147: A Compact Multilingual {H}u{BERT} Model},
      author    = {Marcely Zanon Boito and Vivek Iyer and Nikolaos Lagos and Laurent Besacier and Ioan Calapodescu},
      year      = {2024},
      booktitle = {Interspeech 2024},
      pages     = {3939--3943},
      doi       = {10.21437/Interspeech.2024-938},
      issn      = {2958-1796},
    }
    
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.