mHuBERT-147
The mHuBERT-147 SSL model family
About
The mHuBERT-147 models are compact, efficient, multilingual self-supervised speech representation models. They were trained and evaluated at NAVER LABS Europe with funding from the UTTER EU project, and they are shared under the CC-BY-NC-SA-4.0 license. For more information, check our publication here.
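As a quick illustration of what these representation models produce, the sketch below extracts frame-level features from raw audio with the Hugging Face `transformers` HuBERT classes. The model identifier `utter-project/mHuBERT-147` is an assumption here; check the pre-trained models collection below for the exact checkpoint name.

```python
# Hedged sketch: extracting mHuBERT-147 speech representations with
# Hugging Face transformers. The Hub identifier below is assumed;
# verify it against the pre-trained models collection.
import numpy as np
import torch
from transformers import AutoFeatureExtractor, HubertModel

MODEL_ID = "utter-project/mHuBERT-147"  # assumed Hub identifier

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID)
model = HubertModel.from_pretrained(MODEL_ID)
model.eval()

# One second of dummy 16 kHz audio; replace with real speech samples.
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# (batch, frames, hidden_size) frame-level representations.
print(outputs.last_hidden_state.shape)
```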
Links to resources
- The mHuBERT-147 pre-trained models collection:
  - Intermediate checkpoints from the 3rd iteration: LINK (user: user / password: copy the license mentioned above)
- The training code:
- Pre-processing and clustering scripts: (a sketch of this clustering step follows the list)
- HUTTER, an mHuBERT-147 CommonVoice Prototype:
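HuBERT-style pre-training derives its discrete targets by clustering frame-level speech features, and the mHuBERT-147 pipeline relies on faiss for this step. The sketch below shows the general shape of such a k-means labeling pass; the feature dimension, cluster count, and data are placeholders, and the actual pre-processing and clustering scripts linked above may differ.

```python
# Hedged sketch of HuBERT-style discrete label preparation: k-means over
# frame-level speech features using faiss. All sizes and data below are
# placeholders, not the values used for mHuBERT-147.
import faiss
import numpy as np

d = 768   # feature dimension (e.g., one Transformer layer's output)
k = 1000  # codebook size (number of k-means clusters)

# Placeholder features: in practice these come from MFCCs or a previous
# iteration's hidden states, stacked over many utterances.
features = np.random.rand(100_000, d).astype("float32")

# Train k-means with faiss.
kmeans = faiss.Kmeans(d, k, niter=20, seed=0, verbose=False)
kmeans.train(features)

# Assign each frame to its nearest centroid; the indices serve as the
# discrete pseudo-labels for masked-prediction training.
_, labels = kmeans.index.search(features, 1)
print(labels[:10].ravel())
```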
Citing us
To cite us, please use the BibTeX entry below:

@inproceedings{zanonboito24_interspeech,
title = {m{H}u{BERT}-147: A Compact Multilingual {H}u{BERT} Model},
author = {Marcely Zanon Boito and Vivek Iyer and Nikolaos Lagos and Laurent Besacier and Ioan Calapodescu},
year = {2024},
booktitle = {Interspeech 2024},
pages = {3939--3943},
doi = {10.21437/Interspeech.2024-938},
issn = {2958-1796},
}