Tokotron: Tokenized TTS (lite version - minimal dependencies) #2849

flexthink wants to merge 11 commits into speechbrain:develop from
Conversation
**pplantinga** left a comment
Thanks for your work minimizing the dependencies. This is still quite large, however, and will take some time to review; I have only finished one pass and didn't have time to look at everything. I will also have to go and try running the LibriTTS and LJSpeech recipes.
Overall, the code quality looks quite good, if a little verbose for my taste -- e.g. I'm not sure whether the Additive Embedding, Null Embedding, and Embedding Injection are really needed, or whether something simpler could be done. Some of the docstrings also have extra spaces that don't quite match the overall SpeechBrain docstring style.
Anything you can do to simplify, keeping only the parts that are really necessary, will be a huge help for my review, as well as for future users!
```diff
  The subfolder "fastspeech2" contains the recipes for training the non-autoregressive transformer based TTS model [FastSpeech2](https://arxiv.org/abs/2006.04558).

  # Tokotron
- The subfolder "tokotron" contains the recipes for training the transformer-based that uses discrete audio representations.
+ The subfolder "tokotron" contains the recipes for training a transformer-based model that uses discrete audio representations.
```
```
compatibility
g2p_src : str
    The source (HuggingFace hub or path) of the G2P model to
    be used
```

Used for what? Under what circumstances?
```diff
  if model_name in ["Tacotron2", "FastSpeech2WithAlignment"]:
      if extract_phonemes:
          logger.info(
-             "Computing phonemes for LJSpeech labels using SpeechBrain G2P. This may take a while."
+             f"Using G2P {g2p_src} to convert LJSpeech labels to phonemes. This may take a while."
```
```python
if model_name is not None and "FastSpeech2" in model_name:
    if extract_phonemes:
        logger.info(
            "Computing pitch as required for FastSpeech2. This may take a while."
```

Is the pitch required for Tokotron as well? At the least, this message should be updated.
```yaml
hubert: chaanks/hifigan-hubert-l1-3-7-12-18-23-LibriTTS
wav2vec: chaanks/hifigan-hubert-l1-3-7-12-18-23-LibriTTS
```

Are these supposed to be the same?
```python
    ["out", "gate_out", "dec_self_attn", "dec_attn", "alignments", "context"],
)

TokotronDecoderInfernceOutput = namedtuple(
```

Inference, not Infernce.
```python
    ],
)

TokotronInfernceOutput = namedtuple(
```

Inference, not Infernce.
```python
)
nn.init.xavier_normal_(self.in_proj.w.weight)

"""A simple embedding mechanism that adds the embedding to the inputs before the layer"""
```
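For reference, an "additive" embedding of this kind can be sketched in a few lines of plain PyTorch. This is a hypothetical illustration -- class and argument names are invented and do not match the PR's actual code:

```python
import torch
from torch import nn


class AdditiveEmbedding(nn.Module):
    """Wraps a layer and adds a learned embedding to its inputs before calling it.

    Hypothetical sketch, not the PR's implementation.
    """

    def __init__(self, layer, num_embeddings, embedding_dim):
        super().__init__()
        self.layer = layer
        self.emb = nn.Embedding(num_embeddings, embedding_dim)

    def forward(self, x, emb_idx):
        # (batch, dim) embedding is broadcast over the time axis of (batch, time, dim)
        return self.layer(x + self.emb(emb_idx).unsqueeze(1))


wrapped = AdditiveEmbedding(nn.Linear(16, 16), num_embeddings=4, embedding_dim=16)
out = wrapped(torch.randn(2, 10, 16), torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2, 10, 16])
```

If this is all the mechanism does, a plain `x + emb` at the call site may be simpler than three dedicated wrapper classes.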
```python
loss = super().fit_batch(batch)
if self.hparams.lr_annealing_mode == "step":
    self.hparams.lr_annealing(self.optimizer)
return loss
```

```python
"""Iterate epochs and datasets to improve objective.
```

Maybe instead of just copying the brain docstring, this should state the changes that required overriding the default one.
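For context, the "step" annealing mode above amounts to advancing the learning-rate schedule once per batch rather than once per epoch. A minimal plain-PyTorch sketch of the same idea, using torch's built-in `StepLR` as a stand-in for SpeechBrain's annealing classes:

```python
import torch
from torch import nn

# Stand-in model/optimizer; StepLR halves the LR every 2 scheduler steps.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for step in range(4):  # stand-in for batches within an epoch
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()  # "step" mode: anneal after every batch, not every epoch
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)  # [0.1, 0.05, 0.05, 0.025]
```

This is why the override is needed: the base `fit_batch` has no hook for per-batch annealing, so documenting that difference in the docstring would help readers.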
Perhaps one thing we could do here is move the core changes to another PR: i.e. the four core (non-lobes) files in
What does this PR do?
Introduces a simple TTS architecture based on discrete speech representations from self-supervised models
Related to #2696
This version omits
Before submitting
PR review
Reviewer checklist