How to use Fhrozen/voc_fastdiff_multilingual with ESPnet:
No usage snippet is available: the model type is unknown (it must be text-to-speech or automatic-speech-recognition).
No support given.
num_iters_per_epoch: 250
max_epoch: 1000
batch_size: 64
vocoder_conf:
    audio_channels: 1
    inner_channels: 32
    cond_channels: 80
    upsample_ratios:
    - 5
    - 5
    - 4
    - 3
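As a rough illustration of how the `upsample_ratios` in the config above relate to each other, the sketch below multiplies them to get the overall upsampling factor. It assumes, as is common for neural vocoders but not stated in this model card, that each conditioning frame (here with `cond_channels: 80` features, presumably mel bins) is expanded by the product of the ratios into audio samples. The dictionary and the helper name `samples_per_frame` are hypothetical and only mirror the values listed above; they are not part of the ESPnet API.

```python
import math

# Values copied from the vocoder_conf section of the config above.
vocoder_conf = {
    "audio_channels": 1,
    "inner_channels": 32,
    "cond_channels": 80,
    "upsample_ratios": [5, 5, 4, 3],
}


def samples_per_frame(upsample_ratios):
    """Return the product of the per-stage upsampling ratios.

    Assumption: the stages are applied in sequence, so one conditioning
    frame yields prod(upsample_ratios) output samples.
    """
    return math.prod(upsample_ratios)


total = samples_per_frame(vocoder_conf["upsample_ratios"])
print(f"Overall upsampling factor: {total}")
```

Under that assumption, the feature extraction used to produce the conditioning input would need a matching hop size for the frame and sample counts to line up.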