Tacotron 2 #11
The paper (https://arxiv.org/abs/1712.05884) seems to have been submitted to ICASSP 2018. I read it today. It is very nice! I plan to implement a WaveNet vocoder when I finish the multi-speaker work (#10). DeepVoice3 and Tacotron 2 both use a WaveNet vocoder. |
Great! I wish you luck with the multi-speaker and vocoder work! :D |
I started to implement a WaveNet vocoder. It's still quite WIP, but I think I have implemented all the basic features. If you are interested, check out https://github.com/r9y9/wavenet_vocoder. Audio samples from a model trained on CMU Arctic (16kHz, ~1200 utterances) can be found at r9y9/wavenet_vocoder#1 (comment). |
It turned out to be easy to implement WaveNet vocoder. I think my implementation is already feature complete. The problem is that I don't have 32 GPUs :( |
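As an aside, the core of such a vocoder is WaveNet's gated, dilated causal convolution conditioned on acoustic features. Here is a minimal PyTorch sketch of one residual block with local conditioning; the class and parameter names are illustrative and not taken from r9y9/wavenet_vocoder:
```python
# Minimal sketch of one WaveNet residual block with local conditioning
# on acoustic features (e.g. mel spectrograms). Names are illustrative,
# not taken from r9y9/wavenet_vocoder.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels, cond_channels, kernel_size, dilation):
        super().__init__()
        # Dilated convolution, made causal by trimming the right side
        # of the padded output in forward().
        self.causal_trim = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=self.causal_trim, dilation=dilation)
        # 1x1 projection of the conditioning features onto the
        # filter and gate halves.
        self.cond = nn.Conv1d(cond_channels, 2 * channels, 1)
        self.res = nn.Conv1d(channels, channels, 1)
        self.skip = nn.Conv1d(channels, channels, 1)

    def forward(self, x, c):
        # x: (B, channels, T) audio activations
        # c: (B, cond_channels, T) conditioning, upsampled to length T
        h = self.conv(x)[:, :, :x.size(-1)] + self.cond(c)
        filt, gate = h.chunk(2, dim=1)
        z = torch.tanh(filt) * torch.sigmoid(gate)  # gated activation
        return x + self.res(z), self.skip(z)
```
A full vocoder stacks many such blocks with increasing dilations and sums the skip outputs before the final output layers. |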
Yeah, it is really a bummer that the vocoder requires that much compute power to train in a reasonable amount of time. :C Perhaps you could try the WORLD vocoder method they used here? http://www.dtic.upf.edu/~mblaauw/NPSS/ |
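For anyone curious, WORLD is a classical (non-neural) vocoder, so it is cheap to run. A minimal analysis/resynthesis round trip with the pyworld bindings looks roughly like this; the file names are placeholders, a mono input file is assumed, and WORLD expects float64 audio:
```python
# Minimal WORLD analysis/resynthesis round trip using the pyworld
# bindings (pip install pyworld), as a cheap alternative to a neural
# vocoder. File names are placeholders; WORLD expects float64 audio.
import numpy as np
import pyworld
import soundfile as sf

x, fs = sf.read("sample.wav")           # hypothetical mono input file
x = x.astype(np.float64)

f0, t = pyworld.dio(x, fs)              # raw F0 estimation
f0 = pyworld.stonemask(x, f0, t, fs)    # F0 refinement
sp = pyworld.cheaptrick(x, f0, t, fs)   # spectral envelope
ap = pyworld.d4c(x, f0, t, fs)          # aperiodicity

y = pyworld.synthesize(f0, sp, ap, fs)  # resynthesized waveform
sf.write("resynth.wav", y, fs)
```
In a TTS setting the acoustic model predicts f0, sp, and ap, and synthesize() turns them back into a waveform. |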
@r9y9 I had a listen to https://r9y9.github.io/wavenet_vocoder/ and I think they sound really quite good! The samples are much better (to me at least) than the Tacotron samples, as they do not have the same harsh "sound compression artifact" noise. They instead sound like they were recorded with lower-quality microphones or on lower-quality analog tape. (I guess most of it has to do with the 16kHz sampling frequency.) So anyway, what changed? Did you buy 32 GPUs, or did I misunderstand and it is not the WaveNet vocoder itself that needs that much compute power? (i.e. it is Tacotron + WaveNet vocoder together that requires that much) |
@DarkDefender Nothing changed :) I just trained WaveNets with my single GPU (GTX 1080 Ti). As I noted on the demo page, it took 22 hours to train the single-speaker version and 44 hours for the multi-speaker version. I used 1~7 hours of audio sampled at 16kHz. For larger datasets sampled at higher rates, it will take more time to train. |
@r9y9 Wow! The output audio in the sample files is very impressive. If you don't mind, I'd like to ask a couple of questions. Recently I was browsing some repos that do style transfer with DeepVoice; in particular this one does a nice job. Have you tried that kind of thing? Also, do you know of an already trained network that I can run locally, or online in a Jupyter notebook, to generate speech from text? Keep up the good work! |
We're very close to issuing a pull request with an implementation of Tacotron 2 that is compatible with @r9y9's repo. |
@r9y9 thanks for your quick reply. |
@r9y9 Probably at the beginning of next week we'll issue a PR with Taco 2. Here's the attention and predicted mel after 7k iterations. |
@rafaelvalle Great! I can't wait for next week :) |
Hi @rafaelvalle, I'm working on Taco 2 too. Can you explain how to reproduce your result, which looks like it's working? |
@neverjoe Hold on tight, we're very close to a release of Tacotron 2 with FP16 and distributed training. |
Great job! |
@rafaelvalle What does the timeline look like? Any samples you can share? |
We will probably release it on Monday. I'll post a short sample here today! |
Great! |
Mel-spectrogram and alignment during FP16 and DistributedDataParallel training. Short demo sample (not in the training set) generated with the Griffin-Lim algorithm, not the WaveNet decoder. |
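For readers unfamiliar with it, Griffin-Lim inverts a magnitude spectrogram by alternately estimating phase and re-projecting through the STFT. A minimal sketch with librosa; the STFT parameters and iteration count here are illustrative:
```python
# Minimal Griffin-Lim loop with librosa, of the kind used to invert a
# magnitude spectrogram when no neural vocoder is available.
# STFT parameters and iteration count are illustrative.
import numpy as np
import librosa

def griffin_lim(magnitude, n_fft=1024, hop_length=256, n_iter=60):
    # Start from a random phase and iteratively enforce consistency
    # between the fixed magnitude and the phase of the reconstruction.
    angles = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    for _ in range(n_iter):
        y = librosa.istft(magnitude * angles, hop_length=hop_length)
        stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
        angles = np.exp(1j * np.angle(stft))
    return librosa.istft(magnitude * angles, hop_length=hop_length)
```
In a Tacotron-style pipeline the predicted mel spectrogram would first be mapped back to a linear-frequency magnitude spectrogram (or the model predicts one directly) before this loop is applied. |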
@rafaelvalle What is required to get samples that sound similar to Google's Tacotron 2? Do you think you're getting close? |
One can get to Google's quality by using the WaveNet decoder with at least a 22kHz sampling rate instead of Griffin-Lim. Ryuichi happens to have a repo with the WaveNet decoder: https://github.com/r9y9/wavenet_vocoder/
|
@rafaelvalle Great job! How does your work improve on https://github.com/Rayhane-mamah/Tacotron-2? Would you explain? Thank you |
We'll release the code soon and everything will become evident. I'm sorry this is taking some time, but we're going through many layers of bureaucracy. |
@rafaelvalle Thank you for your contributions! I'm looking forward to seeing the performance of Tacotron 2 |
@rafaelvalle Any more updates on your Tacotron 2?! |
Yeah, we decided to release Tacotron and WaveNet with real-time inference.
Still going through bureaucratic layers
|
Sounds great @rafaelvalle!!! Open source? For real-time inference, are you using something like the RNN-based 'Efficient Neural Audio Synthesis' by Kalchbrenner et al.? |
@rafaelvalle Did you implement Parallel WaveNet to get to real time? |
We use the first WaveNet (not Parallel WaveNet) for real-time inference. Here's NVIDIA's "CUDA alien code" that makes WaveNet run faster than real time: https://github.com/NVIDIA/nv-wavenet/ |
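For context on why a custom kernel helps: naive autoregressive sampling runs one full forward pass per output sample, and each sample depends on the previous one, so the loop cannot be parallelized over time. A sketch of that naive loop, where `model` is a hypothetical stand-in for any mu-law WaveNet:
```python
# Sketch of naive autoregressive WaveNet sampling. Each of the
# n_samples outputs costs a full forward pass, and the data dependency
# between steps prevents parallelizing over time; `model` is a
# hypothetical stand-in, not an API from nv-wavenet.
import torch

@torch.no_grad()
def naive_generate(model, cond, n_samples, n_classes=256):
    # cond: (1, cond_channels, n_samples) conditioning features
    samples = torch.zeros(1, 1, 1)  # seed with silence
    for t in range(n_samples):
        logits = model(samples, cond[:, :, : t + 1])  # full forward pass
        probs = torch.softmax(logits[:, :, -1], dim=1)
        nxt = torch.multinomial(probs.squeeze(0), 1).float()
        nxt = nxt / (n_classes - 1) * 2 - 1  # mu-law class -> [-1, 1]
        samples = torch.cat([samples, nxt.view(1, 1, 1)], dim=-1)
    return samples.squeeze()
```
Fast implementations avoid recomputing the whole network each step by caching the dilated-convolution activations; nv-wavenet additionally keeps the entire network resident on the GPU in a single persistent kernel. |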
Can you share your real-time WaveNet? 64 residual channels, 256 skip channels, 256 audio channels?
|
Keep an eye on it. |
Here's the link to the PyTorch implementation of Tacotron 2. |
@rafaelvalle Nice! |
Not yet but soon. We're currently focusing on another release. |
@rafaelvalle How much can we expect from NVIDIA in keeping up these repositories? The comments and tests are a bit lacking! There don't seem to have been further updates to: https://github.com/NVIDIA/nv-wavenet |
Hey @PetrochukM, please post any requests, issues, or suggestions on the specific repos, and the team responsible for each will address them. |
@rafaelvalle Okay! Curious, how large of an effort is this? I'm assuming it's not a PyTorch-scale effort like Facebook's, or a TensorFlow-scale one. |
You can probably find that information on the GitHub repos as well... |
r9y9/wavenet_vocoder#30 (comment) A Tacotron 2 + WaveNet online TTS demo is coming soon. |
Finally...! ref: #30 ref: r9y9/deepvoice3_pytorch#11 ref: Rayhane-mamah/Tacotron-2#30 (comment)
I think I can finally close the issue. |
Sorry if this is off-topic (DeepVoice vs Tacotron), but it seems the Tacotron 2 paper is now released.
The speech samples sound better than ever (I think):
https://google.github.io/tacotron/publications/tacotron2/index.html
I must admit I'm not too well versed in how much this differs from the original Tacotron, but perhaps the changes could also be used in your projects?