Synthesizing waveform sequence-to-sequence to augment training data for sequence-to-sequence speech recognition
-
- Ueno Sei
- Graduate School of Informatics, Kyoto University
-
- Mimura Masato
- Graduate School of Informatics, Kyoto University
-
- Sakai Shinsuke
- Graduate School of Informatics, Kyoto University
-
- Kawahara Tatsuya
- Graduate School of Informatics, Kyoto University
Abstract
<p>Sequence-to-sequence (seq2seq) automatic speech recognition (ASR) has recently achieved state-of-the-art performance with fast decoding and a simple architecture. On the other hand, it requires a large amount of training data and cannot use text-only data for training. In our previous work, we proposed a method for applying text data to seq2seq ASR training by leveraging text-to-speech (TTS). However, we observe that the log Mel-scale filterbank (lmfb) features produced by a Tacotron 2-based model are blurry, particularly along the time dimension. We mitigate this problem by introducing a WaveNet vocoder, which generates speech of better quality or spectrograms of better time resolution. This makes it possible to train a waveform-input end-to-end ASR model, for which we use CNN filters and apply a masking method similar to SpecAugment. We compare the waveform-input model with two kinds of lmfb-input models: (1) one whose lmfb features are generated directly by TTS, and (2) one whose lmfb features are converted from the waveform generated by TTS. Experimental evaluations show that the combination of waveform-output TTS and the waveform-input end-to-end ASR model outperforms the lmfb-input models in two domain adaptation settings.</p>
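The abstract mentions a masking method similar to SpecAugment but gives no details. As a rough illustration only, the sketch below shows generic time masking on a sequence of feature frames. It is not the authors' implementation: the function name `time_mask`, its parameters, and the choice to mask whole frames with a constant are all assumptions made for this example.

```python
import random


def time_mask(frames, max_width, num_masks=1, mask_value=0.0, rng=None):
    """Return a copy of `frames` with random spans of frames overwritten.

    frames:    list of per-frame feature vectors (list of lists of floats).
    max_width: largest number of consecutive frames one mask may cover.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    if rng is None:
        rng = random.Random()
    masked = [list(f) for f in frames]  # copy so the input is left untouched
    n = len(masked)
    for _ in range(num_masks):
        # Mask width is drawn uniformly from 0..max_width (capped at n).
        width = rng.randint(0, min(max_width, n))
        if width == 0:
            continue
        # Start position is chosen so the mask fits inside the sequence.
        start = rng.randint(0, n - width)
        for t in range(start, start + width):
            masked[t] = [mask_value] * len(masked[t])
    return masked


if __name__ == "__main__":
    features = [[1.0, 2.0] for _ in range(10)]
    augmented = time_mask(features, max_width=3, num_masks=2,
                          rng=random.Random(0))
    print(sum(1 for f in augmented if f == [0.0, 0.0]), "frames masked")
```

Passing a seeded `random.Random` makes the augmentation reproducible; in training one would normally draw a fresh mask for every utterance.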
Journal
-
- Acoustical Science and Technology
-
Acoustical Science and Technology 42 (6), 333-343, 2021-11-01
ACOUSTICAL SOCIETY OF JAPAN
Details
-
- CRID
- 1390852870562570112
-
- NII Article ID
- 130008110355
-
- NII Book ID
- AA11501808
-
- ISSN
- 13475177
- 03694232
- 13463969
-
- NDL BIB ID
- 031887296
-
- Text Lang
- en
-
- Data Source
-
- JaLC
- NDL
- Crossref
- CiNii Articles
- KAKEN
-
- Abstract License Flag
- Disallowed