Open Domain Continuous Filipino Speech Recognition: Challenges and Baseline Experiments
-
- ANG Federico
- DSP Laboratory, University of the Philippines
-
- GUEVARA Rowena Cristina
- DSP Laboratory, University of the Philippines
-
- MIYANAGA Yoshikazu
- ICN Laboratory, Hokkaido University
-
- CAJOTE Rhandley
- DSP Laboratory, University of the Philippines
-
- ILAO Joel
- DSP Laboratory, University of the Philippines
-
- BAYONA Michael Gringo Angelo
- DSP Laboratory, University of the Philippines
-
- LAGUNA Ann Franchesca
- DSP Laboratory, University of the Philippines
Abstract
This paper describes a new database suitable for HMM-based automatic Filipino speech recognition, built for training a domain-independent, large-vocabulary continuous speech recognition system. High-performance speech recognition systems are known to depend on a superior speech database at the training stage, but because no such database was available, previous reports on Filipino speech recognition had to contend with serious data sparsity issues. We alleviate this sparsity through appropriate data analysis that makes the evaluation results more reliable. The best system is identified by its low word-error rate on a cross-validation set containing almost three hours of unknown speech data. Language-dependent problems are discussed, and their impact on accuracy is analyzed. The approach is currently data-driven; however, it serves as a competent baseline model for future developments.
Journal
-
- IEICE Transactions on Information and Systems
-
IEICE Transactions on Information and Systems E97.D (9), 2443-2452, 2014
The Institute of Electronics, Information and Communication Engineers
Details
-
- CRID
- 1390282679356377984
-
- NII Article ID
- 130004685452
-
- ISSN
- 1745-1361
- 0916-8532
-
- Text Lang
- en
-
- Data Source
-
- JaLC
- Crossref
- CiNii Articles
- KAKEN
-
- Abstract License Flag
- Disallowed