Acoustic Scene Classification Based on Spatial Feature Extraction Using Convolutional Neural Networks


Authors

    • Takahashi Gen
    • Graduate School of Systems and Information Engineering, University of Tsukuba
    • Yamada Takeshi
    • Graduate School of Systems and Information Engineering, University of Tsukuba
    • Makino Shoji
    • Graduate School of Systems and Information Engineering, University of Tsukuba

Abstract

Acoustic scene classification (ASC) identifies the place or situation in which a sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge included an ASC task, for which several methods using convolutional neural networks (CNNs) were proposed. The best-performing method applied convolution operations independently to the left, right, mid (sum of the left and right channels), and side (difference of the left and right channels) input channels to capture spatial features. In this paper, we propose a new method of spatial feature extraction using CNNs. In the proposed method, convolutions are performed in the time-space (channel) and frequency-space domains, in addition to the time-frequency domain, to capture spatial features. We evaluate the effectiveness of the proposed method on the DCASE 2017 Challenge task. The experimental results confirmed that convolution operations in the frequency-space domain are effective for capturing spatial features. Furthermore, combining the three domains improved the classification accuracy by 2.19% compared with using the time-frequency domain alone.
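The three convolution domains described above can be illustrated with a minimal NumPy sketch. This is only an illustration of the general idea, not the authors' network: the kernel sizes, input dimensions, and the `conv2d_valid` helper are all assumptions.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 'valid'-mode 2-D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical stereo spectrogram: (channels, frequency bins, time frames).
rng = np.random.default_rng(0)
spec = rng.standard_normal((2, 40, 100))

# Time-frequency domain: slide a (freq, time) kernel over each channel's plane.
k_tf = rng.standard_normal((3, 3))
tf = np.stack([conv2d_valid(spec[c], k_tf) for c in range(spec.shape[0])])

# Frequency-space domain: for each time frame, convolve the (channel, freq)
# plane with a kernel spanning both channels (the "space" axis).
k_fs = rng.standard_normal((2, 3))
fs = np.stack([conv2d_valid(spec[:, :, t], k_fs) for t in range(spec.shape[2])])

# Time-space domain: for each frequency bin, convolve the (channel, time) plane.
k_ts = rng.standard_normal((2, 3))
ts = np.stack([conv2d_valid(spec[:, f, :], k_ts) for f in range(spec.shape[1])])

print(tf.shape, fs.shape, ts.shape)  # (2, 38, 98) (100, 1, 38) (40, 1, 98)
```

Because the frequency-space and time-space kernels span both input channels, their outputs depend on inter-channel level differences, which is the kind of spatial cue the abstract says these domains capture.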

Published in

  • Journal of Signal Processing (信号処理)

    Journal of Signal Processing 22(4), 199-202, 2018

    Research Institute of Signal Processing, Japan

Identifiers

  • NII Article ID (NAID)
    130007418577
  • Language Code
    ENG
  • ISSN
    1342-6230
  • Data Source
    J-STAGE