Acoustic Scene Classification Based on Spatial Feature Extraction Using Convolutional Neural Networks
Acoustic scene classification (ASC) identifies the place or situation in which a sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge included an ASC task, for which several methods based on convolutional neural networks (CNNs) were proposed. The best-performing method applied convolutions independently to the left, right, mid (sum of the left and right channels), and side (difference between the left and right channels) input channels to capture spatial features. In contrast, we propose a new CNN-based method of spatial feature extraction in which convolutions are performed over the time-space (channel) and frequency-space domains in addition to the time-frequency domain. We evaluate the proposed method on the ASC task of the DCASE 2017 Challenge. The experimental results confirm that convolutions over the frequency-space domain are effective for capturing spatial features. Furthermore, combining the three domains improved the classification accuracy by 2.19% compared with using the time-frequency domain alone.
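As a rough illustration of the three convolution domains named in the abstract, the following standard-library Python sketch derives mid and side channels from a stereo pair and slides one small kernel over the time-frequency, time-space, and frequency-space planes of a tensor indexed as `x[channel][time][freq]`. The function names (`mid_side`, `conv2d_valid`, `planes`, `feature_maps`), the toy spectrogram values, the fixed 2x2 kernel, and the choice to stack left, right, mid, and side as the channel axis are all assumptions made for illustration; the abstract does not specify the network architecture, kernel sizes, or input layout.

```python
def mid_side(left, right):
    """Return (mid, side) spectrograms: element-wise sum and difference."""
    mid = [[l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]
    side = [[l - r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]
    return mid, side


def conv2d_valid(plane, kernel):
    """'Valid' 2D cross-correlation of a list-of-lists plane with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(plane) - kh + 1
    out_w = len(plane[0]) - kw + 1
    return [
        [
            sum(plane[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def planes(x, domain):
    """Slice x[channel][time][freq] into the 2D planes of one domain."""
    n_ch, n_t, n_f = len(x), len(x[0]), len(x[0][0])
    if domain == "time-frequency":   # one (time, freq) plane per channel
        return [x[c] for c in range(n_ch)]
    if domain == "time-space":       # one (time, channel) plane per frequency bin
        return [[[x[c][t][f] for c in range(n_ch)] for t in range(n_t)]
                for f in range(n_f)]
    if domain == "frequency-space":  # one (freq, channel) plane per time frame
        return [[[x[c][t][f] for c in range(n_ch)] for f in range(n_f)]
                for t in range(n_t)]
    raise ValueError(f"unknown domain: {domain}")


def feature_maps(x, domain, kernel):
    """Apply the same kernel to every plane of the chosen domain."""
    return [conv2d_valid(p, kernel) for p in planes(x, domain)]


# Toy stereo "spectrograms" with 3 time frames and 5 frequency bins (made-up values).
left = [[float(t + f) for f in range(5)] for t in range(3)]
right = [[float(t * f) for f in range(5)] for t in range(3)]
mid, side = mid_side(left, right)
x = [left, right, mid, side]      # assumed channel order: L, R, mid, side

kernel = [[1.0, 1.0], [1.0, 1.0]]  # a fixed kernel stands in for learned weights
maps = {d: feature_maps(x, d, kernel)
        for d in ("time-frequency", "time-space", "frequency-space")}
shapes = {d: (len(m), len(m[0]), len(m[0][0])) for d, m in maps.items()}
print(shapes)
```

In a trained CNN the kernels would be learned and the three sets of feature maps would be combined by later layers; the sketch only shows which pair of axes each domain convolves over.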
Journal of Signal Processing (信号処理) 22(4), 199-202, 2018