Audio and Visual Information Integration for Speaker's Localization in Automatic Shooting of Lecture

Bibliographic Information

Other Title
  • 講義自動撮影における話者位置推定のための視聴覚情報の統合

Abstract

Estimating the location of a speaker is useful for automatic video shooting in a lecture room; the captured videos are used for distance learning and lecture archiving systems. To estimate a speaker's location in a wide lecture room, multiple cameras and multiple microphones are used. However, it is difficult to estimate the precise location of a speaker with visual or acoustic sensors alone because of calibration problems, noise, and other interference. We therefore propose a method that integrates audio and visual information about a speaker in the lecture room. Lecturer cells and student cells are introduced as the unit for estimating the speaker's location. We defined 120 cells in a real lecture room and applied our multi-modal method to them. The estimation accuracy achieved by our integration method is sufficient for automatic video shooting of a speaker in a lecture room.
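The abstract does not give the integration rule itself, so the following is only a minimal sketch of one plausible cell-based audio-visual fusion scheme: each cell receives a non-negative score from the acoustic sensors and another from the cameras, the two score vectors are normalized, and a weighted product picks the most likely cell. All function names, the weighting scheme, and the example values are illustrative assumptions, not the paper's actual algorithm.

```python
def normalize(scores):
    """Scale non-negative per-cell scores so they sum to 1."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)  # no evidence: uniform
    return [s / total for s in scores]

def fuse(audio_scores, visual_scores, audio_weight=0.5):
    """Combine per-cell audio and visual scores (weighted geometric mean).

    audio_weight is a hypothetical tuning parameter in [0, 1];
    0.5 treats both modalities equally.
    """
    a = normalize(audio_scores)
    v = normalize(visual_scores)
    fused = [(ai ** audio_weight) * (vi ** (1.0 - audio_weight))
             for ai, vi in zip(a, v)]
    return normalize(fused)

def locate_speaker(audio_scores, visual_scores):
    """Return the index of the most likely cell after fusion."""
    fused = fuse(audio_scores, visual_scores)
    return max(range(len(fused)), key=fused.__getitem__)

# Toy example with 6 cells: both modalities favor cell 2,
# but with different noise patterns.
audio = [0.1, 0.2, 0.9, 0.1, 0.0, 0.1]
visual = [0.0, 0.1, 0.8, 0.3, 0.1, 0.0]
print(locate_speaker(audio, visual))  # → 2
```

A multiplicative rule like this suppresses cells that either modality rules out, which matches the motivation in the abstract that neither sensor type alone is reliable enough; an additive (weighted-sum) rule would be the natural alternative when one modality may fail completely.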

Journal

Citations (13)

References (23)

Related Projects
