Quantifying Discussion by Using a Camera as a Smart Sensor

The emergence of various types of commercial cameras (compact, high-resolution, wide-angle, high-speed, high-dynamic-range, etc.) has contributed significantly to the understanding of human activities. Taking advantage of a wide angle of view, we demonstrate a system that recognizes micro-behaviors in a discussion with a single 360-degree camera, towards quantified meeting analysis.

In our first paper, we propose a method that recognizes speaking and nodding, two micro-behaviors often overlooked in existing research, from a video stream of face images using a Random Forest classifier. We evaluated the proposed approach on three datasets of our own. For the first and second datasets, participants met physically: 16 five-minute sessions with 21 unique participants and seven 10-minute meeting sessions with 12 unique participants. The experimental results show that our approach detects speaking and nodding with a macro-averaged F1-score of 67.9% under 10-fold random-split cross-validation and 62.5% under leave-one-participant-out cross-validation. Considering the increased demand for online meetings due to the COVID-19 pandemic, we also recorded faces shown on a screen, as captured by web cameras, to form the third dataset, and discussed the potential and challenges of applying our idea to virtual video conferences.
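To illustrate the two evaluation protocols, the Python sketch below trains a Random Forest on per-sample face features and scores it with both a 10-fold random split and a leave-one-participant-out split. This is a minimal sketch using scikit-learn, not the authors' released code; the feature layout, the label encoding, and the placeholder data are assumptions made for illustration.

    # Minimal sketch: Random Forest over face features with two
    # cross-validation schemes (random split vs. leave-one-participant-out).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import (StratifiedKFold, LeaveOneGroupOut,
                                         cross_val_score)

    # Hypothetical data: one row per video frame/window.
    # X      - face features (e.g., facial landmarks, head-pose angles)
    # y      - micro-behavior label: 0 = none, 1 = speaking, 2 = nodding
    # groups - participant ID per sample, for leave-one-participant-out
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))          # placeholder features
    y = rng.integers(0, 3, size=1000)        # placeholder labels
    groups = rng.integers(0, 21, size=1000)  # placeholder participant IDs

    clf = RandomForestClassifier(n_estimators=100, random_state=0)

    # 10-fold random-split CV, scored with macro-averaged F1
    random_split = cross_val_score(
        clf, X, y,
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
        scoring="f1_macro")
    print(f"10-fold macro F1: {random_split.mean():.3f}")

    # Leave-one-participant-out: each fold holds out one participant entirely
    lopo = cross_val_score(clf, X, y, groups=groups,
                           cv=LeaveOneGroupOut(), scoring="f1_macro")
    print(f"LOPO macro F1:    {lopo.mean():.3f}")

The leave-one-participant-out scheme is the stricter of the two, since the classifier never sees the held-out person during training; this is consistent with its F1-score (62.5%) being lower than the random-split figure (67.9%).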

Related Publication

  1. Ko Watanabe, Yusuke Soneda, Yuki Matsuda, Yugo Nakamura, Yutaka Arakawa, Andreas Dengel, and Shoya Ishimaru. “DisCaaS: Micro Behavior Analysis on Discussion by Camera as a Sensor”. Sensors, 21(17), 5719, 2021.