Tracking and Modeling Focus of Attention in Meetings

Rainer Stiefelhagen

PhD Thesis
July 5th, 2002
Fakultät für Informatik
Universität Karlsruhe (TH)



Access the searchable online version: [Online-Version]
Download the full thesis as a PDF document: [PDF (3.4 MB)]



Abstract

This thesis addresses the problem of tracking people's focus of attention. In particular, we develop a system to track the focus of attention of participants in meetings. Knowing a person's focus of attention is an important step towards better understanding what people do, how and with what or whom they interact, and what they refer to. In meetings, focus of attention can be used to disambiguate the addressees of speech acts, to analyze interaction, and to index meeting transcripts. Tracking a user's focus of attention can also greatly improve human-computer interfaces, since it makes it possible to build interfaces and environments that are aware of what the user is paying attention to and of what or whom he or she is interacting with.

The direction in which people look, i.e., their gaze, is closely related to their focus of attention. In this thesis, we estimate a subject's focus of attention based on his or her head orientation. Although the direction in which someone looks is determined by both head orientation and eye gaze, the relevant literature suggests that head orientation alone is a sufficient cue for detecting someone's direction of attention during social interaction. We present experimental results from a user study and from several recorded meetings that support this hypothesis.

We have developed a Bayesian approach to model at whom or what someone is looking, based on his or her head orientation. To estimate head orientations in meetings, the participants' faces are automatically tracked in the view of a panoramic camera, and neural networks estimate their head orientations from pre-processed images of their faces. Using this approach, the participants' focus of attention targets were correctly identified 73% of the time in a set of evaluation meetings with four participants.
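The Bayesian step can be illustrated with a minimal sketch. It assumes that, from one observer's seat, each other participant corresponds to a one-dimensional Gaussian distribution over horizontal head pan, and it picks the target with the highest posterior P(target | pan) ∝ p(pan | target) P(target). The seat angles, spreads and uniform priors below are hypothetical values chosen for illustration; the abstract does not specify how the thesis fits these densities.

```python
import math

# Hypothetical setup: from the observer's seat, each of the other three
# participants sits at a roughly fixed horizontal head-pan angle (degrees).
# These means, spreads and priors are illustrative assumptions only.
TARGETS = {
    "A": {"mean": -60.0, "std": 15.0, "prior": 1 / 3},
    "B": {"mean": 0.0, "std": 15.0, "prior": 1 / 3},
    "C": {"mean": 60.0, "std": 15.0, "prior": 1 / 3},
}


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def focus_posterior(head_pan: float) -> dict[str, float]:
    """Bayes' rule: P(target | pan) = p(pan | target) P(target) / p(pan)."""
    joint = {
        name: gaussian_pdf(head_pan, t["mean"], t["std"]) * t["prior"]
        for name, t in TARGETS.items()
    }
    evidence = sum(joint.values())  # p(pan), the normalizing constant
    return {name: value / evidence for name, value in joint.items()}


def focus_target(head_pan: float) -> str:
    """Return the maximum a posteriori focus target for a head-pan angle."""
    posterior = focus_posterior(head_pan)
    return max(posterior, key=posterior.get)


if __name__ == "__main__":
    for pan in (-55.0, 5.0, 40.0):
        print(pan, focus_target(pan), focus_posterior(pan))
```

With equal spreads and priors, the decision reduces to choosing the nearest seat angle; unequal priors (for example, a participant who is looked at more often) shift the decision boundaries toward the less likely targets.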

In addition, we have investigated whether a person's focus of attention can be predicted from other cues. Our results show that focus of attention is correlated with who is speaking in a meeting, and that a person's focus of attention can be predicted from who is speaking, or was speaking, before a given moment. We have trained neural networks to predict at whom a person is looking, based on this speaker information. Using only information about who was speaking, this approach predicted who was looking at whom with 63% accuracy on the evaluation meetings. We also show that estimating a person's focus from both head orientation and speaker information improves the accuracy of focus detection compared with using either modality alone.
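How the two cues might be combined can be sketched as follows. In this simplified example, a frequency table P(focus | current speaker), estimated from annotated frames, stands in for the neural networks described above, and it ignores the speaking history before the current moment. The two estimates are then merged with a weighted linear combination. Both the frequency table and the linear fusion rule with weight `alpha` are assumptions made for illustration, not necessarily what the thesis uses.

```python
from collections import Counter, defaultdict


def train_speaker_model(
    samples: list[tuple[str, str]],
) -> dict[str, dict[str, float]]:
    """Estimate P(focus | current speaker) by counting annotated frames.

    A frequency table stands in for the trained neural networks; each sample
    is a hypothetical (speaker, focus_target) pair for a single observer.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for speaker, focus in samples:
        counts[speaker][focus] += 1
    model = {}
    for speaker, c in counts.items():
        total = sum(c.values())
        model[speaker] = {focus: n / total for focus, n in c.items()}
    return model


def fuse(
    p_head: dict[str, float], p_speaker: dict[str, float], alpha: float = 0.5
) -> dict[str, float]:
    """Weighted linear combination of two posteriors over focus targets.

    alpha is a hypothetical weight for the speaker-based estimate; targets
    missing from one distribution are treated as having probability 0.
    """
    targets = set(p_head) | set(p_speaker)
    fused = {
        t: (1 - alpha) * p_head.get(t, 0.0) + alpha * p_speaker.get(t, 0.0)
        for t in targets
    }
    total = sum(fused.values())  # renormalize in case inputs were not
    return {t: v / total for t, v in fused.items()}


if __name__ == "__main__":
    # Observer D mostly looks at whoever is speaking (invented toy data).
    samples = [("A", "A")] * 8 + [("A", "B")] * 2 + [("C", "C")] * 9 + [("C", "A")]
    model = train_speaker_model(samples)
    p_head = {"A": 0.3, "B": 0.6, "C": 0.1}  # ambiguous head-based estimate
    fused = fuse(p_head, model["A"], alpha=0.5)
    print(max(fused, key=fused.get), fused)
```

In the toy run, the head-based estimate alone favors B, but because A is speaking and the observer usually looks at the speaker, the fused estimate favors A (0.55 versus 0.40). A linear combination is easy to tune on held-out data by varying `alpha`; a product of the two posteriors would be the alternative if the cues were treated as conditionally independent.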

To demonstrate the generality of our approach, we have built a prototype system for focus-aware interaction with a household robot and other smart appliances in a room, using the components developed for focus of attention tracking. In the demonstration environment, a subject could interact with a simulated household robot, a speech-enabled VCR, or other people in the room, and the recipient of the subject's speech was disambiguated based on the subject's direction of attention.


