Computer Vision for Human-Machine Interfaces

Language of instruction:

German

Description:

This lecture presents current research in image processing that deals with the visual perception of people for human-machine interaction. For each topic area, various methods and algorithms, their advantages and disadvantages, and the state of the art are discussed:
  • Face detection and recognition
  • Facial expression recognition
  • Head pose and gaze direction estimation
  • Person localization and tracking
  • Articulated body tracking and body modeling
  • Gesture recognition
  • Audio-visual speech recognition
  • Multi-camera environments
  • Tools and libraries

Literature:

Further reading

Scientific publications on these topics are provided on the lecture website.

Learning objectives:

  • Students gain an overview of computer vision topics relevant to human-machine interaction.
  • Students understand fundamental computer vision concepts in the context of human-machine interaction and learn to apply them.

Description

Current human-machine interfaces are still largely "blind" when it comes to perceiving their users. As a result, they can neither exploit natural human communication channels such as facial expressions, gaze direction, gestures, and body language for human-machine interaction, nor acquire sufficient knowledge about their users, their state, and their intentions. Current research aims to close this gap by developing human-machine interfaces that perceive their users and their actions and use the resulting context information to interact with them appropriately.

Materials

Supplementary literature:

  • Face Detection
    • Phung et al., Skin Segmentation Using Color Pixel Classification: Analysis and Comparison, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 1, January 2005.
    • Stan Birchfield, An Elliptical Head Tracker, 31st Asilomar Conference on Signals, Systems, and Computers, November 1997.
    • Stan Birchfield, Elliptical Head Tracking Using Intensity Gradients and Color Histograms, IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, California, June 1998.
    • H. A. Rowley, S. Baluja, and T. Kanade, Neural Network-Based Face Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, January 1998.
    • Paul Viola and Michael Jones, Rapid Object Detection Using a Boosted Cascade of Simple Features, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
    • Paul Viola and Michael Jones, Robust Real-Time Object Detection, Cambridge Research Laboratory, Technical Report CRL 2001/01, February 2001.
  • Face Recognition
    • M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, pp. 71-86, 1991.
    • M. Turk and A. Pentland, Face Recognition Using Eigenfaces, CVPR, 1991.
    • P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Trans. on PAMI, Vol. 19, No. 7, pp. 711-720, 1997.
    • W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld, Face Recognition: A Literature Survey, ACM Computing Surveys, Vol. 35, No. 4, pp. 399-458, 2003.
    • Volker Blanz, Thomas Vetter, Face Recognition Based on Fitting a 3D Morphable Model, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, September 2003.
    • Hazim Kemal Ekenel, Rainer Stiefelhagen, Local Appearance Based Face Recognition Using Discrete Cosine Transform, 13th European Signal Processing Conference (EUSIPCO), Antalya, Turkey, September 2005.
  • People Detection
    • Person Detection I
      • N. Dalal, B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005.
      • N. Dalal, B. Triggs, C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV 2006.
      • D. Gavrila, Multi-feature Hierarchical Template Matching Using Distance Transforms, ICPR 1998.
      • D. Gavrila, Real-Time Object Detection for Smart Vehicles, ICCV 1999.
      • D. Gavrila, Pedestrian Detection from a Moving Vehicle, ECCV 2000.
    • Person Detection II
      • A. Mohan, C. Papageorgiou, T. Poggio, Example-Based Object Detection in Images by Components, PAMI 2001.
      • K. Mikolajczyk, C. Schmid, A Performance Evaluation of Local Descriptors, PAMI 2005.
      • E. Seemann, B. Leibe, K. Mikolajczyk, B. Schiele, An Evaluation of Local Shape-Based Features for Pedestrian Detection, BMVC 2005.
      • B. Leibe, A. Leonardis, B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV 2004.
      • B. Leibe, A. Leonardis, B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, IJCV.
    • Person Detection III
      • B. Leibe, E. Seemann, B. Schiele, Pedestrian Detection in Crowded Scenes, CVPR 2005.
      • E. Seemann, B. Leibe, B. Schiele, Multi-Aspect Detection of Articulated Objects, CVPR 2006.
      • E. Seemann, B. Schiele, Cross-Articulation Learning for Robust Detection of Pedestrians, DAGM 2006.
      • L. Bourdev, J. Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, ICCV 2009.
      • L. Bourdev, S. Maji, T. Brox, J. Malik, Detecting People Using Mutually Consistent Poselet Activations, ECCV 2010.
      • P. Dollár, C. Wojek, B. Schiele, P. Perona, Pedestrian Detection: An Evaluation of the State of the Art, PAMI 2011.
    • Person Re-Identification
      • L. Bourdev, S. Maji, J. Malik, Describing People: A Poselet-Based Approach to Attribute Classification, ICCV 2011.
      • G. Doretto, T. Sebastian, P. Tu, J. Rittscher, Appearance-Based Person Reidentification in Camera Networks: Problem Overview and Current Approaches, AIHC 2011.
  • Head Pose and Focus of Attention
    • Rainer Stiefelhagen, Jie Yang, Alex Waibel, A Model-Based Gaze Tracking System, Proc. of IEEE International Joint Symposia on Intelligence and Systems, pp. 304-310, Rockville, Maryland, November 1996.
    • Rainer Stiefelhagen, Jie Yang, Alex Waibel, Modeling Focus of Attention for Meeting Indexing Based on Multiple Cues, IEEE Transactions on Neural Networks, Vol. 13, No. 4, pp. 928-938, July 2002.
    • M. Katzenmaier, R. Stiefelhagen, T. Schultz, I. Rogina, A. Waibel, Identifying the Addressee in Human-Human-Robot Interactions Based on Head Pose and Speech, International Conference on Multimodal Interfaces (ICMI 2004), State College, PA, USA, October 2004.
  • Facial Features
    • A. L. Yuille, D. S. Cohen, P. W. Hallinan, Feature Extraction from Faces Using Deformable Templates, Computer Vision and Pattern Recognition (CVPR), 1989.
    • T. F. Cootes, C. J. Taylor, Statistical Models of Appearance for Computer Vision, Technical Report (draft), University of Manchester.
    • T. F. Cootes, G. J. Edwards, C. J. Taylor, Active Appearance Models, European Conference on Computer Vision, Vol. 2, pp. 484-498, Springer, 1998.
  • Facial Expression
    • Y. Tian, T. Kanade, J. Cohn, Facial Expression Analysis, in Handbook of Face Recognition, S.Z. Li & A.K. Jain (eds.), Springer, Oct. 2003.
    • Y. Tian, T. Kanade, J. Cohn, Recognizing Action Units for Facial Expression Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, pp. 97-115, Feb. 2001.
    • M.S. Bartlett, B. Braathen, G. Littlewort-Ford, J. Hershey, I. Fasel, T. Marks, E. Smith, T.J. Sejnowski, J.R. Movellan, Automatic Analysis of Spontaneous Facial Behavior, Tech. Report, UCSD MPLab TR 2001.08, Oct. 2001.
  • Gesture Recognition
    • T. Starner, J. Weaver, A. Pentland, Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1371-1375, 1998.
    • L.R. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE, 77(2):257-286, 1989.
    • K. Nickel, R. Stiefelhagen, 3D-Tracking of Heads and Hands for Pointing Gesture Recognition in a Human-Robot Interaction Scenario, Sixth Int. Conf. on Face and Gesture Recognition, Seoul, Korea, May 2004.
  • Tracking
    • Kai Nickel, Tobias Gehrig, Rainer Stiefelhagen, John McDonough, A Joint Particle Filter for Audio-Visual Speaker Tracking, International Conference on Multimodal Interfaces (ICMI 05), Trento, Italy, October 2005.
    • C. Stauffer, W.E.L. Grimson, Adaptive Background Mixture Models for Real-Time Tracking, CVPR 1998.
    • M. Isard and A. Blake, Condensation: Conditional Density Propagation for Visual Tracking, International Journal of Computer Vision 29(1), pp. 5-28, 1998.
    • Mun Wai Lee, Isaac Cohen, and Soon Ki Jung, Particle Filter with Analytical Inference for Human Body Tracking, Institute for Robotics and Intelligent Systems, Integrated Media Systems Center, University of Southern California, 2002.
    • D. Focken, R. Stiefelhagen, Towards Vision-Based 3-D People Tracking in a Smart Room, IEEE International Conference on Multimodal Interfaces, Pittsburgh, PA, USA, October 14-16, 2002, pp. 400-405.
    • K. Nickel, R. Stiefelhagen, 3D-Tracking of Heads and Hands for Pointing Gesture Recognition in a Human-Robot Interaction Scenario, Sixth Int. Conf. on Face and Gesture Recognition, Seoul, Korea, May 2004.
  • Activity Recognition
    • A. F. Bobick, J. Davis, The Recognition of Human Movement Using Temporal Templates, IEEE PAMI, Vol. 23, No. 3, March 2001.
    • I. Laptev and P. Pérez, Retrieving Actions in Movies, ICCV 2007.
    • K. Nickel et al., Activity Recognition (book chapter), in Computers in the Human Interaction Loop, A. Waibel & R. Stiefelhagen (eds.), Springer, 2009.
    • N. Oliver, E. Horvitz, A. Garg, Layered Representations for Human Activity Recognition, Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI).
    • D. Demirdjian, K. Tollmar, K. Koile, N. Checka, T. Darrell, Activity Maps for Location-Aware Computing, Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, 2002.
    • C. Wojek, K. Nickel, R. Stiefelhagen, Activity Recognition and Room-Level Tracking in an Office Environment, IEEE Int. Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI06), September 2006.
  • Audio-Visual Speech Recognition
    • Gerasimos Potamianos, Chalapathy Neti, Juergen Luettin, Iain Matthews, Audio-Visual Automatic Speech Recognition: An Overview, in Issues in Visual and Audio-Visual Speech Processing, G. Bailly, E. Vatikiotis-Bateson, and P. Perrier (eds.), MIT Press, 2004.