Personal Information
Office address:
Boris Schauerte
Karlsruhe Institute of Technology
Institute for Anthropomatics
Adenauerring 2 (Campus South)
76131 Karlsruhe
Room: 229 (Building 50.20)
Phone: +49 (0)721 - 608 46285
Fax: +49 (0)721 - 608 45939
e-mail: schauerte@ieee.org
|
About my Research: Multimodal Attention
Introduction: Attention is the cognitive process of selectively concentrating on one thing while ignoring others. As such, it is considered the gateway to the rest of cognition.
Accordingly, it is one of the most intensely studied topics and, nonetheless, still one of the fastest-growing fields within cognitive psychology and cognitive neuroscience.
Because attention plays such an important role in cognition, computational models of attention have attracted increasing interest in fields related to applied perception, most importantly robotics, where they help to realize advanced cognitive technical systems.
Computational Attention in Multimodal (Language and Gesture) Human-Robot Interaction: Identifying verbally and non-verbally referred-to objects is an important aspect of everyday human-human interaction and, consequently, of natural human-robot interaction. Most importantly, it is essential for coordinating attention with interaction partners and thus achieving joint attention. In my work, I try to create a consistent computational model of attention that integrates the spatial information provided by pointing gestures with linguistic descriptions of the visual appearance of the referred-to object in spoken human-robot interaction. This way, we can often determine known referred-to objects and, in many situations, even completely unknown ones.
(see "Focusing Computational Visual Attention in Multi-Modal Human-Robot Interaction", 2010)
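As a rough illustration of the integration described above, the following Python sketch scores candidate objects by multiplying a Gaussian spatial prior around the pointed-at image location with the probability that the object matches the spoken color term. This is only a simplified sketch under stated assumptions, not the model from the referenced paper: the candidate dictionaries, the `color_probs` field, the function names, and the `sigma` value are all hypothetical.

```python
import math


def pointing_prior(x, y, pointed_at, sigma):
    """Gaussian weight that decays with distance from the pointed-at location."""
    dx, dy = x - pointed_at[0], y - pointed_at[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def select_referent(candidates, pointed_at, color_term, sigma=50.0):
    """Return the candidate with the highest combined gesture and language score.

    Each candidate is a dict with pixel coordinates "x", "y" and a
    "color_probs" mapping from color terms to probabilities (assumed format).
    """
    def score(candidate):
        spatial = pointing_prior(candidate["x"], candidate["y"], pointed_at, sigma)
        appearance = candidate["color_probs"].get(color_term, 0.0)
        return spatial * appearance

    return max(candidates, key=score)


# Hypothetical scene: a blue cup close to the pointing target, a red book further away.
scene = [
    {"name": "cup", "x": 100, "y": 100, "color_probs": {"red": 0.1, "blue": 0.9}},
    {"name": "book", "x": 160, "y": 100, "color_probs": {"red": 0.9, "blue": 0.1}},
]
referent = select_referent(scene, pointed_at=(110, 100), color_term="red")
```

In this toy scene the pointing gesture alone favors the cup, but the color term "red" shifts the decision to the book, which is the kind of disambiguation that combining both modalities is meant to provide.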
|
Audio-Visual Bottom-Up Saliency for Scene Exploration and Analysis: Attention allows (humanoid) robots to focus their limited computational resources effectively while controlling the sensor orientation to improve perception. Accordingly, it is considered a key requirement for robots that operate in realistic, complex environments. In cooperation with B. Kühn, we are building an integrated system for audio-visual scene exploration and object analysis. For this purpose, I developed multiple novel methods: surprise-based auditory saliency, quaternion DCT visual saliency, isophote-based saliency map segmentation to determine proto-object hypotheses, and a parametric 3-D model for multimodal saliency fusion that supports ego-motion as well as integrated object-based inhibition of return.
(see "Multimodal Saliency-based Attention for Object-based Scene Analysis", 2011
  and "Predicting Human Gaze using Quaternion DCT Image Signature Saliency and Face Detection", 2012)
|
Audio-Visual and Multi-Camera Attention in Smart Environments: A problem of smart environments is the huge amount of sensor information that needs to be processed in real time. Accordingly, it seems appropriate to apply computational attention mechanisms. For this purpose, I created a voxel-based 3-D saliency model that uses fuzzy logic for audio-visual saliency fusion. To realize overt attention, I developed an active control of the PTZ cameras that maximizes the expected quality of the 3-D model. Furthermore, the expected reconstruction error is used to optimize the placement of the cameras in the room. For covert attention, I applied multi-objective optimization - the objective functions are based on the 3-D saliency information - to provide algorithms as well as human observers with the best view(s) of the scene.
(see "Multi-Modal and Multi-Camera Attention in Smart Environments", 2009)
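To make the idea of fuzzy audio-visual fusion on a voxel grid more concrete, here is a minimal Python sketch. It assumes that saliency values are membership degrees in [0, 1] and blends the standard fuzzy AND (minimum) and fuzzy OR (maximum) with a weight `gamma`. The choice of operators, the sparse dictionary representation of the voxel grid, and all names are assumptions made for illustration; the aggregation actually used is described in the referenced paper.

```python
def fuse_saliency(visual, audio, gamma=0.5):
    """Blend fuzzy AND (min) and fuzzy OR (max) of two saliency degrees in [0, 1].

    gamma = 0 requires both modalities to agree (pure AND),
    gamma = 1 lets either modality trigger attention (pure OR).
    """
    conjunction = min(visual, audio)
    disjunction = max(visual, audio)
    return (1.0 - gamma) * conjunction + gamma * disjunction


def fuse_voxel_grids(visual_grid, audio_grid, gamma=0.5):
    """Fuse two sparse voxel grids given as {(x, y, z): saliency} dictionaries.

    Voxels missing from one grid are treated as having zero saliency there.
    """
    voxels = set(visual_grid) | set(audio_grid)
    return {
        voxel: fuse_saliency(visual_grid.get(voxel, 0.0), audio_grid.get(voxel, 0.0), gamma)
        for voxel in voxels
    }


# Hypothetical example: one voxel is salient in both modalities, one only in audio.
visual = {(0, 0, 0): 0.2}
audio = {(0, 0, 0): 0.8, (1, 0, 0): 0.6}
fused = fuse_voxel_grids(visual, audio, gamma=0.5)
```

A single parameter such as `gamma` is one simple way to move between "both modalities must agree" and "any modality suffices"; other fuzzy aggregation operators could be substituted in `fuse_saliency` without changing the rest of the sketch.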
|
Code & Data: Looking for code or data sets? Have a look at the "Download" section of the corresponding publication.
Publications: Most publications are listed in the following section. A less detailed publication list is available (here).
|
Publications
B. Schauerte, R. Stiefelhagen, "Predicting Human Gaze using Quaternion DCT Image Signature Saliency and Face Detection". In Proceedings of the 12th IEEE Workshop on the Applications of Computer Vision (WACV) / IEEE Winter Vision Meetings, Breckenridge, CO, USA, January 9-11, 2012. (Best Student Paper Award)
 |
Abstract: We combine and extend the previous work on DCT-based image signatures and face detection to determine the visual saliency. To this end, we transfer the scalar definition of image signatures to quaternion images and thus introduce a novel saliency method using quaternion type-II DCT image signatures. Furthermore, we use MCT-based face detection to model the important influence of faces on the visual saliency using rotated elliptical Gaussian weight functions and evaluate several integration schemes. In order to demonstrate the performance of the proposed methods, we evaluate our approach on the Bruce-Tsotsos (Toronto) and Cerf (FIFA) benchmark eye-tracking data sets. Additionally, we present evaluation results on the Bruce-Tsotsos data set of the most important spectral saliency approaches. We achieve state-of-the-art results in terms of the well-established area under curve (AUC) measure on the Bruce-Tsotsos data set and come close to the ideal AUC on the Cerf data set - with less than one millisecond to calculate the bottom-up QDCT saliency map.
|
Keywords: Spectral Saliency, Quaternion, DCT Image Signatures; MCT Face Detection; Attention; Human Gaze, Eye-Tracking
Download: [ pdf] [ bibtex] [ code #1 - saliency] [ code #2 - Matlab AUC measure implementation] [ data set #1 - Cerf/FIFA face detections]
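For readers unfamiliar with DCT image signatures, the following Python sketch shows the scalar, single-channel idea that the paper builds on: take the type-II DCT of the image, keep only the sign of each coefficient, transform back, and square the result. It is a simplified illustration and not the published implementation: it does not cover the quaternion extension or face detection, it omits the final Gaussian smoothing step, and it uses a naive matrix-based DCT in pure Python. All function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal type-II DCT matrix C; its transpose is the inverse transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(column) for column in zip(*a)]


def sign(value):
    return (value > 0) - (value < 0)


def image_signature_saliency(image):
    """Unsmoothed saliency map from the sign of the 2-D DCT of a grayscale image."""
    c_rows, c_cols = dct_matrix(len(image)), dct_matrix(len(image[0]))
    # Forward 2-D DCT: transform the columns, then the rows.
    coefficients = matmul(matmul(c_rows, image), transpose(c_cols))
    # The image signature keeps only the sign of every coefficient.
    signature = [[sign(v) for v in row] for row in coefficients]
    # Inverse 2-D DCT of the signature, squared element-wise.
    reconstruction = matmul(matmul(transpose(c_rows), signature), c_cols)
    return [[v * v for v in row] for row in reconstruction]


# Toy input: a single bright pixel on a uniform 8x8 background.
image = [[0.2] * 8 for _ in range(8)]
image[3][5] = 1.0
saliency = image_signature_saliency(image)
```

On this toy input, the largest saliency value lies at the bright pixel (row 3, column 5), because a uniform background contributes only to the DC coefficient while the isolated pixel determines the signs of all other coefficients. For real images, an optimized DCT routine and a smoothing step would be needed.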
|
H. Jaspers, B. Schauerte, G. A. Fink, "SIFT-based Camera Localization using Reference Objects for Application in Multi-Camera Environments and Robotics". In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (ICPRAM), Vilamoura, Algarve, Portugal, February 6-8, 2012.
 |
Abstract: We present a unified approach to improve the localization and the perception of a robot in a new environment by using already installed cameras. Using our approach we are able to localize arbitrary cameras in multi-camera environments while automatically extending the camera network in an online, unattended, real-time way. This way, all cameras can be used to improve the perception of the scene, and additional cameras can be added in real-time, e.g., to remove blind spots. [...] we use it to iteratively calibrate the camera network as well as to localize arbitrary cameras, e.g. of mobile phones or robots, inside a multi-camera environment. [...]
|
Keywords: Camera Pose Estimation, Camera Calibration; Scale Ambiguity; SIFT; Multi-Camera Environment, Smart Room, Robot Localization
Download: [ pdf] [ bibtex] [ code #1] [ data set #1]
|
B. Schauerte, B. Kühn, K. Kroschel, R. Stiefelhagen, "Multimodal Saliency-based Attention for Object-based Scene Analysis".
In Proceedings of the 24th International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, San Francisco, CA, USA, September 25-30, 2011.
 |
Abstract: Multimodal attention is a key requirement for humanoid robots in order to navigate in complex environments and act as social, cognitive human partners. To this end, robots have to incorporate attention mechanisms that focus the processing on the potentially most relevant stimuli while controlling the sensor orientation to improve the perception of these stimuli. In this paper, we present our implementation of audio-visual saliency-based attention that we integrated in a system for knowledge-driven audio-visual scene analysis and object-based world modeling. For this purpose, we introduce a novel isophote-based method for proto-object segmentation of saliency maps, a surprise-based auditory saliency definition, and a parametric 3-D model for multimodal saliency fusion. The applicability of the proposed system is demonstrated in a series of experiments.
|
Keywords: Multimodal, Audio-Visual Attention; Auditory Surprise; Isophote-based Visual Proto-Objects; Parametric 3-D Saliency Model and Fusion; Object-based Inhibition of Return; Object-based Scene Exploration and Hierarchical Analysis
Download: [ pdf] [ bibtex] [ code #1 - visual saliency] [ code #2 - windowed Gaussian surprise / auditory saliency]
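One common way to formalize surprise is the Kullback-Leibler divergence between the belief about a signal before and after a new observation. The Python sketch below illustrates this with a univariate Gaussian fitted to a sliding window of a single feature (for example, the energy in one frequency band). It is a simplified sketch under these assumptions and not the paper's formulation; the window length, the variance floor `eps`, and all function names are hypothetical.

```python
import math


def gaussian_fit(values, eps=1e-6):
    """Mean and (population) variance of the values; eps keeps the variance positive."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance + eps


def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(P || Q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)


def windowed_gaussian_surprise(samples, window=8):
    """Surprise of each sample relative to the preceding window of samples.

    The prior is fitted to the last `window` samples, the posterior to the same
    samples plus the new one; surprise is KL(posterior || prior).
    The first `window` entries are left at 0.0 because no full history exists.
    """
    surprise = [0.0] * len(samples)
    for t in range(window, len(samples)):
        prior = gaussian_fit(samples[t - window:t])
        posterior = gaussian_fit(samples[t - window:t + 1])
        surprise[t] = kl_gaussian(*posterior, *prior)
    return surprise


# Toy signal: low-level background with one sudden, loud event at index 20.
signal = [0.0, 0.1] * 10 + [5.0] + [0.0, 0.1] * 5
surprise = windowed_gaussian_surprise(signal, window=8)
```

In this toy signal the surprise peaks at the index of the sudden event and stays close to zero for samples that match the recent history, which is the behavior one would expect from a saliency signal intended to trigger attention.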
|
B. Schauerte, G. A. Fink, "Web-based Learning of Naturalized Color Models for Human-Machine Interaction".
In Proceedings of the 12th International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, Australia, December 1-3, 2010.
 |
Abstract: In recent years, natural verbal and non-verbal human-robot interaction has attracted an increasing interest. Therefore, models for robustly detecting and describing visual attributes of objects such as, e.g., colors are of great importance. However, in order to learn robust models of visual attributes, large data sets are required. Based on the idea to overcome the shortage of annotated training data by acquiring images from the Internet, we propose a method for robustly learning natural color models. Its novel aspects with respect to prior art are: firstly, a randomized HSL transformation that reflects the slight variations and noise of colors observed in real-world imaging sensors; secondly, a probabilistic ranking and selection of the training samples, which removes a considerable amount of outliers from the training data. [...]
|
Keywords: Color Terms; Color Naming; Web-based Learning; Natural vs. Web-based Image Statistics; Domain Adaptation; Probabilistic HSL Model; Probabilistic Latent Semantic Analysis (pLSA); Human-Robot Interaction
Download: [ pdf] [ bibtex] [ data set #1 - Google-512] [ report]
|
B. Schauerte, G. A. Fink, "Focusing Computational Visual Attention in Multi-Modal Human-Robot Interaction".
In Proceedings of the 12th International Conference on Multimodal Interfaces (ICMI), ACM, Beijing, China, November 8-12, 2010. (Doctoral Spotlight; Google Travel Grant)
 |
Abstract: Identifying verbally and non-verbally referred-to objects is an important aspect of human-robot interaction. Most importantly, it is essential to achieve a joint focus of attention and, thus, a natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence the visual search, i.e. the task to find a specific object in a scene. Therefore, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically-motivated saliency model that forms the basis for visual search. We prove the feasibility of the proposed approach by presenting the results of an experimental evaluation.
|
Keywords: Modulatable Neuron-based, Phase-based, Spectral Whitening Saliency; Attention; Visual Search; Objects; Color; Shared Attention, Joint Attention; Multi-Modal Interaction, Gestures, Pointing, Language; Deictic Interaction; Spoken Human-Robot Interaction
Download: [ pdf] [ bibtex]
|
B. Schauerte, J. Richarz, G. A. Fink, "Saliency-based Identification and Recognition of Pointed-at Objects".
In Proceedings of the 23rd International Conference on Intelligent Robots and Systems (IROS), IEEE/RSJ, Taipei, Taiwan, October 18-22, 2010.
 |
Abstract: When persons interact, non-verbal cues are used to direct the attention of persons towards objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it allows to couple verbal descriptions with the visual appearance of objects, if the referred-to object is non-verbally indicated. In this contribution, we present a system that utilizes bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses the visual attention by steering a pan-tilt-zoom camera towards the object of interest and thus provides a suitable model-view for SIFT-based recognition and learning.
|
Keywords: Spectral Residual Saliency, Spectral Whitening Saliency; Joint/Shared Attention, Pointing Gestures; Object Detection and Learning; Maximally Stable Extremal Regions (MSER); Scale-Invariant Feature Transform (SIFT); Active Pan-Tilt-Zoom Camera; Human-Robot Interaction
Download: [ pdf] [ bibtex] [ errata]
|
B. Schauerte, J. Richarz, T. Plötz, C. Thurau, G. A. Fink, "Multi-Modal and Multi-Camera Attention in Smart Environments".
In Proceedings of the 11th International Conference on Multimodal Interfaces (ICMI), pp. 261-268, ACM, Cambridge, MA, USA, November 2-4, 2009. (Outstanding Student Paper Award Finalist)
 |
Abstract: This paper considers the problem of multi-modal saliency and attention. Saliency is a cue that is often used for directing attention of a computer vision system, e.g., in smart environments or for robots. Unlike the majority of recent publications on visual/audio saliency, we aim at a well grounded integration of several modalities. The proposed framework is based on fuzzy aggregations and offers a flexible, plausible, and efficient way for combining multi-modal saliency information. Besides incorporating different modalities, we extend classical 2D saliency maps to multi-camera and multi-modal 3D saliency spaces. For experimental validation we realized the proposed system within a smart environment. The evaluation took place for a demanding setup under real-life conditions, including focus of attention selection for multiple subjects and concurrently active modalities.
|
Keywords: Multi-Camera; 3-D Spatial Saliency; Multi-Modal Saliency; Attention; Active Multi-Camera Control; Volumetric Intersection, Minimal Reconstruction Error; View Selection, Viewpoint Selection; Multi-Modal Sensor Fusion; Fuzzy; Smart Room; Human-Machine Interaction
Download: [ pdf] [ bibtex] [ code #1 - reconstruction error approximation]
|
B. Schauerte, T. Plötz, G. A. Fink, "A Multi-modal Attention System for Smart Environments".
In Proceedings of the 7th International Conference on Computer Vision Systems (ICVS), Lecture Notes in Computer Science, LNCS 5815, pp. 73-83, Springer, Liege, Belgium, October 13-15, 2009.
 |
Abstract: Focusing their attention to the most relevant information is a fundamental biological concept, which allows humans to (re-)act rapidly and safely in complex and unfamiliar environments. This principle has successfully been adopted for technical systems where sensory stimuli need to be processed in an efficient and robust way. In this paper a multi-modal attention system for smart environments is described that explicitly respects efficiency and robustness aspects already by its architecture. The system facilitates unconstrained human-machine interaction by integrating multiple sensory information of different modalities.
|
Keywords: Multi-Modal; Multi-Camera; 3-D Spatial Saliency; Attention; Active Multi-Camera Control; Volumetric Intersection; View Selection, Viewpoint Selection; Real-Time Performance; Distributed, Scalable System; Design; Smart Environment, Smart Room; Human-Machine Interaction
Download: [ pdf] [ bibtex]
|
B. Schauerte, "Multi-modale Aufmerksamkeitssteuerung in einer intelligenten Umgebung" (Multi-modal attention control in an intelligent environment).
Diplom (M.Sc.) Thesis, TU Dortmund University, 2008.
 |
Abstract: Intelligent environments are supposed to simplify the everyday life of their users. To reach this target, a multitude of sensors is necessary to create a model of the current scene. The complete processing of the resulting sensor data stream can exceed the available processing capacities and inhibit the real-time processing of the complete sensor data. A possible solution to this problem is a fast pre-selection of potentially relevant sensor data and the restriction of complex calculations on the pre-selected sensor data. That leaves the question of what is potentially relevant. [...]
|
Download: [ bibtex]
|
B. Schauerte, C. T. Zamfirescu, "Regular graphs in which every pair of points is missed by some longest cycle".
In Annals of University of Craiova, Volume 33, pp. 154-173, 2006.
 |
Abstract: In Petersen's well-known cubic graph every vertex is missed by some longest cycle. Thomassen produced a planar graph with this property. Grünbaum found a cubic graph, in which any two vertices are missed by some longest cycle. In this paper we present a cubic planar graph fulfilling this condition.
|
Download: [ pdf] [ bibtex]
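The opening statement of the abstract about Petersen's graph can be checked by brute force. The Python sketch below builds the Petersen graph, computes the length of its longest cycle by exhaustive search, and verifies that removing any single vertex leaves a cycle of the same length. It only illustrates the single-vertex property for this one small graph and says nothing about the construction in the paper; the function names are hypothetical.

```python
def petersen_graph():
    """Adjacency sets of the Petersen graph on vertices 0..9."""
    adjacency = {v: set() for v in range(10)}

    def add_edge(u, v):
        adjacency[u].add(v)
        adjacency[v].add(u)

    for i in range(5):
        add_edge(i, (i + 1) % 5)          # outer 5-cycle
        add_edge(i, i + 5)                # spokes
        add_edge(i + 5, (i + 2) % 5 + 5)  # inner pentagram
    return adjacency


def longest_cycle_length(adjacency, excluded=frozenset()):
    """Number of vertices on a longest cycle that avoids the excluded vertices."""
    best = 0

    def extend(start, current, visited):
        nonlocal best
        for neighbor in adjacency[current]:
            if neighbor in excluded:
                continue
            if neighbor == start and len(visited) >= 3:
                best = max(best, len(visited))
            elif neighbor not in visited and neighbor > start:
                # Only visit larger vertices, so `start` is the smallest on the cycle.
                visited.add(neighbor)
                extend(start, neighbor, visited)
                visited.remove(neighbor)

    for start in adjacency:
        if start not in excluded:
            extend(start, start, {start})
    return best


def every_vertex_missed(adjacency):
    """True if each vertex is avoided by at least one longest cycle."""
    longest = longest_cycle_length(adjacency)
    return all(longest_cycle_length(adjacency, frozenset({v})) == longest
               for v in adjacency)


graph = petersen_graph()
circumference = longest_cycle_length(graph)
property_holds = every_vertex_missed(graph)
```

For the Petersen graph the search finds a longest cycle with 9 of the 10 vertices, and the check succeeds for every vertex. Exhaustive search is only feasible for small graphs like this one.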
|
B. Schauerte, "Root Treatment - The dangers of rootkits".
In Linux Magazine (UK), Volume 19, pp. 20-23, April, 2002.
|
B. Schauerte, "Feind im Dunkeln - Wie gefährlich sind die Cracker-Werkzeuge" (Enemy in the dark - How dangerous are the cracker tools).
In Linux Magazin (DE), Volume 3, pp. 44-47, February, 2002.
|
B. Schauerte, "Verborgene Gefahren - Trojanische Pferde in den Kernel laden" (Hidden dangers - Loading Trojan horses into the kernel).
In Linux Magazin (DE), Volume 11, pp. 60-63, October, 2001.
|
Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
|
Talks at Institutions
"Computer Vision for Human-Computer Interaction" at the Georg-August-University, Göttingen, Germany, November, 2011.
|
"Attention for Multi-Modal Human-Machine Interaction" at the Culture Lab, Newcastle University, Newcastle Upon Tyne, Great Britain, April, 2010.
|
"Attention for Smart Environments and Human-Machine Interaction" at the Computer Vision for Human-Computer Interaction Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany, February, 2010.
|
"A Multi-modal Attention System for Smart Environments" at the Frankfurt Institute of Advanced Studies, Frankfurt, Germany, January, 2009.
|
Teaching
Diplom/Master/Bachelor Thesis (also HiWi positions): I offer various topics in the areas of Computer Vision, Human-Robot Interaction, Multimodal/Audio-Visual Interaction, and Gaze and Gesture Analysis. If you are interested, please feel free to contact me directly.
Short Bio
| Since 2010: | Research Assistant at the Computer Vision for Human-Computer Interaction Lab of the Institute for Anthropomatics (Karlsruhe Institute of Technology); member of the international center for advanced communication technologies (interACT) and the collaborative research center SFB 588 "Humanoide Roboter" |
| 2009: | Ph.D. student at the Robotics Research Institute (TU Dortmund); supported by a fellowship of the TU Dortmund excellence programme |
| 2003-2008: | Dipl.-Inform. (German M.Sc. degree equivalent) with honors (grade 1.0, i.e. GPA 4.0/4 equivalent) in computer science at TU Dortmund University |
| 2006-2007: | Member of the "Walking soccer-playing robots" project group (participation in the Kid Size League at the RoboCup German Open 2007 and RoboCup 2007) [technical report] |
| 1984: | Born, and lucky that 1984 was not like "Nineteen Eighty-Four" |
|