Intelligent environments are meant to simplify the everyday life of their users. To achieve this, a multitude of sensors is needed to build a model of the current scene. Processing the resulting sensor data stream in full can exceed the available computing capacity and prevent real-time processing of the complete sensor data. A possible solution is a fast pre-selection of potentially relevant sensor data, with complex calculations restricted to the pre-selected data. This leaves the question of what is potentially relevant. The topic of this thesis is the construction of a multi-modal attention control for an intelligent environment. Attention is the process of selectively concentrating on one aspect of the environment while ignoring others. Human attention in particular has been studied intensively in psychology and cognitive neuroscience. For this reason, the principles of human attention are surveyed and transferred to the new application area. The basis is a 3-dimensional, multi-modal saliency model, which is constructed from potentially relevant sensor data. This model is used to transfer the two human mechanisms of attention into the intelligent environment: the act of mentally focusing on one of several possible sensory stimuli, the so-called covert attention, is realized as the selection of the sensor that is assumed to have the best perception of the scene; the act of directing sense organs towards a stimulus source, the so-called overt attention, is realized as a multi-camera active-vision system that optimizes the visual scene perception and the 3-dimensional saliency model. The attention control was implemented and evaluated in the practice-oriented intelligent environment FINCA at the Robotics Research Institute.