Visual Attention for Robotic Cognition: A Biologically Inspired Probabilistic Architecture
MetadataShow full item record
The human being, the most magnificent autonomous entity in the universe, frequently takes the decision of `what to look at' in their day-to-day life without even realizing the complexities of the underlying process. When it comes to the design of such an attention system for autonomous robots, all of a sudden this apparently simple task appears to be an extremely complex one with highly dynamic interaction among motor skills, knowledge and experience developed throughout the life-time, highly connected circuitry of the visual cortex, and super-fast timing. The most fascinating thing about visual attention system of the primates is that the underlying mechanism is not precisely known yet. Different influential theories and hypothesis regarding this mechanism, however, are being proposed in psychology and neuroscience. These theories and hypothesis have encouraged the research on synthetic modeling of visual attention in computer vision, computational neuroscience and, very recently, in AI robotics. The major motivation behind the computational modeling of visual attention is two-fold: understanding the mechanism underlying the cognition of the primates' and using the principle of focused attention in different real-world applications, e.g. in computer vision, surveillance, and robotics. Accordingly, we observe the rise of two different trends in the computational modeling of visual attention. The first one is mostly focused on developing mathematical models which mimic, as much as possible, the details of the primates' attention system: the structure, the connectivity among visual neurons and different regions of the visual cortex, the flow of information etc. Such models provide a way to test the theories of the primates' visual attention with minimal involvement from the live subjects. This is a magnificent way to use technological advancement for the understanding of human cognition. The second trend in computational modeling, on the other hand, uses the methodological sophistication of the biological processes (like visual attention) to advance the technology. These models are mostly concerned with developing a technical system of visual attention which can be used in real-world applications where the principle of focused attention might play a significant role for redundant information management. This thesis is focused on developing a computational model of visual attention for robotic cognition and, therefore, belongs to the second trend. The design of a visual attention model for robotic systems as a component of their cognition comes with a number of challenges which, generally, do not appear in the traditional computer vision applications of visual attention. The robotic models of visual attention, although heavily inspired by the rich literature of visual attention in computer vision, adopt different measures to cope with these challenges. This thesis proposes a Bayesian model of visual attention designed specifically for robotic systems and, therefore, tackles the challenges involved with robotic visual attention. The operation of the proposed model is guided by the theory of biased competition, a popular theory from cognitive neuroscience describing the mechanism of primates' visual attention. The proposed Bayesian attention model offers a robot-centric approach of visual attention where the head-pose of a robot in the 3D world is estimated recursively such that the robot can focus on the most behaviorally relevant stimuli in its environment. The behavioral relevance of an object determined based on two criteria which are inspired by the postulates of the biased competitive hypothesis of visual attention in the primates. Accordingly, the proposed model encourages a robot to focus on novel stimuli or stimuli that have similarity with a `sought for' object depending on the context. In order to address a number of robot-specific issues of visual attention, the proposed model is further extended to the multi-modal case where speech commands from the human are used to modulate the visual attention behavior of the robot. The Bayes model of visual attention, inherited from the Bayesian sensor fusion characteristic, naturally accommodates multi-modal information during attention selection. This enables the proposed model to be the core component of an attention oriented speech-based human-robot interaction framework. Extensive experiments are performed in the real-world to investigate different aspects of the proposed Bayesian visual attention model.