Recovering Scale in Relative Pose and Target Model Estimation Using Monocular Vision
MetadataShow full item record
A combined relative pose and target object model estimation framework using a monocular camera as the primary feedback sensor has been designed and validated in a simulated robotic environment. The monocular camera is mounted on the end-effector of a robot manipulator and measures the image plane coordinates of a set of point features on a target workpiece object. Using this information, the relative position and orientation, as well as the geometry, of the target object are recovered recursively by a Kalman filter process. The Kalman filter facilitates the fusion of supplemental measurements from range sensors, with those gathered with the camera. This process allows the estimated system state to be accurate and recover the proper environment scale. Current approaches in the research areas of visual servoing control and mobile robotics are studied in the case where the target object feature point geometry is well-known prior to the beginning of the estimation. In this case, only the relative pose of target object frames is estimated over a sequence of frames from a single monocular camera. An observability analysis was carried out to identify the physical configurations of camera and target object for which the relative pose cannot be recovered by measuring only the camera image plane coordinates of the object point features. A popular extension to this is to concurrently estimate the target object model concurrently with the relative pose of the camera frame, a process known as Simultaneous Localization and Mapping (SLAM). The recursive framework was augmented to facilitate this larger estimation problem. The scale of the recovered solution is ambiguous using measurements from a single camera. A second observability analysis highlights more configurations for which the relative pose and target object model are unrecoverable from camera measurements alone. Instead, measurements which contain the global scale are required to obtain an accurate solution. A set of additional sensors are detailed, including range finders and additional cameras. Measurement models for each are given, which facilitate the fusion of this supplemental data with the original monocular camera image measurements. A complete framework is then derived to combine a set of such sensor measurements to recover an accurate relative pose and target object model estimate. This proposed framework is tested in a simulation environment with a virtual robot manipulator tracking a target object workpiece through a relative trajectory. All of the detailed estimation schemes are executed: the single monocular camera cases when the target object geometry are known and unknown, respectively; a two camera system in which the measurements are fused within the Kalman filter to recover the scale of the environment; a camera and point range sensor combination which provides a single range measurement at each system time step; and a laser pointer and camera hybrid which concurrently tries to measure the feature point images and a single range metric. The performance of the individual test cases are compared to determine which set of sensors is able to provide robust and reliable estimates for use in real world robotic applications. Finally, some conclusions on the performance of the estimators are drawn and directions for future work are suggested. The camera and range finder combination is shown to accurately recover the proper scale for the estimate and warrants further investigation. Further, early results from the multiple monocular camera setup show superior performance to the other sensor combinations and interesting possibilities are available for wide field-of-view super sensors with high frame rates, built from many inexpensive devices.