Development of a Semantic Model and Synthetic Dataset for Multi-Grasp Affordance Detection for Application to Vision-Based Upper-Limb Prosthetic Grasping

dc.contributor.author: Ng, Nathan
dc.date.accessioned: 2024-05-27T20:08:03Z
dc.date.available: 2024-05-27T20:08:03Z
dc.date.issued: 2024-05-27
dc.date.submitted: 2024-05-24
dc.description.abstract: Current upper-limb prosthetic grasping methods are predominantly myoelectric: surface electromyogram (sEMG) pattern recognition is used to predict a grasp type for the prosthetic hand to grasp objects. The same sEMG patterns are also used to detect the intent to perform a grasping action and to control the overall movement of the prosthetic arm. Because this overall control strategy is coupled, grasp-type prediction can be inaccurate, especially when a grasp type has an sEMG pattern similar to that used for manipulating the prosthetic arm or selecting another grasp type. Recent vision-based prosthetic grasping methods resolve this coupled control strategy by implementing a camera system to capture an RGB image of an object and a convolutional neural network (CNN) to predict a grasp type; the intent to move the prosthetic arm and perform the grasping action is determined independently through sEMG pattern recognition. Unlike myoelectric prostheses, vision-based prostheses can predict a suitable grasp type from the features of an object (e.g., its shape). However, current vision-based grasping methods are limited because each object can only be grasped with a single grasp type, regardless of the object's shape, the environmental context, and the available tasks.

Recent robotic grasping applications implement grasp affordance detection to identify the regions of an object that can be grasped for a task. By adapting grasp affordance detection to a vision-based prosthetic device, multiple task-oriented grasp-type predictions become possible for each object. Therefore, to improve the vision system of vision-based prostheses, this thesis research adapts grasp affordance detection methods from robotic grasping applications. Grasp affordances, represented as grasp-type and task regions, are predicted with instance segmentation models, which use RGB images to localize objects and their grasp affordances with bounding boxes and segmentation masks. Since no existing instance segmentation model or dataset allows the simultaneous detection of objects and their grasp affordances, the Multi-Affordance Detection Network (MAD-Net) model and the Multi-Object Multi-Grasp-Affordance (MOMA) synthetic dataset were developed as part of this thesis research. Unlike current vision-based prosthetic grasping methods, MAD-Net can detect objects and their grasp affordances in multi-object RGB scenes. MAD-Net was derived from Mask R-CNN, a common baseline model for instance segmentation; most instance segmentation models are derived from Mask R-CNN, since its additional mask-prediction head can convert an object detection model into an instance segmentation model.

The MOMA synthetic dataset is a collection of 20K RGB images generated by placing random images of objects on random background images. Each generated image was automatically annotated with the instances of objects and their grasp affordances (grasp-type and task regions). The single-object RGB images used for synthetic dataset generation were manually captured with a camera and then manually annotated.
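As a concrete illustration of the instance segmentation interface described above (an RGB image in; per-instance bounding boxes, class labels, confidence scores, and segmentation masks out), the minimal sketch below runs torchvision's off-the-shelf Mask R-CNN. This is an assumed stand-in, not the thesis MAD-Net implementation: the use of torchvision, the `weights="DEFAULT"` option (recent torchvision versions), and the placeholder input tensor are all assumptions.

```python
# Minimal sketch (assumed, not the thesis code): an instance segmentation model
# maps a single RGB image to per-instance boxes, labels, scores, and masks.
# torchvision's off-the-shelf Mask R-CNN is used here as a stand-in for MAD-Net.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained weights
model.eval()

image = torch.rand(3, 480, 640)  # placeholder RGB tensor with values in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]  # the model accepts a list of images

boxes = prediction["boxes"]    # (N, 4) bounding boxes in pixel coordinates
labels = prediction["labels"]  # (N,) predicted class indices
scores = prediction["scores"]  # (N,) confidence scores
masks = prediction["masks"]    # (N, 1, H, W) per-instance soft masks
```

In a MAD-Net-style model, the same kind of per-instance output would additionally carry grasp-type and task-region detections alongside the object detections.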
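The copy-paste generation procedure described for the MOMA dataset (random object cutouts placed on random backgrounds, with annotations derived automatically) can be sketched as follows. This is a minimal, hypothetical example rather than the actual MOMA generation pipeline: the function name `composite_example`, the use of Pillow and NumPy, the RGBA cutouts, and the COCO-style bounding-box convention are assumptions.

```python
# Hypothetical sketch (not the thesis implementation) of copy-paste compositing:
# an object cutout with a transparency mask is pasted at a random location on a
# background image, and the mask / bounding-box annotation is derived automatically.
import random
import numpy as np
from PIL import Image  # Pillow

def composite_example(background_path: str, cutout_path: str):
    background = Image.open(background_path).convert("RGBA")
    cutout = Image.open(cutout_path).convert("RGBA")  # alpha channel = object silhouette

    # Random placement; assumes the cutout is smaller than the background.
    x = random.randint(0, background.width - cutout.width)
    y = random.randint(0, background.height - cutout.height)
    background.alpha_composite(cutout, dest=(x, y))

    # Full-size binary mask built from the cutout's alpha channel.
    mask = np.zeros((background.height, background.width), dtype=np.uint8)
    alpha = np.array(cutout)[:, :, 3] > 0
    mask[y:y + cutout.height, x:x + cutout.width] = alpha.astype(np.uint8)

    # Bounding box in (x_min, y_min, width, height) form, as in COCO-style datasets.
    ys, xs = np.nonzero(mask)
    bbox = [int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)]

    # Grasp-type and task-region labels for the cutout would be attached here.
    annotation = {"bbox": bbox, "mask": mask}
    return background.convert("RGB"), annotation
```

Because the cutout's silhouette is known exactly, the object mask and bounding box (and any grasp-type or task regions marked on the cutout) follow automatically, which is what makes annotating 20K synthetic images practical.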
The mean average precision (mAP) metric is used to evaluate the performance of MAD-Net and the other instance segmentation models on the MOMA dataset. The mAP metric is a good indicator of model performance, since it measures how accurately the predicted bounding boxes and image masks match the ground-truth annotations. MAD-Net outperformed all other instance segmentation models across all detection categories (objects, grasp types, tasks) on the validation datasets. On the test datasets, MAD-Net maintained mAP scores similar to those of the other instance segmentation models. In all cases, MAD-Net outperformed Mask R-CNN, especially in the grasp-type detection category, where MAD-Net achieved a 10% increase in mAP over Mask R-CNN. When objects and their grasp affordances were trained jointly on the MOMA dataset, the total training time decreased by 50%. Since MAD-Net outperformed Mask R-CNN, the joint detection of objects and their grasp affordances is a feasible solution for the vision system of vision-based prostheses. Although the proposed vision system produces multiple task-oriented grasp types for a single object, modern myoelectric prostheses can select a grasp type from a small set of pre-programmed grasp types. A grasp database can also be implemented alongside the proposed vision system, allowing prosthetic users to continuously update the database with new, unseen objects and their corresponding task-oriented grasp types.
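To make the mAP discussion above more concrete, the sketch below shows the bounding-box intersection-over-union (IoU) computation and the greedy prediction-to-ground-truth matching that underlie precision and recall at a single IoU threshold. This is a simplified illustration, not the full COCO-style evaluation protocol: full mAP additionally averages precision over recall levels, IoU thresholds, and detection categories, and an analogous IoU is computed on segmentation masks.

```python
# Simplified illustration (not the evaluation code used in the thesis) of the IoU
# matching that underlies mAP: each prediction is greedily matched to at most one
# unused ground-truth box, and precision/recall follow from the matched counts.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned bounding boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions: List[Box], ground_truth: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall at one IoU threshold; predictions assumed sorted by score."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_gt is not None and best_iou >= iou_threshold:
            matched.add(best_gt)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```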
dc.identifier.uri: http://hdl.handle.net/10012/20619
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: computer vision
dc.subject: prosthetic grasping
dc.subject: convolutional neural network
dc.subject: grasp affordance detection
dc.subject: synthetic dataset
dc.subject: image processing
dc.subject: grasp-type detection
dc.subject: instance segmentation
dc.title: Development of a Semantic Model and Synthetic Dataset for Multi-Grasp Affordance Detection for Application to Vision-Based Upper-Limb Prosthetic Grasping
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Mechanical and Mechatronics Engineering
uws-etd.degree.discipline: Mechanical Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kofman, Jonathan
uws.contributor.advisor: Jeon, Soo
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle
Name: Ng_Nathan.pdf
Size: 8.96 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission