Development of a Semantic Model and Synthetic Dataset for Multi-Grasp Affordance Detection for Application to Vision-Based Upper-Limb Prosthetic Grasping

dc.contributor.author: Ng, Nathan
dc.date.accessioned: 2024-05-27T20:08:03Z
dc.date.available: 2024-05-27T20:08:03Z
dc.date.issued: 2024-05-27
dc.date.submitted: 2024-05-24
dc.description.abstract: Current upper-limb prosthetic grasping methods are predominantly myoelectric: surface electromyogram (sEMG) pattern recognition is used to predict a grasp type for the prosthetic hand to grasp objects. The same sEMG patterns are also used to detect the intent to perform a grasping action and to control the overall movement of the prosthetic arm. Because this overall control strategy is coupled, grasp-type prediction can be inaccurate, especially when a grasp type has an sEMG pattern similar to that used for manipulating the prosthetic arm or selecting another grasp type. Recent vision-based prosthetic grasping methods resolve this coupled control strategy by implementing a camera system to capture an RGB image of an object and a convolutional neural network (CNN) to predict a grasp type; the intent to move the prosthetic arm and perform the grasping action is determined independently through sEMG pattern recognition. Unlike myoelectric prostheses, vision-based prostheses can predict a suitable grasp type from the features of an object (e.g., its shape). However, current vision-based grasping methods are limited because each object can only be grasped with a single grasp type, regardless of the object's shape, the environmental context, and the available tasks.

Recent robotic grasping applications implement grasp affordance detection to identify the regions of an object that can be grasped for a task. By adapting grasp affordance detection to a vision-based prosthetic device, multiple task-oriented grasp-type predictions become possible for each object. Therefore, to improve the vision system of vision-based prostheses, this thesis research adapts grasp affordance detection methods from robotic grasping applications. Grasp affordances, represented as grasp-type and task regions, are predicted with instance segmentation models, which use RGB images to localize objects and their grasp affordances with bounding boxes and segmentation masks. Since no existing instance segmentation model or dataset allows the simultaneous detection of objects and their grasp affordances, the Multi-Affordance Detection Network (MAD-Net) model and the Multi-Object Multi-Grasp-Affordance (MOMA) synthetic dataset were developed as part of this thesis research. Unlike current vision-based prosthetic grasping methods, MAD-Net can detect objects and their grasp affordances in multi-object RGB scenes. MAD-Net was derived from Mask R-CNN, a common baseline model for instance segmentation; most instance segmentation models are derived from Mask R-CNN, since its additional mask-prediction head can convert an object detection model into an instance segmentation model.

The MOMA synthetic dataset is a collection of 20K RGB images generated by placing random images of objects on random background images. Each generated image was automatically annotated with the instances of objects and their grasp affordances (grasp-type and task regions). The single-object RGB images used for synthetic dataset generation were manually captured with a camera and then manually annotated.
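As a concrete illustration of the instance segmentation interface described above (an RGB image in; per-instance bounding boxes, class labels, confidence scores, and segmentation masks out), the minimal sketch below runs torchvision's off-the-shelf Mask R-CNN. This is an assumed stand-in, not the thesis MAD-Net implementation: the use of torchvision, the `weights="DEFAULT"` option (recent torchvision versions), and the placeholder input tensor are all assumptions.

```python
# Minimal sketch (assumed, not the thesis code): an instance segmentation model
# maps a single RGB image to per-instance boxes, labels, scores, and masks.
# torchvision's off-the-shelf Mask R-CNN is used here as a stand-in for MAD-Net.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained weights
model.eval()

image = torch.rand(3, 480, 640)  # placeholder RGB tensor with values in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]  # the model accepts a list of images

boxes = prediction["boxes"]    # (N, 4) bounding boxes in pixel coordinates
labels = prediction["labels"]  # (N,) predicted class indices
scores = prediction["scores"]  # (N,) confidence scores
masks = prediction["masks"]    # (N, 1, H, W) per-instance soft masks
```

In a MAD-Net-style model, the same kind of per-instance output would additionally carry grasp-type and task-region detections alongside the object detections.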
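The copy-paste generation procedure described for the MOMA dataset (random object cutouts placed on random backgrounds, with annotations derived automatically) can be sketched as follows. This is a minimal, hypothetical example rather than the actual MOMA generation pipeline: the function name `composite_example`, the use of Pillow and NumPy, the RGBA cutouts, and the COCO-style bounding-box convention are assumptions.

```python
# Hypothetical sketch (not the thesis implementation) of copy-paste compositing:
# an object cutout with a transparency mask is pasted at a random location on a
# background image, and the mask / bounding-box annotation is derived automatically.
import random
import numpy as np
from PIL import Image  # Pillow

def composite_example(background_path: str, cutout_path: str):
    background = Image.open(background_path).convert("RGBA")
    cutout = Image.open(cutout_path).convert("RGBA")  # alpha channel = object silhouette

    # Random placement; assumes the cutout is smaller than the background.
    x = random.randint(0, background.width - cutout.width)
    y = random.randint(0, background.height - cutout.height)
    background.alpha_composite(cutout, dest=(x, y))

    # Full-size binary mask built from the cutout's alpha channel.
    mask = np.zeros((background.height, background.width), dtype=np.uint8)
    alpha = np.array(cutout)[:, :, 3] > 0
    mask[y:y + cutout.height, x:x + cutout.width] = alpha.astype(np.uint8)

    # Bounding box in (x_min, y_min, width, height) form, as in COCO-style datasets.
    ys, xs = np.nonzero(mask)
    bbox = [int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)]

    # Grasp-type and task-region labels for the cutout would be attached here.
    annotation = {"bbox": bbox, "mask": mask}
    return background.convert("RGB"), annotation
```

Because the cutout's silhouette is known exactly, the object mask and bounding box (and any grasp-type or task regions marked on the cutout) follow automatically, which is what makes annotating 20K synthetic images practical.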
The mean average precision (mAP) metric is used to evaluate the performance of MAD-Net and the other instance segmentation models on the MOMA dataset. The mAP metric is a good indicator of model performance, since it measures how accurately the predicted bounding boxes and image masks match the ground-truth annotations. MAD-Net outperformed all other instance segmentation models across all detection categories (objects, grasp types, tasks) on the validation datasets. On the test datasets, MAD-Net maintained mAP scores similar to those of the other instance segmentation models. In all cases, MAD-Net outperformed Mask R-CNN, especially in the grasp-type detection category, where MAD-Net achieved a 10% increase in mAP over Mask R-CNN. When objects and their grasp affordances were trained jointly on the MOMA dataset, the total training time decreased by 50%. Since MAD-Net outperformed Mask R-CNN, the joint detection of objects and their grasp affordances is a feasible solution for the vision system of vision-based prostheses. Although the proposed vision system produces multiple task-oriented grasp types for a single object, modern myoelectric prostheses can select a grasp type from a small set of pre-programmed grasp types. A grasp database can also be implemented alongside the proposed vision system, allowing prosthetic users to continuously update the database with new, unseen objects and their corresponding task-oriented grasp types.
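To make the mAP discussion above more concrete, the sketch below shows the bounding-box intersection-over-union (IoU) computation and the greedy prediction-to-ground-truth matching that underlie precision and recall at a single IoU threshold. This is a simplified illustration, not the full COCO-style evaluation protocol: full mAP additionally averages precision over recall levels, IoU thresholds, and detection categories, and an analogous IoU is computed on segmentation masks.

```python
# Simplified illustration (not the evaluation code used in the thesis) of the IoU
# matching that underlies mAP: each prediction is greedily matched to at most one
# unused ground-truth box, and precision/recall follow from the matched counts.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned bounding boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions: List[Box], ground_truth: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall at one IoU threshold; predictions assumed sorted by score."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_gt is not None and best_iou >= iou_threshold:
            matched.add(best_gt)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```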
dc.identifier.uri: http://hdl.handle.net/10012/20619
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: computer vision
dc.subject: prosthetic grasping
dc.subject: convolutional neural network
dc.subject: grasp affordance detection
dc.subject: synthetic dataset
dc.subject: image processing
dc.subject: grasp-type detection
dc.subject: instance segmentation
dc.title: Development of a Semantic Model and Synthetic Dataset for Multi-Grasp Affordance Detection for Application to Vision-Based Upper-Limb Prosthetic Grasping
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Mechanical and Mechatronics Engineering
uws-etd.degree.discipline: Mechanical Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kofman, Jonathan
uws.contributor.advisor: Jeon, Soo
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle
Name: Ng_Nathan.pdf
Size: 8.96 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission