3D Mesh and Pose Recovery of a Foot from Single Image

Boismenu-Quenneville, Frédéric

Files

Boismenu-Quenneville_Frederic.pdf (5.75 MB)

Date

2022-01-18

Authors

Boismenu-Quenneville, Frédéric

Advisor

Zelek, John

Publisher

University of Waterloo

Abstract

The pandemic and the major shift to online shopping have highlighted the current difficulties in getting proper sizing for clothing and shoes. Being able to accurately measure shoes using readily available smartphones would help minimize returns and achieve a better fit. Being able to reconstruct the 3D geometry of a foot regardless of its pose using a smartphone would improve the online shoe shopping experience. Systems that reconstruct a 3D foot usually require the foot to be in a canonical pose or require multiple perspectives. To our knowledge, no system captures the precise pose of the foot without expensive equipment. In many situations, the canonical pose or the multiple views are not feasible. Therefore, we propose a system that infers the 3D reconstruction and the pose of the foot from any pose using only one image. Our kinematic model, based on popular biomechanical models, is made of 18 rotating joints. To obtain the 3D reconstruction, we extract the silhouette of the foot and its joint landmarks from the image space. From the silhouette and the relations between the joint landmarks, we can define the shape of the 3D mesh. Most 3D reconstruction algorithms rely on up-convolutions, which do not preserve the global information of the reconstructed object. Using a template mesh model of the foot and a spatial convolution network designed to learn from sparse data, we are able to recover the local features without losing sight of the global information. To develop the template mesh, we deformed the meshes of a dataset of 3D feet so that they could be used to build a PCA model. The template mesh is the PCA model with no variance added to its components. To obtain the 3D pose, we labelled the vertices of the template mesh according to the joints of our kinematic model. Those labels can be used to estimate the 3D pose from the 3D reconstruction by establishing correspondences between the two meshes.
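The PCA template and the vertex labelling described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy random data, the vertex counts, and the choice to place each joint at the centroid of its labelled vertices are all assumptions made for illustration. Only the 18 joints and the idea that the template is the PCA model with zero variance added come from the abstract.

```python
import numpy as np

def fit_pca_shape_model(meshes, n_components):
    """Fit a linear shape model to meshes of shape (n_meshes, n_vertices, 3).

    Assumes all meshes are already in vertex-to-vertex correspondence.
    """
    n_meshes = meshes.shape[0]
    flat = meshes.reshape(n_meshes, -1)
    mean = flat.mean(axis=0)
    # SVD of the centred data gives the principal directions as rows of vt.
    _, singular_values, vt = np.linalg.svd(flat - mean, full_matrices=False)
    components = vt[:n_components]
    # Standard deviation of the data along each principal direction.
    std = singular_values[:n_components] / np.sqrt(max(n_meshes - 1, 1))
    return mean, components, std

def synthesize(mean, components, std, coeffs):
    """Build a mesh from coefficients expressed in standard deviations."""
    flat = mean + (np.asarray(coeffs) * std) @ components
    return flat.reshape(-1, 3)

def joints_from_labels(vertices, labels, n_joints):
    """Hypothetical pose read-out: centroid of the vertices labelled per joint."""
    return np.stack([vertices[labels == j].mean(axis=0) for j in range(n_joints)])

# Toy stand-in for a dataset of registered foot meshes (random, not real feet).
rng = np.random.default_rng(0)
toy_meshes = rng.normal(size=(20, 50, 3))
mean, components, std = fit_pca_shape_model(toy_meshes, n_components=5)

# Zero coefficients add no variance, so the result is the mean (template) mesh.
template = synthesize(mean, components, std, np.zeros(5))

# Hypothetical labelling: every vertex is assigned to one of 18 joints.
labels = np.arange(50) % 18
joints = joints_from_labels(template, labels, n_joints=18)
```

Because a reconstructed mesh shares the template's vertex ordering in this sketch, the same `labels` array could be applied to it directly. Passing coefficients drawn from a standard normal distribution to `synthesize` would produce random shapes instead of the template.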
To train the system, we needed a good dataset. Since no viable one was available, we created our own by using the previously described PCA model of the foot to generate random 3D meshes of feet. We used mesh deformation and inverse kinematics to capture the feet in different poses. Our system showed a good ability to generate detailed feet. However, we could not predict a reliable length and width for each foot, since our virtual dataset does not provide scale indications of any kind other than the ground truths. Our experiments led to an average error of 13.65 mm on the length and 5.72 mm on the width, which is too high to recommend footwear. To improve the performance of our system, the 2D joint detection method could be modified to use the structure of the foot described by our kinematic foot model as a guide to detect the position of the joints more accurately. The loss functions used for 3D reconstruction should also be revisited to generate more reliable reconstructions.
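One way to compute length and width errors like those reported above is sketched below. The abstract does not say how the measurements were taken, so this sketch makes several assumptions: each mesh is axis-aligned with its length along x and its width along y, coordinates are in millimetres, and the error is the mean absolute difference. The helper names and the box-shaped toy meshes are hypothetical.

```python
import numpy as np

def length_and_width(vertices):
    """Bounding-box extents along x (length) and y (width), assuming alignment."""
    extents = vertices.max(axis=0) - vertices.min(axis=0)
    return extents[0], extents[1]

def mean_absolute_errors(predicted, ground_truth):
    """Mean absolute length and width errors over paired lists of meshes."""
    pred = np.array([length_and_width(v) for v in predicted])
    true = np.array([length_and_width(v) for v in ground_truth])
    return np.abs(pred - true).mean(axis=0)

def box(length, width, height):
    """Toy mesh: the eight corners of an axis-aligned box, in millimetres."""
    return np.array([[x, y, z]
                     for x in (0.0, length)
                     for y in (0.0, width)
                     for z in (0.0, height)])

# One ground-truth foot and one prediction that is 10 mm too long, 5 mm too narrow.
ground_truth = [box(250.0, 100.0, 60.0)]
predicted = [box(260.0, 95.0, 60.0)]
length_error, width_error = mean_absolute_errors(predicted, ground_truth)
print(length_error, width_error)  # 10.0 5.0
```

A bounding-box measurement is sensitive to pose and orientation, so a real evaluation would need the meshes brought into a common alignment first.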