3D Mesh and Pose Recovery of a Foot from Single Image
Date
2022-01-18
Authors
Boismenu-Quenneville, Frédéric
Advisor
Zelek, John
Publisher
University of Waterloo
Abstract
The pandemic and the major shift to online shopping have highlighted the current difficulties in getting proper sizing for clothing and shoes. Being able to accurately measure
shoes using readily available smartphones would help minimize returns and achieve a
better fit. Being able to reconstruct the 3D geometry of a foot regardless of the
foot pose using a smartphone would improve the online shoe shopping experience. Usually,
systems that reconstruct a 3D foot require the foot to be in a canonical pose or require multiple perspectives. To our knowledge, no system captures the precise
pose of the foot without expensive equipment. In many situations, neither the canonical pose nor
multiple views are feasible. Therefore, we propose a system that infers the 3D
reconstruction and the pose of the foot, in any pose, from a single image. Our
kinematic model, based on popular biomechanical models, is made of 18 rotating joints. To
obtain the 3D reconstruction, we extract the silhouette of the foot and its joint landmarks
from the image space. From the silhouette and the relation between each joint landmark,
we can define the shape of the 3D mesh. Most 3D reconstruction algorithms work with
up-convolutions, which do not preserve the global information of the reconstructed object.
Using a template mesh model of the foot and a spatial convolution network designed to
learn from sparse data, we are able to recover the local features without losing sight of the
global information. To develop the template mesh, we deformed the meshes of a dataset of
3D feet so that they could be used to build a PCA model. The template mesh is the PCA model
with no variance added to its components, that is, the mean shape. To obtain the 3D pose, we labelled the
vertices of the template mesh according to the joints of our kinematic model. Those labels
can be used to estimate the 3D pose from the 3D reconstruction by establishing correspondence between the two
meshes. To train the system, we needed a good dataset. Since no viable one was available, we created our own dataset by using the previously described
PCA model of the foot to generate random 3D meshes of feet. We used mesh deformation
and inverse kinematics to capture the feet in different poses. Our system showed a good
ability to generate detailed feet. However, we could not predict a reliable length and width
for each foot, since our virtual dataset provides no scale cues of any kind
other than the ground truth. Our experiments led to an average error of 13.65 mm on the
length and 5.72 mm on the width, which is too high to recommend footwear. To improve
the performance of our system, the 2D joint detection method could be modified to use
the structure of the foot described by our kinematic foot model as a guide to detect
the position of the joints more accurately. The loss functions used for 3D reconstruction should
also be revisited to generate more reliable reconstructions.
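
The PCA shape model described in the abstract can be illustrated with a minimal sketch. It assumes that all meshes have already been deformed to share one vertex ordering, and it uses random toy data in place of real foot scans. The function names (`fit_pca_shape_model`, `synthesize`), the number of components, and the Gaussian sampling of coefficients are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def fit_pca_shape_model(meshes, n_components):
    """Fit a PCA shape model to registered meshes.

    meshes: array of shape (n_samples, n_vertices, 3); every mesh is
    assumed to share the same vertex ordering (hypothetical input).
    """
    n_samples = meshes.shape[0]
    flat = meshes.reshape(n_samples, -1)      # one row per mesh
    mean = flat.mean(axis=0)                  # mean shape, flattened
    # SVD of the centred data yields the principal directions in vt.
    _, singular_values, vt = np.linalg.svd(flat - mean, full_matrices=False)
    components = vt[:n_components]
    # Standard deviation of the data along each principal direction.
    std = singular_values[:n_components] / np.sqrt(max(n_samples - 1, 1))
    return mean, components, std


def synthesize(mean, components, std, coeffs):
    """Build a mesh from coefficients given in units of standard deviation."""
    flat = mean + (coeffs * std) @ components
    return flat.reshape(-1, 3)


# Toy stand-in for a dataset of registered 3D feet (assumption: random data).
rng = np.random.default_rng(0)
toy_meshes = rng.normal(size=(20, 50, 3))

mean, components, std = fit_pca_shape_model(toy_meshes, n_components=5)

# Template mesh: all coefficients zero, so no variance is added.
template = synthesize(mean, components, std, np.zeros(5))

# A random synthetic shape: coefficients drawn from a standard normal.
random_foot = synthesize(mean, components, std, rng.normal(size=5))
```

With all coefficients set to zero, `synthesize` returns the mean of the input meshes, which corresponds to the template mesh described above. Non-zero coefficients give new shapes of the kind used to populate a synthetic dataset.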
Keywords
3D reconstruction, 3D pose estimation, synthetic dataset, image segmentation, foot reconstruction, foot pose estimation, foot kinematic model, deep learning, monocular reconstruction, mesh deformation