Deep Learning 3D Scans for Footwear Fit Estimation from a Single Depth Map

Lunscher, Nolan

Date

2018-01-02

Authors

Lunscher, Nolan

Advisor

Zelek, John

Publisher

University of Waterloo

Abstract

In clothing, and particularly in footwear, variation in the size and shape of both people and garments makes it difficult to match items of clothing to a person. This is especially important in footwear, because fit depends heavily on foot shape, which shoe size does not fully capture. 3D scanning can provide detailed, personalized shape information, which can then be matched against product shape for a more personalized footwear matching experience. In current implementations, however, this process is typically expensive and cumbersome. Typical scanning techniques require a camera to capture an object from many views in order to reconstruct its shape. This usually requires either many cameras or a moving camera system, both of which are complex to engineer and build. Ideally, to reduce the cost and complexity of scanning systems as much as possible, only a single image from a single camera would be needed. With recent techniques, semantic knowledge, such as the kind of object in view, can be leveraged to infer the full 3D shape from incomplete information. Deep learning methods have been shown to reconstruct 3D shape from limited inputs for highly symmetric objects such as furniture and vehicles.

We apply a deep learning approach to the domain of foot scanning and present methods to reconstruct a 3D point cloud from a single input depth map. Anthropomorphic body parts can be challenging due to their irregular shapes, difficulty of parameterization, and limited symmetries. We present two methods that leverage deep learning models to produce complete foot scans from a single input depth map. We use 3D data from MPII Human Shape, which is based on the CAESAR database, and train deep neural networks to learn anthropomorphic shape representations. Our first method attempts to complete the point cloud supplied by the input depth map by synthesizing only the remaining information.
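Both methods start from a single depth map, which on its own yields only a partial point cloud of the visible foot surface. As a minimal sketch of that starting point, the Python below back-projects each valid pixel through a standard pinhole camera model. The function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a depth of 0 marks a missing reading are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth map into 3D points in camera coordinates.

    Assumption (hypothetical convention): depth[v][u] holds the depth along
    the optical axis at pixel column u, row v; a value <= 0 means no reading.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels with no depth reading
            # Invert the pinhole projection u = fx * x / z + cx (likewise for y).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points


# Tiny 2x2 example with hypothetical intrinsics; the top-left pixel is missing.
depth_map = [[0.0, 500.0],
             [500.0, 500.0]]
cloud = depth_to_points(depth_map, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(len(cloud), cloud[0])  # 3 (0.5, -0.5, 500.0)
```

A cloud obtained this way covers only the surfaces facing the camera, which is why the unseen remainder of the foot has to be synthesized.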
We show that our first method can synthesize the remainder of a point cloud with an accuracy of 2.92±0.72 mm, improving to 2.55±0.75 mm with an updated network architecture. Our second method synthesizes a complete point cloud foot scan from multiple virtual viewpoints. We show that this method can produce foot scans with an accuracy of 1.55±0.41 mm from a single input depth map. We also performed experiments on real-world foot scans captured using Kinect Fusion. We find that, despite being trained only on a low-resolution representation of foot shape, our models are able to recognize foot shape and synthesize reasonable, complete point cloud scans. Our results suggest that our methods can be extended to work in the real world given additional domain-specific data.
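The abstract reports accuracy as a mean ± standard deviation in millimetres but does not define the underlying metric. One common choice for comparing point clouds, shown here purely as an assumption, is the mean distance from each synthesized point to its nearest ground-truth point, computed per scan and then summarized across a test set. The names `mean_nn_distance` and `summarize` are hypothetical, and using the sample (rather than population) standard deviation is also an assumption.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_nn_distance(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean distance from each predicted point to its nearest ground-truth point.

    Brute force, O(len(pred) * len(truth)); a k-d tree is more practical for full scans.
    """
    total = 0.0
    for p in pred:
        total += min(math.dist(p, q) for q in truth)
    return total / len(pred)


def summarize(per_scan_errors: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-scan errors (needs >= 2 scans)."""
    return statistics.mean(per_scan_errors), statistics.stdev(per_scan_errors)


# Hypothetical toy data in mm: the two predicted points lie 3 mm and 4 mm
# from their nearest ground-truth points.
truth = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 3.0), (10.0, 4.0, 0.0)]
print(mean_nn_distance(pred, truth))  # 3.5

mean, std = summarize([2.0, 4.0])
print(f"{mean:.2f}±{std:.2f} mm")  # 3.00±1.41 mm
```

This one-sided distance only penalizes synthesized points that stray from the true surface; a symmetric (Chamfer-style) variant would also penalize regions of the true foot that the prediction misses.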