Systems Design Engineering
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914
This is the collection for the University of Waterloo's Department of Systems Design Engineering.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Systems Design Engineering by Subject "3D reconstruction"
Now showing 1 - 3 of 3
Item: 3D Mesh and Pose Recovery of a Foot from Single Image (University of Waterloo, 2022-01-18) by Boismenu-Quenneville, Frédéric

The pandemic and the major shift to online shopping have highlighted the difficulty of getting proper sizing for clothing and shoes. Accurately measuring feet with readily available smartphones would help minimize returns and achieve a better fit, so reconstructing the 3D geometry of a foot with a smartphone, regardless of the foot's pose, would improve the online shoe shopping experience. Systems that reconstruct a 3D foot usually require the foot to be in a canonical pose or require multiple perspectives, and to our knowledge no system captures the precise pose of the foot without expensive equipment. In many situations, the canonical pose or the multiple views are not feasible. We therefore propose a system that infers both the 3D reconstruction and the pose of the foot, in any pose, from a single image.

Our kinematic model, based on popular biomechanical models, is made of 18 rotating joints. To obtain the 3D reconstruction, we extract the silhouette of the foot and its joint landmarks from the image space. From the silhouette and the relation between each joint landmark, we define the shape of the 3D mesh. Most 3D reconstruction algorithms rely on up-convolutions, which do not preserve the global information of the reconstructed object. Using a template mesh model of the foot and a spatial convolution network designed to learn from sparse data, we recover the local features without losing sight of the global information. To develop the template mesh, we deformed the meshes of a dataset of 3D feet so they could be used to build a PCA model; the template mesh is the PCA model with no variance added to its components. To obtain the 3D pose, we labelled the vertices of the template mesh according to the joints of our kinematic model. These labels can be used to estimate the 3D pose from the 3D reconstruction by putting the two meshes in correspondence.

Training the system required a good dataset. Since no viable one was available, we created our own by using the PCA model described above to generate random 3D meshes of feet, and applied mesh deformation and inverse kinematics to capture the feet in different poses. Our system showed a good ability to generate detailed feet. However, it could not predict a reliable length and width for each foot, since our virtual dataset carries no scaling indication of any kind other than the ground truths. Our experiments led to an average error of 13.65 mm in length and 5.72 mm in width, which is too high for recommending footwear. To improve the system's performance, the 2D joint detection method could be modified to use the structure described by our kinematic foot model as a guide to locate the joints more accurately, and the loss functions used for 3D reconstruction should be revisited to generate more reliable reconstructions.
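To make the template-mesh step in this abstract concrete, here is a minimal Python sketch (illustrative only, not the thesis code) of building a PCA shape model from registered foot meshes and sampling synthetic feet for dataset generation. The array layout, the number of components, and the Gaussian sampling of coefficients are assumptions; the abstract only states that the template is the PCA model with no variance added to its components.

```python
# Illustrative sketch, not the thesis implementation. Assumes all meshes are
# registered (shared topology and vertex correspondence), stored as an
# (n_meshes, n_vertices, 3) array.
import numpy as np

def build_pca_model(meshes: np.ndarray, n_components: int = 10):
    """Return (template, components, stdevs) from registered foot meshes."""
    n_meshes = meshes.shape[0]
    X = meshes.reshape(n_meshes, -1)      # flatten each mesh to one row
    template = X.mean(axis=0)             # zero-variance template: the PCA mean
    Xc = X - template
    # SVD of the centred data gives the principal modes of shape variation.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]                       # (n_components, 3*n_vertices)
    stdevs = s[:n_components] / np.sqrt(n_meshes - 1)    # stdev along each mode
    return template, components, stdevs

def sample_foot(template, components, stdevs, rng=np.random.default_rng()):
    """Draw a random foot mesh by perturbing the template along the PCA modes."""
    coeffs = rng.normal(0.0, stdevs)      # one Gaussian coefficient per mode
    flat = template + coeffs @ components
    return flat.reshape(-1, 3)            # back to (n_vertices, 3)
```

Under these assumptions, `sample_foot(...)` would supply the random 3D feet that mesh deformation and inverse kinematics then put into different poses.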
Item: Computational Depth from Defocus via Active Quasi-random Pattern Projections (University of Waterloo, 2018-08-22) by Ma, Bojie

Depth information is one of the most fundamental cues for interpreting the geometric relationships of objects. It enables machines and robots to perceive the world in 3D and to understand their environment far beyond 2D images. Recovering the depth information of a scene therefore plays a crucial role in computer vision and connects to many applications in fields such as robotics, autonomous driving, and human-computer interfacing.

In this thesis, we proposed, designed, and built a comprehensive system for depth estimation from a single camera capture by leveraging the camera's response to the defocus of a projected pattern. This approach is driven by the concept of active depth from defocus (DfD), which recovers depth by analyzing how defocused the projected pattern appears in the captured images at different depth levels. While current active DfD approaches achieve high accuracy, they rely on specialized setups to obtain images with different defocus levels, making them impractical for a simple, compact depth-sensing system with a small form factor.

The main contribution of this thesis is the use of computational modelling techniques to characterize the camera's defocus response to the projection pattern at different depth levels: a new approach to active DfD that enables rapid and accurate depth inference without complex hardware or extensive computing resources. Specifically, different statistical estimation methods are proposed to approximate the pixel intensity distribution of the projected pattern as measured by the camera sensor, a learning process that summarizes the defocus effect in a handful of optimized, distinctive values. As a result, the blurred appearance of the projected pattern at each depth level is represented by depth features in a computational depth inference model. In the proposed framework, the scene is actively illuminated with a unique quasi-random projection pattern, and a conventional RGB camera acquires an image of the scene. The depth map is then recovered by analyzing the depth features of the blurred projection pattern in the captured image with the proposed computational depth inference model.

To verify the efficacy of the proposed approach, quantitative and qualitative experiments were performed on test scenes with different structural characteristics. The results demonstrate that the proposed method produces accurate, high-fidelity depth reconstructions and has strong potential as a cost-effective and computationally efficient means of generating depth maps.
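The calibrate-then-match structure of active DfD described in this abstract can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis implementation: the specific features (patch mean, standard deviation, gradient energy) and the nearest-neighbour lookup are assumptions standing in for the thesis's statistical estimation methods and depth inference model.

```python
# Hypothetical active-DfD sketch: defocus lowers contrast and gradient energy
# in the projected pattern, so simple patch statistics vary with depth.
import numpy as np

def blur_features(patch: np.ndarray) -> np.ndarray:
    """Summarize one grayscale patch of the captured pattern with a few values."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return np.array([patch.mean(), patch.std(), np.mean(gx**2 + gy**2)])

def calibrate(patches_by_depth: dict[float, list[np.ndarray]]) -> dict[float, np.ndarray]:
    """Average the features of training patches captured at each known depth,
    condensing the defocus response into one feature vector per depth level."""
    return {d: np.mean([blur_features(p) for p in ps], axis=0)
            for d, ps in patches_by_depth.items()}

def infer_depth(patch: np.ndarray, model: dict[float, np.ndarray]) -> float:
    """Assign the calibrated depth whose feature vector is closest to the patch's."""
    f = blur_features(patch)
    return min(model, key=lambda d: np.linalg.norm(model[d] - f))
```

Sweeping `infer_depth` over the patches of a single captured image would then yield a coarse depth map, which is the single-capture property the thesis exploits.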
Item: Enhanced detection of point correspondences in single-shot structured light (University of Waterloo, 2021-09-21) by Sadatsharifi, Kasra

The crucial role of point correspondences in stereo vision and camera-projector calibration is to determine the relationship between the camera view(s) and the projector view(s). Accurate and robust point correspondences can therefore yield a very accurate 3D point cloud of a scene. Designing a method that detects pixel correspondences quickly and accurately, and that is robust to factors such as object motion and colour, is an important subject of study. The information carried by the point correspondences determines the geometry of the scene, in which depth plays a very important role, if not the most important one. However, point correspondences inevitably include some outliers, and outlier removal is another important aspect of obtaining correspondences that can have a substantial impact on the reconstructed point cloud of an object.

During the Single-Shot Structured Light (SSSL) calibration process, a pattern consisting of tags, each containing a differently shaped symbol and separated by grid lines, is projected onto the object. The intersections of these grid lines are treated as potential pixel correspondences between a camera image and the projector pattern. The purpose of this thesis is to study the robustness and accuracy of pixel correspondences and to enhance their quality. We propose a detection method that uses the model of the pattern, specifically the grid lines, which are its largest and brightest feature. The input image is partitioned into smaller patches and an optimization process is executed on each patch. The grid lines are then detected and fitted, and the intersections of those lines are taken as potential corresponding pixels between the views. To remove incorrect pixel correspondences (outliers), Connected Component Analysis is used to find the detected point closest to the top-left corner of each tag; the points remaining after this step are taken as the correct pixel correspondences.

Experimental results show the improvement of a locally adaptive thresholding method over the baseline in detecting tags: the proposed thresholding method maintained the baseline's accuracy while tuning all of its parameters automatically, whereas the baseline requires manual fine-tuning of some parameters. The introduced model-based grid intersection detection yields an approximately 50-fold speedup. Finally, the inaccuracy of the point correspondences is compared with a state-of-the-art method by evaluating the final reconstructed point clouds generated by both methods against the CAD model as ground truth; the distance error between the reconstructed point clouds and the CAD model averages about 3 pixels higher for the proposed method than for the baseline.
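Two of the building blocks named in this abstract, locally adaptive thresholding and grid-line intersection, can be sketched as follows. This is a hypothetical illustration rather than the thesis code: the OpenCV `adaptiveThreshold` parameters and the least-squares line fit are assumptions (the thesis tunes its thresholding parameters automatically and fits grid lines through a per-patch optimization).

```python
# Hypothetical sketch of two steps from the abstract above, not the thesis code.
import numpy as np
import cv2

def binarize(gray: np.ndarray) -> np.ndarray:
    """Locally adaptive threshold: each pixel is compared to a Gaussian-weighted
    local mean, which tolerates uneven illumination better than a global cut.
    The blockSize (31) and offset C (-5) are illustrative values only."""
    return cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, -5)

def fit_line(points: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of y = m*x + b to the (N, 2) pixel coordinates of one
    grid line within a patch (a stand-in for the per-patch optimization)."""
    m, b = np.polyfit(points[:, 0], points[:, 1], deg=1)
    return m, b

def intersect(l1: tuple[float, float], l2: tuple[float, float]) -> tuple[float, float]:
    """Intersection of two fitted lines: one candidate point correspondence.
    Near-vertical lines would need a homogeneous line representation instead."""
    (m1, b1), (m2, b2) = l1, l2
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1
```

Feeding every pairwise intersection of fitted horizontal and vertical grid lines into the outlier-removal step (Connected Component Analysis per tag, as described above) would then leave the retained correspondences.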