Title: Automating Manufacturing Surveillance Processes Using External Observers
Author: Sharma, Gauri
Type: Master Thesis
Dates: 2022-09-29; 2022-08-16
URI: http://hdl.handle.net/10012/18851
Language: en
Keywords: computer vision; moving object detection; temporal segmentation; appearance-based segmentation; spatiotemporal segmentation in manufacturing; automated visual inspection in manufacturing; machine learning; moving object pixel tracking; moving object tracking; 2D affine image registration; visual inspection in manufacturing videos; anomaly detection in manufacturing; handcrafted features in manufacturing; texture and color handcrafted methods; occlusion; interactive-based segmentation in manufacturing; PatchCore anomaly detection; illumination invariance in manufacturing videos; texture analysis; pixel-based object segmentation; Detectron2; optical flow; Lucas-Kanade; RAFT deep learning-based optical flow; RANSAC; region-merging watershed segmentation; space-time graph neural network; optical flow field computation; background subtraction in manufacturing; shape-based segmentation; SIFT features; GrabCut segmentation; GrabCut in manufacturing; mean-shift clustering segmentation; spatiotemporal irregularities in automated assembly machines; anomaly detection in automated assembly machines; spatiotemporal irregularity alerts in manufacturing; motion detection; spatial characteristics

Abstract:

An automated assembly system is an integral part of many manufacturing industries, as it reduces production cycle-time, resulting in lower costs and higher production rates. The modular system design integrates main assembly workstations and parts-feeding machines to build a fully assembled product or a sub-assembly of a larger product. Machine operation failures within the subsystems and errors in parts loading slow production and cause parts to accumulate gradually. Repeated human intervention is required to manually clear jams at varying locations within the subsystems. To increase operator safety and reduce cycle-time, visual surveillance plays a critical role by providing real-time alerts of spatiotemporal parts irregularities.

In this study, surveillance videos are obtained using external observers to conduct spatiotemporal object segmentation within three subsystems: a digital assembly machine, a linear conveyance system, and a vibratory bowl parts-feeder machine. As the datasets have different anomaly specifications and visual characteristics, we follow a bottom-up architecture for motion-based and appearance-based segmentation using computer vision techniques and deep-learning models.

To perform motion-based segmentation, we evaluate deep learning-based and classical techniques for computing optical flow for real-time moving-object detection. Because local and global methods assume brightness constancy and flow smoothness, they produce fewer detections in the presence of illumination variance and occlusion. We therefore adopt RAFT for optical flow and apply its iteratively updated flow field to create a pixel-based object tracker. The tracker differentiates previously and currently moving parts with differently colored segments and simultaneously visualizes the flow field to illustrate movement direction and magnitude. We compare the segmentation performance of the optical flow-based tracker with a space-time graph neural network (ST-GNN); the ST-GNN achieves higher boundary mask IoU than the pixel-based tracker. Because the ST-GNN also addresses the limited-dataset challenge in our application by learning visual correspondence as a contrastive random walk over palindrome sequences, we proceed with the ST-GNN for motion-based segmentation.

As the ST-GNN requires a first-frame annotation mask for initialization, we explore appearance-based segmentation methods to enable automatic initialization. We evaluate pixel-based, interactive, and supervised segmentation techniques on the bowl-feeder image dataset. Results show that K-means combined with watershed segmentation and Gaussian blur reduces superpixel over-segmentation and produces segmentations aligned with part boundaries; using watershed segmentation on the bowl-feeder image dataset, 377 of the 476 parts present within the machine are detected and segmented. We find that the GLCM and Gabor filter segment dense parts regions better than graph-based and entropy-based segmentation, segmenting 467 and 476 parts, respectively, of the 476 parts present within the bowl-feeder. Although manual annotation decreases efficiency, the GrabCut annotation tool generates more accurate segmentation masks than the pre-trained interactive tool; using GrabCut, all 216 parts present within the bowl-feeder machine are segmented.
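As an illustrative aside (not the thesis implementation), the following minimal OpenCV sketch shows a Gaussian blur + K-means + marker-based watershed pipeline of the kind described above; the input file name, cluster count, and threshold values are assumptions for illustration, and the threshold polarity may need inverting depending on lighting.

# Minimal sketch of blur + K-means + watershed segmentation (illustrative only).
import cv2
import numpy as np

img = cv2.imread("bowl_feeder_frame.png")          # hypothetical input image
blur = cv2.GaussianBlur(img, (5, 5), 0)            # suppress noise that causes over-segmentation

# K-means clustering in color space to separate parts from the feeder surface
pixels = blur.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
clustered = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)

# Marker-based watershed on the clustered image
gray = cv2.cvtColor(clustered, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1                               # reserve label 1 for background
markers[unknown == 255] = 0                         # unknown regions resolved by watershed
markers = cv2.watershed(img, markers)               # boundaries are labeled -1

num_parts = len(np.unique(markers)) - 2             # exclude background and boundary labels
print(f"Detected ~{num_parts} part regions")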
To ensure segmentation of all parts within the bowl-feeder, we train Detectron2 with data augmentation and find that supervised segmentation outperforms both pixel-based and interactive segmentation. To address illumination variance within the datasets, we apply color-based segmentation by converting the image datasets to the HSV color space, and we use the value channel of the HSV representation with background subtraction techniques to detect moving bowl-feeder parts in real time.

To resolve image registration errors caused by low image resolution, we create the Flex-Sim synthetic dataset, which contains various anomaly instances captured from multiple camera viewpoints. We apply preprocessing methods and an affine transformation estimated with RANSAC for robust image registration, and we compare color- and texture-based handcrafted features of the registered images to verify complete alignment. Finally, we evaluate the PatchCore anomaly detection method, pre-trained on the MVTec industrial dataset, on the Flex-Sim dataset and find that the generated segmentation maps detect the various anomaly instances within Flex-Sim.
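As a second illustrative aside (again not the thesis implementation), the following OpenCV sketch shows 2D affine image registration with RANSAC over SIFT correspondences, in the spirit of the Flex-Sim registration step; the file names, ratio-test threshold, and reprojection threshold are illustrative assumptions.

# Minimal sketch of SIFT + RANSAC affine registration (illustrative only).
import cv2
import numpy as np

ref = cv2.imread("flexsim_reference.png", cv2.IMREAD_GRAYSCALE)   # hypothetical reference view
mov = cv2.imread("flexsim_moving.png", cv2.IMREAD_GRAYSCALE)      # hypothetical view to align

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(mov, None)

# Match descriptors and keep good matches with Lowe's ratio test
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des2, des1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Robust 2D affine estimation with RANSAC, then warp the moving image onto the reference
A, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)
registered = cv2.warpAffine(mov, A, (ref.shape[1], ref.shape[0]))
cv2.imwrite("flexsim_registered.png", registered)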