Abstract:
The rapid development of deep learning techniques has led some to suggest that critical 2D computer vision tasks have largely been solved. The same case cannot yet be made for 3D vision problems. The development of 3D vision sensors and data processing algorithms is integral to robotics applications; one example is automated vine pruning in vineyards, which requires accurate, low-noise, high-resolution depth and colour data to decide where to cut vines. Acquiring such data is, however, difficult. Depth cameras typically employed in 3D vision tasks suffer from artifacts, producing noisy, low-resolution depth data that contains holes and blurred edges. A variety of solutions that remove these artifacts through denoising, super-resolution, edge preservation, and completion have been proposed in the literature. Inspired by these prior works, this thesis explores the use of deep point cloud learning to remove artifacts from point clouds. This is accomplished by training such networks to transform geometric and colour data obtained with a low-quality stereo camera so that it more closely represents the high-resolution data acquired with a structured light camera that has a much lower frame-rate. To assess the proposed methods, this thesis presents novel datasets consisting of pairs of point clouds captured with the two cameras in a variety of laboratory and vineyard scenes. A method for automated data collection, in which the cameras are mounted on a robotic arm, is also presented. Important contributions of this work include ablation studies of the pre-processing steps introduced to improve the outputs of the point cloud learning architectures; these steps include field-of-view cropping, statistical outlier removal, and segmentation via lightness thresholding. An existing point cloud learning architecture and loss function were also modified to process coloured point clouds. Quantitative and qualitative results demonstrate that the proposed methods improve the visual quality of inputs obtained from the low-quality 3D camera in most cases. Model inference times also show that the proposed methods still allow the low-quality camera to operate at a much higher frame-rate than the high-resolution camera.