NUST Institutional Repository

Advancing Point Cloud Understanding through Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning


dc.contributor.author Sabih-Ul-Hassan, Muhammad
dc.date.accessioned 2024-11-11T09:29:45Z
dc.date.available 2024-11-11T09:29:45Z
dc.date.issued 2024
dc.identifier.other 400868
dc.identifier.uri http://10.250.8.41:8080/xmlui/handle/123456789/47844
dc.description Supervisor: Dr. Shahbaz Khan en_US
dc.description.abstract The increasing adoption of 3D data-driven applications such as robotics, autonomous vehicles, and virtual reality has made 3D point clouds a vital form of data representation. However, the irregular and unordered nature of point clouds poses significant challenges for traditional supervised learning algorithms, which typically require extensively annotated datasets for tasks such as classification, segmentation, and part segmentation. To address these challenges, we introduce a multi-modal self-supervised learning framework that integrates 3D point clouds with 2D rendered images. Our approach improves performance on classification, segmentation, and part segmentation by combining intra-modal self-supervision within the point clouds with cross-modal self-supervision between the point clouds and their corresponding rendered images. By leveraging the complementary strengths of both data modalities, the model learns robust, discriminative representations for 3D point cloud understanding without costly annotations. We employ contrastive learning to extract meaningful representations from both modalities in a self-supervised manner: intra-modal self-supervision encourages the model to learn structural and geometric features within the point cloud data, while cross-modal supervision aligns features between 3D point clouds and their 2D projections, capturing rich semantic information. Evaluation on existing multi-view 3D datasets with three-dimensional sample images shows that the framework outperforms single-modal and multi-modal baselines on classification, segmentation, and part segmentation. The results demonstrate that a multi-modal self-supervised approach enables a transition away from techniques that depend heavily on labeled data, without compromising performance on 3D applications. These findings underline the value of self-supervised learning for 3D tasks and suggest that multi-modal data can address the challenges of point clouds more effectively than single-modality approaches. Given these results, further work on self-supervised strategies that combine multiple data types is encouraged to improve 3D understanding tasks without annotations. en_US
dc.language.iso en en_US
dc.publisher School of Mechanical & Manufacturing Engineering (SMME), NUST en_US
dc.relation.ispartofseries SMME-TH-1095;
dc.subject 3D point clouds, Self-Supervised learning, Multi-Modality, Deep learning, Computer Vision, Contrastive learning, Classification, Segmentation en_US
dc.title Advancing Point Cloud Understanding through Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning en_US
dc.type Thesis en_US
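
The abstract above describes a combined intra-modal and cross-modal contrastive objective: two augmented views of a point cloud are contrasted against each other, and point-cloud features are contrasted against features of the corresponding rendered 2D image. Below is a minimal, illustrative PyTorch sketch of such an objective, assuming an InfoNCE-style loss; the encoder names, temperature, and loss weights are hypothetical placeholders and do not reflect the thesis's actual implementation.

    # Illustrative sketch only: encoders, temperature, and weights are assumptions.
    import torch
    import torch.nn.functional as F

    def info_nce(z_a, z_b, temperature=0.07):
        """InfoNCE loss: matching rows of z_a and z_b are positives, all other rows negatives."""
        z_a = F.normalize(z_a, dim=1)
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / temperature                  # (N, N) similarity matrix
        targets = torch.arange(z_a.size(0), device=z_a.device)
        return F.cross_entropy(logits, targets)

    def total_loss(point_encoder, image_encoder, cloud_view1, cloud_view2,
                   rendered_imgs, lambda_intra=1.0, lambda_cross=1.0):
        """Combine intra-modal (two augmented point-cloud views) and
        cross-modal (point cloud vs. its 2D rendering) contrastive terms."""
        z_p1 = point_encoder(cloud_view1)                     # features of augmented view 1
        z_p2 = point_encoder(cloud_view2)                     # features of augmented view 2
        z_img = image_encoder(rendered_imgs)                  # features of rendered images
        loss_intra = info_nce(z_p1, z_p2)                     # intra-modal self-supervision
        loss_cross = info_nce(z_p1, z_img)                    # cross-modal alignment
        return lambda_intra * loss_intra + lambda_cross * loss_cross

In this sketch, a point cloud and its rendering share the same batch index, so the diagonal of each similarity matrix defines the positive pairs while all other entries act as negatives.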



This item appears in the following Collection(s)

  • MS [205]
