dc.contributor.author |
Sabih-Ul-Hassan, Muhammad |
|
dc.date.accessioned |
2024-11-11T09:29:45Z |
|
dc.date.available |
2024-11-11T09:29:45Z |
|
dc.date.issued |
2024 |
|
dc.identifier.other |
400868 |
|
dc.identifier.uri |
http://10.250.8.41:8080/xmlui/handle/123456789/47844 |
|
dc.description |
Supervisor: Dr. Shahbaz Khan |
en_US |
dc.description.abstract |
The increasing adoption of 3D data-driven applications such as robotics, autonomous
vehicles, and virtual reality has made 3D point clouds a vital form of data representation.
However, the inherent irregularity and unordered structure of point clouds pose
significant challenges for traditional supervised learning algorithms, which typically
require extensively annotated datasets for tasks like classification, segmentation, and part
segmentation. To address these challenges, we introduce a multi-modal self-supervised
learning framework that integrates 3D point clouds with 2D rendered images. Our approach
enhances performance in classification, segmentation, and part segmentation tasks by
implementing intra-modal self-supervision within the point clouds and cross-modal self-supervision between the point clouds and their corresponding rendered images. This combination leverages the strengths of both data modalities, enabling the model to learn more robust and comprehensive representations for improved 3D point cloud understanding. By exploiting complementary information from both modalities, the framework learns robust, discriminative features without costly annotations. We employ contrastive learning
methods to extract meaningful representations from both modalities in a self-supervised
manner. Intra-modal self-supervision encourages the model to learn structural and
geometric features within point cloud data, while cross-modal supervision aligns features
between 3D point clouds and their 2D projections, capturing rich semantic information.
Our evaluation shows that the framework outperforms single-modal and multi-modal baselines on classification, segmentation, and part segmentation, using pre-existing multi-view 3D datasets with three-dimensional sample images. The results demonstrate that adopting a multi-modal self-supervised approach enables a transition away from techniques that depend heavily on labeled data, without sacrificing performance on 3D applications. These results underscore the value of self-supervised learning for 3D tasks and suggest that multi-modal data can address the challenges of point clouds more effectively than previously observed. Given these findings, further work is encouraged on self-supervised strategies that combine multiple data types to improve 3D understanding tasks without annotations. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
School of Mechanical & Manufacturing Engineering (SMME), NUST |
en_US |
dc.relation.ispartofseries |
SMME-TH-1095; |
|
dc.subject |
3D point clouds, Self-Supervised learning, Multi-Modality, Deep learning, Computer Vision, Contrastive learning, Classification, Segmentation |
en_US |
dc.title |
Advancing Point Cloud Understanding through Self-Supervised Intra-Modal and Cross-Modal Contrastive Learning |
en_US |
dc.type |
Thesis |
en_US |