Abstract:
Viruses, as obligate intracellular parasites, pose significant challenges to global
health. Their diverse forms and mechanisms of infection create diagnostic and
therapeutic difficulties, and they exhibit high heterogeneity and complexity at both
molecular and cellular levels. They often spread to distant organs, which complicates
diagnosis, prognosis and treatment. Much remains to be explored, especially at the
cellular level, to find the main underlying causes and to counter the diseases and
infections caused by viruses. The major issue at hand is comprehending the underlying
molecular mechanisms of viruses, with a particular focus on cellular heterogeneity,
cell type identification, and gene regulation dynamics across various viruses.
Single-cell RNA sequencing (scRNA-Seq) is a technology that has revolutionized the
field of genomics in recent years by making it possible to study patterns of gene
expression at the single-cell level. This study applied scRNA-Seq to healthy and diseased samples of
HCV, SARS-CoV-2 and HIV. The analysis yielded valuable insights into cellular
heterogeneity at the level of individual cells. In the case of HCV, CD8+ T cells are
found in the greatest abundance, followed by monocytes. For SARS-CoV-2, the most
prominent cell type is CD8+ T cells, followed by NK T cells and CD4+ T cells,
B cells and CD8+ T cells for HIV. Four main cell types, CD8+ T cells, CD4+ T cells,
monocytes and NK cells, are found to be present in all three diseases. In addition, differential gene expression (DGE)
analysis and enrichment analysis provide insights into the prevalence of gene expression
within different cell populations, along with the enriched biological pathways and functional
categories associated with those genes. Furthermore, machine learning classification models
are built to distinguish one disease from another on the basis of the identified cell types. The
accuracy of the Random Forest (RF) model is found to be 93.9%, while for SVM the accuracy is
94.9% and for RNN the best accuracy is found to be 92.8%. SVM outperformed RF and
RNN on every model evaluation metric, suggesting that this model
gives the best results in classifying the identified cell types into their respective classes
of the three viral diseases. These findings significantly enhance the understanding of the intricate
molecular aspects underlying the selected viral diseases. Future work includes ongoing
research to refine the understanding of viruses and their variants and to develop more targeted and
effective treatments, ultimately improving the quality of life of individuals living with
these infections and long-term diseases.
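
The model comparison summarized above rests on evaluation metrics computed from predicted and true disease labels. As a minimal sketch of how such metrics could be derived for a three-class problem, the following Python snippet computes overall accuracy and per-class precision and recall. The abstract does not state which metrics were used beyond accuracy, so precision and recall are assumptions here, as are the function name `evaluate` and the label lists, which are synthetic and purely illustrative (they are not results from this study).

```python
from collections import Counter

# Class labels taken from the three diseases named in the abstract.
CLASSES = ("HCV", "SARS-CoV-2", "HIV")


def evaluate(y_true, y_pred, classes=CLASSES):
    """Return (accuracy, per-class precision/recall) for paired label lists.

    Hypothetical helper for illustration; not the study's own code.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    # Count every (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))

    correct = sum(pairs[(c, c)] for c in classes)
    accuracy = correct / len(y_true)

    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in classes)  # column total
        actual = sum(pairs[(c, p)] for p in classes)     # row total
        per_class[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, per_class


# Synthetic example labels (illustrative only, not data from the study).
y_true = ["HCV", "HCV", "SARS-CoV-2", "SARS-CoV-2", "HIV", "HIV"]
y_pred = ["HCV", "HIV", "SARS-CoV-2", "SARS-CoV-2", "HIV", "HIV"]

accuracy, per_class = evaluate(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}")
for name, scores in per_class.items():
    print(name, scores)
```

Running the same function on the predictions of each classifier (RF, SVM, RNN) over a shared held-out set would give directly comparable scores.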