Abstract:
This research presents two key contributions aimed at improving the prediction of
COVID-19 severity, defined as intubation or death within one month, from 3D CT scan data.
First, we introduce a novel dataset of 2,000 segmented 3D lung cubes curated from
the STOIC dataset through a 10-step preprocessing and segmentation pipeline.
3D CNNs outperform 2D CNNs in this domain,
owing to their ability to capture inter-slice information in 3D images, while Vision
Transformers excel in texture-based classification tasks. Therefore, as our second
contribution, we propose two distinct methods for predicting COVID-19 severity.
The first method employs a 3D-CNN pretrained on the MosMedData dataset and then
fine-tuned on the STOIC dataset with two input layers: one for 3D lung images and
another for age and gender metadata. The second method, known as 3D-EffiBOT,
leverages a combination of 3D EfficientNetV2
and iBOT architectures to capture both 3D and 2D spatial features from volumetric
CT scans. The 3D EfficientNetV2, initialized by inflating 2D ImageNet weights, was
fine-tuned on the STOIC dataset using a dynamic layer-unfreezing strategy, while
iBOT was employed to extract 2D slice-level features from axial CT slices. Both
models were trained using five augmentation techniques and evaluated using
stratified 5-fold sampling to address class imbalance, achieving mean AUC
scores of 0.7862 and 0.7414 for 3D-EffiBOT and 3D-CNN, respectively. This work
demonstrates the effectiveness of hybrid architectures in medical imaging, offering a
significant improvement over conventional methods. The results suggest that combining
advanced 3D and 2D feature extractors enhances diagnostic accuracy, providing
a valuable tool for predicting severe COVID-19 outcomes. Future research directions
include integrating patients’ pre-COVID medical history, expanding the model’s
application to other diseases, and exploring ensemble learning for improved
performance across diverse populations.
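
The stratified 5-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `seed` parameter, and the toy labels are hypothetical, and only the Python standard library is assumed. The idea is to split each class separately so that every fold keeps roughly the same proportion of severe and non-severe cases as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Illustrative sketch only: `labels`, `k`, and `seed` are hypothetical
    names, not taken from the paper's implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin so that every fold
        # receives roughly the same share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = stratified_k_fold(labels, k=k, seed=seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(validation)


if __name__ == "__main__":
    # Toy, made-up labels: 1 = severe, 0 = non-severe (imbalanced 1:3).
    toy_labels = [1] * 25 + [0] * 75
    for fold_id, (train, val) in enumerate(train_validation_splits(toy_labels)):
        severe = sum(toy_labels[i] for i in val)
        print(f"fold {fold_id}: {len(val)} validation scans, {severe} severe")
```

With these toy labels, each validation fold holds 20 scans, 5 of them severe, which matches the 1:3 class ratio of the full set. A per-fold AUC would then be computed on each validation split and averaged to obtain a mean AUC of the kind reported above.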