Abstract:
This research advances the application of machine learning in healthcare by addressing
a gap in asthma severity prediction and classification. While the existing literature
has concentrated predominantly on forecasting asthma attacks, predicting the severity
of asthma for individual patients remains largely unexplored. The research introduces
methods that combine machine learning with natural language processing (NLP), using
textual statements of symptoms provided by asthma sufferers over an extended period.
Integrating respiratory audio data into the predictive model further broadens the
classification approach, yielding a more comprehensive assessment of asthma severity.
The methodology begins with data preprocessing: tokenization, lowercasing,
lemmatization, and the removal of punctuation and stop words. Term Frequency-Inverse
Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) are then used to
extract the features on which the predictive model relies. The respiratory audio data
adds a further layer of complexity, requiring a detailed examination of abnormal
sounds, including but not limited to crackles and wheezes.
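
As a rough illustration of the text-processing steps described above, the following minimal sketch lowercases, strips punctuation, tokenizes, removes stop words, and computes TF-IDF weights using only the Python standard library. The stop-word list, the example statements, and the function names are hypothetical and are not taken from the study. Lemmatization and LSA are omitted here; LSA would apply a truncated singular value decomposition to the resulting term-document matrix.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "at", "i", "is", "my", "of", "the", "to", "when"}


def preprocess(text):
    """Lowercase, replace punctuation with spaces, tokenize, drop stop words."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical symptom statements, for illustration only.
statements = [
    "I wheeze at night and my chest is tight.",
    "Mild cough, no wheeze.",
    "Chest is tight when I climb the stairs!",
]
weights = tfidf(statements)
```

In this toy corpus, a term that appears in only one statement (such as "cough") receives a higher weight than one shared across statements (such as "wheeze"), which is the property that makes TF-IDF useful for separating symptom descriptions.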
The evaluation gives a nuanced picture of the model’s performance. The model accurately
identifies instances with no abnormalities (the ‘none’ class) but has difficulty
distinguishing between specific classes, which underscores how hard it is to predict
asthma severity from audio data. In the comparison of supervised and unsupervised
approaches, Random Forest outperforms Stochastic Gradient Descent in predicting
severity, while K-means clustering achieves comparable accuracy, suggesting that it is
also a viable approach to asthma severity prediction.
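
The abstract does not state how accuracy was computed for K-means, which produces cluster indices rather than class labels. One common convention, shown here purely as an assumption, is to map each cluster to the most frequent true label among its members and then score the mapped predictions as ordinary accuracy. The cluster assignments and severity labels below are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_accuracy(cluster_ids, true_labels):
    """Map each cluster to its majority true label, then compute accuracy."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # Majority label per cluster (ties resolved by first occurrence).
    majority = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in members.items()
    }
    correct = sum(
        majority[cluster] == label
        for cluster, label in zip(cluster_ids, true_labels)
    )
    return correct / len(true_labels)


# Hypothetical cluster assignments and severity labels, for illustration only.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
severity = ["mild", "mild", "moderate", "moderate", "moderate",
            "severe", "severe", "mild"]
score = cluster_accuracy(clusters, severity)
```

A score computed this way can be compared with a supervised classifier's accuracy on the same samples, although it tends to be optimistic because the mapping is chosen using the true labels.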
In conclusion, this research not only helps to bridge existing gaps but also advances
asthma management through personalized healthcare. The challenges identified in this
research indicate directions for future studies aimed at refining the models and
improving predictive accuracy in personalized asthma management.