Abstract:
Depression and anxiety occur in 10-20% of children and adolescents globally, with an
estimated 15 million people affected in Pakistan. Despite this growing figure, the
general Pakistani population lacks awareness of mental disorders because of limited
mental healthcare resources and negative perceptions of mental health. This study aims to
use machine learning (ML) with the Revised Children's Anxiety and Depression Scale (RCADS)
to make the most of existing healthcare resources and to facilitate depression and anxiety
screening. Three feature selection methods, namely the
Chi-square test of independence, Spearman correlation, and Recursive Feature Elimination,
all indicated that the features correlated only weakly with the evaluation of depression
and anxiety in the study population. To address the small sample size, the data were
augmented using the multinomial probability distribution of the existing data to generate
hybrid-synthetic, correlated discrete multinomial variates for each item of the RCADS-47.
Six commonly
employed ML algorithms (Decision Tree, Random Forest, Support Vector Machine, Logistic
Regression, Naive Bayes, and K-Nearest Neighbor) were trained on the hybrid data to
develop the predictive models. The Naive Bayes algorithm yielded the best overall results,
with up to 75% accuracy and an F1 score of 0.75. The findings suggest that the Naive Bayes
algorithm using 46 features suits the data well and has the potential to serve as a
data-driven decision support system for the professionals concerned, improving on the
conventional screening of anxiety and depression in children and adolescents.
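
The augmentation and classification steps described above can be illustrated with a
minimal sketch. It is not the study's implementation. The toy responses, the labels
"low" and "high", the four-item questionnaire, and all function names are hypothetical.
As a simplifying assumption, items are sampled independently within each class, so the
sketch does not reproduce the inter-item correlation that the study's variates preserve.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: each row holds one respondent's answers to four
# RCADS-style items scored 0-3, plus a screening label. Illustrative only.
real_rows = [
    ([0, 1, 0, 1], "low"),
    ([1, 0, 1, 0], "low"),
    ([0, 0, 1, 1], "low"),
    ([3, 2, 3, 2], "high"),
    ([2, 3, 2, 3], "high"),
    ([3, 3, 2, 2], "high"),
]
SCORES = [0, 1, 2, 3]


def item_distributions(rows, label, smoothing=1.0):
    """Estimate one smoothed multinomial distribution per item for a label."""
    answers = [a for a, y in rows if y == label]
    total = len(answers) + smoothing * len(SCORES)
    dists = []
    for j in range(len(answers[0])):
        counts = Counter(a[j] for a in answers)
        dists.append([(counts[s] + smoothing) / total for s in SCORES])
    return dists


def sample_synthetic(rows, label, n, rng):
    """Draw n synthetic respondents for a label.

    Assumption: items are drawn independently given the label.
    """
    dists = item_distributions(rows, label)
    return [
        ([rng.choices(SCORES, weights=p, k=1)[0] for p in dists], label)
        for _ in range(n)
    ]


def predict_naive_bayes(train_rows, answers):
    """Classify one response vector with a categorical Naive Bayes rule."""
    best_label, best_score = None, float("-inf")
    for label in sorted({y for _, y in train_rows}):
        prior = sum(1 for _, y in train_rows if y == label) / len(train_rows)
        score = math.log(prior)
        for p, a in zip(item_distributions(train_rows, label), answers):
            score += math.log(p[SCORES.index(a)])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# "Hybrid" data: the real rows plus synthetic rows drawn from their distributions.
hybrid_rows = (
    real_rows
    + sample_synthetic(real_rows, "low", 200, rng)
    + sample_synthetic(real_rows, "high", 200, rng)
)
print(len(hybrid_rows), predict_naive_bayes(hybrid_rows, [3, 3, 2, 3]))
```

In practice a library classifier and a held-out test set of real responses would
replace the hand-written prediction rule, and the synthetic rows would be used only
for training.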