Abstract:
Early age fertility is a significant public health concern in Pakistan, with profound
implications for women’s health, socio-economic development, and population dynamics.
This study explores the socio-demographic factors influencing early age fertility
among women in Pakistan using the Pakistan Demographic and Health Survey (PDHS)
dataset from 2017-2018. The research aims to identify the key socio-demographic
determinants of early age fertility and to evaluate the effectiveness of advanced survival
analysis models in predicting these outcomes.
The study addresses two primary objectives: first, to identify and analyze
socio-demographic factors associated with an increased risk of early age fertility, and second,
to compare the predictive performance of three advanced survival models, namely the Cox
Proportional Hazards Model (CPH), Random Survival Forest (RSF) Model, and
Conditional Inference Forest (CIF) Model, in the context of early age fertility prediction.
To achieve the first objective, the study employs the CPH model to assess the
impact of various socio-demographic factors on early age fertility. The results indicate
that lower educational attainment, rural residence, and lower socio-economic status are
significantly associated with an increased risk of early age fertility. Specifically, women
with no education face 4.858 times the risk of early age fertility compared with
those with higher education, and women living in rural areas face 1.123 times the
risk of their urban counterparts. Additionally, women from poor socio-economic
backgrounds face 1.229 times the risk of early age fertility compared with
those from rich backgrounds.
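As a reading aid, the figures above can be related to Cox model coefficients through the identity HR = exp(beta). The following minimal sketch assumes the reported values are hazard ratios from the fitted CPH model. The dictionary keys and the helper name `describe` are illustrative and are not taken from the study.

```python
import math

# Values reported in the abstract, assumed here to be hazard ratios (HR)
# relative to the reference groups: higher education, urban, rich.
hazard_ratios = {
    "no_education_vs_higher": 4.858,
    "rural_vs_urban": 1.123,
    "poor_vs_rich": 1.229,
}

def describe(hr):
    """Return the implied Cox coefficient and the percent increase in hazard."""
    beta = math.log(hr)          # Cox coefficient, since HR = exp(beta)
    pct = (hr - 1.0) * 100.0     # percent increase over the reference group
    return beta, pct

summary = {name: describe(hr) for name, hr in hazard_ratios.items()}

for name, (beta, pct) in summary.items():
    print(f"{name}: beta = {beta:.3f}, hazard higher by {pct:.1f}%")
```

Read this way, a ratio of 1.123 corresponds to a hazard about 12.3% higher for rural women than for urban women at any given age, under the proportional hazards assumption.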
For the second objective, the study compares the performance of the CPH, RSF,
and CIF models in predicting early age fertility outcomes based on two variables: Age
at Marriage (AAM) and Age at First Birth (RAAFB). The CPH model proves to be
the most effective for predicting early age fertility related to AAM, as it exhibits the
lowest prediction error and Integrated Brier Score (IBS), along with a high C-index
indicating reliable predictions. Conversely, for the RAAFB variable, the CIF model
performs best, with the lowest prediction error rate and IBS, although the RSF
model attains a higher C-index. Because the CIF model leads on both error rate and IBS,
it is the preferred choice for this aspect of the analysis.
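To make one of the comparison metrics concrete, the sketch below computes Harrell's C-index for right-censored data in plain Python. It is an illustration only: the toy ages, event indicators, and risk scores are invented, not PDHS data, and the simplified pair rule (pairs with tied times are skipped) is an assumption that may differ from the software used in the study.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified: tied times skipped).

    times  : observed time for each subject (event or censoring time)
    events : 1 if the event was observed, 0 if censored
    risks  : model risk score, where a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event has the higher risk score
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    return concordant / comparable

# Hypothetical toy data: age at the event, event indicator, predicted risk.
toy_times = [15, 17, 19, 22, 25]
toy_events = [1, 1, 0, 1, 1]
toy_risks = [0.9, 0.7, 0.8, 0.3, 0.1]

c_index = harrell_c_index(toy_times, toy_events, toy_risks)
print(f"C-index = {c_index:.3f}")  # 7 of 8 comparable pairs are concordant
```

The C-index measures only how well a model ranks subjects, whereas the IBS also reflects how well predicted survival probabilities are calibrated over time. This is why a model can have the higher C-index yet a worse IBS.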
This study contributes to the understanding of early age fertility in Pakistan by
identifying critical socio-demographic factors and demonstrating the effectiveness of
advanced statistical models for predicting early age fertility. The findings emphasize
the need for targeted interventions addressing educational disparities, rural-urban differences,
and socio-economic inequalities to mitigate the risks associated with early
age fertility. Additionally, the study highlights the importance of using comprehensive
performance metrics for model selection, with CIF emerging as the optimal model for
precise prediction of early age fertility outcomes based on RAAFB.