dc.description.abstract |
Imputing missing data in datasets remains an unresolved challenge. Imprecise
imputation can lead to erroneous results from machine learning models,
potentially affecting overall outcomes and findings. Real-world datasets often
exhibit a substantial proportion of missing data, with observations containing missing
values accounting for approximately 10 to 40 percent or more of the dataset.
The primary objective of this study is to enhance the accuracy of the existing LGDI
(Large gaps of missing data) multivariate algorithm. This was accomplished by incorporating
case-specific considerations, advanced preprocessing techniques, and effective
temporal feature selection methodologies. Case-specific techniques served as the foundation
for this research, while feature selection played a crucial role in optimizing results by
identifying key variables for the LGDI algorithm. Additionally, an important secondary
objective was to extend the capabilities of the LGDI algorithm to handle categorical variables
effectively. Lastly, this study introduces the Multivariable method, which provides
additional evidence of the research’s efficacy.
The findings show a notable improvement: the gradient boosting algorithm achieves
a 61 percent increase in the LGDI R-squared coefficient at a missing-value rate
of 30 percent. Other MICE algorithms also show improvements.
The findings are illustrated using "Traffic-volume", an open-source time series
dataset from Kaggle. |
en_US |