Abstract:
Accurate categorization of software requirements into security (SR) and non-security
(NSR) categories is crucial for project management and decision-making. Traditional manual
categorization is time-consuming and error-prone, necessitating
an automated solution. This study investigates the automatic labeling of requirement
sentences by utilizing TF-IDF in conjunction with Individual Keyword Comparison
(IKC) and Combined Keyword Comparison (CKC) methods, with validation
performed using machine learning models, including Support Vector Machine (SVM),
Logistic Regression (LR), and Random Forest (RF). This research identifies effective
techniques and models for SR classification, enhancing efficiency and accuracy in software
engineering processes. This research advances data preprocessing and SR classification
methodologies, providing insights for improved decision-making in software
development projects. Additionally, the proposed method, ASBL (Automatic Score-Based Labeling),
achieves a training accuracy of 92% when validated with an SVM classifier
after automatically labeling the requirements of the combined DOSSPRE
and PROMISE datasets into security and non-security categories. Furthermore, the model
reached an accuracy of up to 81% when tested on the project
requirements of final-year MCS students.
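
To make the idea of score-based labeling concrete, the following is a minimal sketch of how requirement sentences might be labeled as SR or NSR from TF-IDF weights of security keywords. It is not the ASBL implementation described in this work: the keyword list, the zero threshold, the smoothed IDF formula, and the function names (`tokenize`, `tfidf_vectors`, `label_requirements`) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical keyword list; the abstract does not enumerate the keywords used.
SECURITY_KEYWORDS = {"password", "encrypt", "authenticate", "access", "unauthorized"}


def tokenize(sentence):
    # Lowercase each word and strip surrounding punctuation.
    tokens = (w.strip(".,;:()").lower() for w in sentence.split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    # Term frequency times a smoothed inverse document frequency
    # (an assumed weighting; other TF-IDF variants would work too).
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        })
    return vectors


def label_requirements(sentences, keywords=SECURITY_KEYWORDS, threshold=0.0):
    # Sum the TF-IDF weights of the security keywords in each sentence and
    # label the sentence SR when that score exceeds the (assumed) threshold.
    labels = []
    for vec in tfidf_vectors(sentences):
        score = sum(w for term, w in vec.items() if term in keywords)
        labels.append("SR" if score > threshold else "NSR")
    return labels


requirements = [
    "The system shall encrypt every stored password.",
    "The report page shall load within two seconds.",
    "Only authorized staff may access the audit log.",
]
labels = label_requirements(requirements)
```

Labels produced this way could then serve as training targets for a classifier such as SVM, LR, or RF, which is the validation step the abstract describes.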