Abstract:
The software development life cycle has many stages, and each stage is associated with
different kinds of artifacts. Bug reports are used for many software development activities, such as
severity and priority assignment and bug triaging. Because resources are limited, it is difficult
for developers to resolve all bug reports. Developers therefore usually need to prioritize bug reports
so that the bugs of various software projects can be resolved quickly. There are various types of bug
reports, such as security, performance, regression, usability, and crash. Among these, security bug
reports are the most critical. These bug reports can expose security debt that hackers could
abuse if the reports are disclosed before the bugs are resolved. A security bug can become the
cause of unauthorized access to a software application. Such bugs are a great threat to
the privacy and security of users and therefore need to be resolved as early as
possible. A bug report contains many different fields that hold information about the bug.
Certain fields are optional, and some are mandatory. JIRA has a field named “type”,
whose value may be a bug, an advanced feature, an improvement, or a support request. In
Bugzilla, the keyword field is tagged with the category of the bug, such as ‘perf’ for a performance bug.
The label or type field describes the type of a bug report. Labels help in understanding
bug reports and can also be used to prioritize them. Previous studies show that
many bug reports are not labeled and that existing labels may be inaccurate. In this research,
we propose an approach for the automatic labeling of security bug reports. First, we conducted
a systematic literature review (SLR), which covers the distribution of papers according to
the approaches used by their authors and identifies the NLP techniques, libraries, and technologies used
to develop tools. We then identified thirteen (13) tools proposed or developed by
different researchers and compared them. After the SLR, we proposed a
novel approach for the automatic labeling of security bug reports using natural language
processing (NLP) techniques and machine learning (ML) algorithms. Our approach, named
ALSBR, is implemented in Python using the Natural Language Toolkit (NLTK), scikit-learn, and
imbalanced-learn (imblearn) libraries. In the proposed methodology, the bug reports are first
preprocessed. After preprocessing, features are selected by TF-IDF value: the top hundred terms
according to their TF-IDF values are used as features. After feature selection, random
undersampling is applied to balance the majority and minority classes. Three machine
learning algorithms, Logistic Regression, Decision Tree, and Naïve Bayes, are used as
classification models. A voting strategy is also applied to improve accuracy. For the
validation of our approach, 10-fold cross-validation is applied. We used bug reports from five
projects to evaluate our approach. Among these projects, four are from Ohira and one
is a subset of bug reports selected from the Chromium project. Finally, we compared
the proposed approach with the state-of-the-art FARSEC model and achieved
improved results in terms of precision, probability of detection (recall), probability of false
alarm, F-measure, and G-measure.
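
The five evaluation measures named above can all be derived from confusion-matrix counts, with security bug reports as the positive class. The following is a minimal sketch, not the ALSBR implementation. The names `Confusion` and `evaluate` are hypothetical, and the counts in the usage note are made up for illustration. The abstract does not define the G-measure; it is assumed here to be the harmonic mean of recall and (1 - probability of false alarm), the definition commonly used in the FARSEC literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts; a security bug report is the positive class."""
    tp: int  # security reports labeled as security
    fp: int  # non-security reports labeled as security
    tn: int  # non-security reports labeled as non-security
    fn: int  # security reports labeled as non-security


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: Confusion) -> dict:
    """Compute the five measures from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)        # probability of detection
    false_alarm = _ratio(c.fp, c.fp + c.tn)   # probability of false alarm
    # F-measure: harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    # G-measure (assumed definition): harmonic mean of recall and (1 - false_alarm).
    specificity = 1.0 - false_alarm
    g_measure = _ratio(2 * recall * specificity, recall + specificity)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm": false_alarm,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }
```

For example, `evaluate(Confusion(tp=8, fp=2, tn=88, fn=2))` uses illustrative counts for 100 bug reports, 10 of which are security reports. A lower probability of false alarm is better, while higher values are better for the other four measures.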