Automatic Labeling of Security Bug Reports

Sadiq, Mohammad Umer

URI: http://10.250.8.41:8080/xmlui/handle/123456789/35541

Date: 2020

Abstract:

The software development life cycle has many stages, and each stage is associated with different kinds of artifacts. Bug reports are used for many software development activities, such as severity and priority assignment and bug triaging. Because resources are limited, it is difficult for developers to resolve every bug report, so they usually need to prioritize reports in order to resolve the bugs of various software projects quickly. There are various types of bug reports, such as security, performance, regression, usability, and crash. Among these, security bug reports are especially critical. Such reports can reveal security debt that attackers could abuse if it is disclosed before it is resolved. A security bug can allow unauthorized access to a software application. These bugs are a serious threat to the privacy and security of users and therefore need to be resolved as early as possible. A bug report contains many fields that carry information about the bug; some fields are optional and some are mandatory. JIRA has a field named “type”, whose value may be a bug, an advanced feature, an improvement, or a support request. In Bugzilla, the keyword field is tagged with the category of the bug, such as ‘perf’ for a performance bug. The label or type field describes the type of a bug report. A label gives an understanding of the bug report and can also be used to prioritize it. Previous studies show that many bug reports are not labeled, and that the labels that do exist may be inaccurate. In this research, we propose an approach for the automatic labeling of security bug reports. First, we conducted a systematic literature review (SLR), which covers the distribution of papers according to the approaches used by their authors and identifies the NLP techniques, libraries, and technologies used to develop tools.
We then identified thirteen (13) tools proposed or developed by different researchers and compared them. After the SLR, we proposed a novel approach for the automatic labeling of security bug reports using natural language processing (NLP) techniques and machine learning (ML) algorithms. Our approach, named ALSBR, is implemented in Python using the Natural Language Toolkit (NLTK), Sklearn, and Imblearn libraries. In the proposed methodology, the bug reports are first preprocessed. Features are then selected by TF-IDF value: the top hundred terms are used as features. After feature selection, random undersampling is applied to balance the majority and minority classes. Three machine learning algorithms, Logistic Regression, Decision Tree, and Naïve Bayes, are used as classification models, and a voting strategy is applied to improve accuracy. The approach is validated with 10-fold cross-validation. We evaluated it on bug reports from five projects: four are from Ohira, and one is a subset of bug reports selected from the Chromium project. Finally, we compared the proposed approach with the state-of-the-art FARSEC model and achieved improved results in terms of precision, probability of detection (recall), probability of false alarm, F-measure, and G-measure.
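The abstract names five evaluation measures without defining them. The following is a minimal sketch of how they can be computed from a binary confusion matrix in which "positive" means a security bug report. It assumes the conventional definitions, with G-measure taken as the harmonic mean of recall and one minus the probability of false alarm, as is common in this line of work. The names `Confusion` and `evaluate` and the example counts are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary classifier; 'positive' means security bug report."""
    tp: int  # security reports labeled as security
    fp: int  # non-security reports labeled as security
    tn: int  # non-security reports labeled as non-security
    fn: int  # security reports labeled as non-security


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(c: Confusion) -> dict:
    """Compute the five measures under the assumed standard definitions."""
    precision = _ratio(c.tp, c.tp + c.fp)
    pd = _ratio(c.tp, c.tp + c.fn)  # recall, i.e. probability of detection
    pf = _ratio(c.fp, c.fp + c.tn)  # probability of false alarm
    # F-measure: harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * pd, precision + pd)
    # G-measure (assumed definition): harmonic mean of pd and (1 - pf).
    g_measure = _ratio(2 * pd * (1 - pf), pd + (1 - pf))
    return {
        "precision": precision,
        "pd": pd,
        "pf": pf,
        "f_measure": f_measure,
        "g_measure": g_measure,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the thesis.
    scores = evaluate(Confusion(tp=40, fp=10, tn=930, fn=20))
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

A lower probability of false alarm is better, while higher values are better for the other four measures. Because security bug reports are a small minority class, G-measure is a useful complement to F-measure: it rewards a classifier only when it detects security reports and also keeps false alarms on the majority class low.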