Mining Crowd Sourcing Repositories for Open Innovation in Software Engineering

Anwar, Zeeshan

URI: http://10.250.8.41:8080/xmlui/handle/123456789/44596

Date: 2024-07-09

Abstract:

Software has expanded beyond computers to become an integral part of daily life. Although it is a pivotal discipline, software engineering is relatively young compared with other engineering disciplines, and it is continually evolving, as are software development techniques, tools, and application software. The real challenge is to deliver new features on time while shortening the development cycle, which calls for a new R&D model that speeds up the innovation process. Open innovation through crowdsourcing is a strong candidate for such a model: it rests on the principle of openness and can accelerate requirements engineering, or feature extraction, from both internal and external sources. This study therefore uses software engineering repositories such as Stack Overflow to gain insight into engineers' experiences, with the aim of identifying the most critical features, pointing out missing functionality, and highlighting opportunities for improvement. To this end, the research draws on Community Question-and-Answering (CQA) sites, which are a valuable knowledge resource: millions of users contribute questions and answers, making these sites a repository of user feedback. The research uses this knowledge to mine features for future software. It proceeds in stages and employs cutting-edge techniques such as Deep Learning, Big Data analysis, and Transformers, in particular Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT). The stages include gathering data from CQA sites such as Stack Overflow, pre-processing it with text mining techniques, grouping it into categories with topic modeling, performing sentiment analysis to determine the positive or negative aspects of features, and scoring user input.
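The pre-processing and sentiment-scoring stages described above can be illustrated with a minimal sketch. It assumes each post arrives as an HTML string (Stack Overflow stores post bodies as HTML), and it swaps in a tiny hand-written stopword list and word lexicon for the thesis's BERT/GPT-based sentiment model. The names `preprocess`, `lexicon_sentiment`, `STOPWORDS`, `POSITIVE`, and `NEGATIVE` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; real pipelines use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "in", "on",
             "i", "my", "this", "when", "for", "with"}

# Hypothetical sentiment lexicon, standing in for a BERT/GPT-based classifier.
POSITIVE = {"fast", "easy", "great", "works", "love", "useful"}
NEGATIVE = {"crash", "crashes", "slow", "bug", "broken", "fails", "missing"}


def preprocess(html: str) -> list[str]:
    """Remove code snippets and HTML tags, lowercase, tokenize, drop stopwords."""
    text = re.sub(r"<code>.*?</code>", " ", html, flags=re.DOTALL)  # code is not feedback
    text = re.sub(r"<[^>]+>", " ", text)                            # remaining markup
    tokens = re.findall(r"[a-z][a-z0-9+#-]*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def lexicon_sentiment(tokens: list[str]) -> int:
    """Count positive minus negative words: >0 positive, <0 negative, 0 neutral."""
    counts = Counter(tokens)
    return sum(counts[w] for w in POSITIVE) - sum(counts[w] for w in NEGATIVE)


post = "<p>The app crashes when I sync notes. <code>sync()</code> is slow.</p>"
tokens = preprocess(post)
print(tokens)                     # ['app', 'crashes', 'sync', 'notes', 'slow']
print(lexicon_sentiment(tokens))  # -2
```

A lexicon count is only a placeholder: domain-specific sentiment analysis of the kind the abstract describes would replace `lexicon_sentiment` with a fine-tuned Transformer classifier, while the cleaning step stays much the same.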
The output of this research is a prioritized list of features extracted from user feedback. To produce it, a multi-model framework was developed, composed of the following modules: Feature (Requirement) Extraction, Quality Assessment, Sentiment Analysis, Severity Detection, Topic Modeling, and Feature (Requirement) Prioritization. The prioritization module takes inputs from the Requirement Extraction, Quality Assessment, domain-specific Sentiment Analysis, Topic Modeling, and Severity Detection modules and applies multi-objective criteria to prioritize features or requirements. This prioritization helps organizations evaluate key features, identify redundant ones, detect confusing or problematic features, and recommend new features for future releases of the software under study. Each module of the multi-model framework was tested independently on various benchmark datasets and compared with existing state-of-the-art tools. Finally, the proposed framework was run on datasets from six social media applications; the results were benchmarked against three existing tools and also evaluated on a small, manually labeled dataset for each of the six applications. The empirical results and statistical analysis show that the proposed framework is effective at automatically extracting and prioritizing features for new product development or for improving existing software. The framework's accuracy varies between datasets. Evernote had the highest accuracy, 95%, suggesting effective prioritization. WhatsApp had a high Exact Match rate of 75% but a lower overall accuracy of 85% because of a large number of No Matches. The framework achieved an average accuracy of 91%, indicating its usefulness in extracting and prioritizing requirements.
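One simple way to combine the module outputs into a single ranking is a weighted sum over normalized criteria. The sketch below is not the thesis's method, which the abstract does not specify: the five signal fields of `FeatureSignals`, the `WEIGHTS` values, the feature names, and the choice of min-max normalization are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureSignals:
    """Per-feature outputs of the upstream modules (hypothetical fields)."""
    name: str
    mentions: int          # requirement extraction: posts mentioning the feature
    quality: float         # quality assessment: mean post quality, 0..1
    negative_ratio: float  # sentiment analysis: share of negative mentions, 0..1
    severity: float        # severity detection: mean severity, 0..1
    topic_share: float     # topic modeling: prevalence of the feature's topic, 0..1


# Hypothetical weights; the abstract does not say how the criteria are weighted.
WEIGHTS = {"mentions": 0.3, "quality": 0.1, "negative_ratio": 0.2,
           "severity": 0.3, "topic_share": 0.1}


def min_max(values: list[float]) -> list[float]:
    """Rescale to 0..1; a criterion with no spread contributes 0 for every feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def prioritize(features: list[FeatureSignals]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest priority."""
    normalized = {k: min_max([getattr(f, k) for f in features]) for k in WEIGHTS}
    scores = [(f.name, sum(w * normalized[k][i] for k, w in WEIGHTS.items()))
              for i, f in enumerate(features)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = prioritize([
    FeatureSignals("sync", 120, 0.6, 0.7, 0.9, 0.25),
    FeatureSignals("dark mode", 40, 0.8, 0.1, 0.2, 0.05),
    FeatureSignals("search", 80, 0.5, 0.4, 0.5, 0.15),
])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
# sync: 0.933, search: 0.429, dark mode: 0.100
```

A weighted sum collapses the objectives into one score, which gives a direct ordering; a Pareto-based ranking would instead keep the trade-offs between criteria visible, at the cost of a less direct ordering.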
A comparison with three existing methods shows that the framework's outputs are 83% to 85% identical to those of the RE-Bert tool across several datasets, indicating competitive performance.
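The abstract reports Exact Match, No Match, and accuracy figures without defining them. The sketch below assumes one plausible reading: each feature receives an ordinal priority level, a label one level away from the reference counts as a partial match, and accuracy counts exact plus partial matches, which would be consistent with an Exact Match rate below overall accuracy. The `LEVELS` scale, the function names, and the example labels are hypothetical. The reference mapping could come from manual labeling or from another tool's output.

```python
from collections import Counter

LEVELS = ["low", "medium", "high"]  # hypothetical ordinal priority scale


def match_type(predicted: str, reference: str) -> str:
    """'exact' for the same level, 'partial' for adjacent levels, else 'none'."""
    gap = abs(LEVELS.index(predicted) - LEVELS.index(reference))
    if gap == 0:
        return "exact"
    return "partial" if gap == 1 else "none"


def evaluate(predicted: dict[str, str], reference: dict[str, str]) -> dict[str, float]:
    """Score priority labels on features that appear in both mappings."""
    shared = [f for f in reference if f in predicted]
    if not shared:
        raise ValueError("no features in common")
    counts = Counter(match_type(predicted[f], reference[f]) for f in shared)
    n = len(shared)
    return {
        "exact_match": counts["exact"] / n,
        "no_match": counts["none"] / n,
        # Assumption: exact and partial matches both count toward accuracy.
        "accuracy": (counts["exact"] + counts["partial"]) / n,
    }


reference = {"sync": "high", "search": "medium", "themes": "low", "export": "high"}
predicted = {"sync": "high", "search": "high", "themes": "low", "export": "low"}
print(evaluate(predicted, reference))
# {'exact_match': 0.5, 'no_match': 0.25, 'accuracy': 0.75}
```

Under this reading, accuracy equals one minus the No Match rate, so a dataset can show a high Exact Match rate yet lose accuracy mainly through No Matches.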