Abstract:
Software has expanded beyond computers to become an integral component of our daily lives.
Software engineering, a pivotal yet relatively new discipline compared with other engineering
disciplines, is continually evolving, as is the landscape of software development techniques,
tools, and application software. The real challenge is to deliver new features on time while
shortening the development cycle. A new R&D model is therefore required to speed up the
innovation process.
Open innovation through crowdsourcing is a strong candidate for this new R&D model: it relies
on the principle of openness and can speed up requirements engineering, or feature extraction,
from both internal and external sources. Accordingly, this study uses software
engineering repositories, such as Stack Overflow, to gain insights from engineers on their
experiences. The purpose is to identify the most critical features, indicate any missing functionality,
and highlight opportunities for improvement. To this end, this research
uses Community Question Answering (CQA) sites, which serve as a valuable knowledge
resource. These sites, where millions of users contribute questions and answers,
act as a repository of user feedback. This research aims to mine features for future software
by using this knowledge. The research is divided into stages and employs state-of-the-art
techniques such as deep learning, big data analysis, and Transformers, particularly Bidirectional
Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT).
These stages include gathering data from CQA sites such as Stack Overflow, pre-processing it
with text mining techniques, grouping it into categories with topic modeling, conducting
sentiment analysis to determine the positive or negative aspects of features, and scoring user
input. The output of this research is a prioritized list of features extracted from user feedback. For
this purpose, a multi-model framework is developed, composed of the following modules:
feature or requirement extraction, quality assessment, sentiment analysis, severity detection,
topic modeling, and feature or requirement prioritization. The prioritization module takes
inputs from the requirement extraction, quality assessment, domain-specific sentiment analysis,
topic modeling, and severity detection modules and applies multi-objective criteria to
prioritize features or requirements. This
prioritization will help organizations evaluate key features, identify redundant, confusing,
or problematic ones, and recommend new features to include in a future release
of the software under study. Each module of the multi-model framework is tested independently
on benchmark datasets and compared with existing state-of-the-art tools. Finally, the
proposed framework is run on datasets from six social media applications; the results are
benchmarked against three existing tools and also evaluated on a small, manually labeled dataset
covering all six applications. The empirical results and statistical analysis show that the proposed
framework is effective at automatically extracting and prioritizing features for new product
development or for improving existing software. The analysis of the proposed framework shows
that its accuracy varies across datasets. Evernote had the highest accuracy at 95%, suggesting
effective prioritization. WhatsApp had a high Exact Match rate of 75% but a lower overall
accuracy of 85% because of a large number of No Matches. The framework attained an average
accuracy of 91%, indicating its usefulness in extracting and prioritizing requirements. A
comparison with three existing approaches shows that the framework’s outputs
are 83% to 85% identical to those of the RE-BERT tool across several datasets, indicating
competitive performance.
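
The prioritization step described above can be illustrated with a minimal sketch. The abstract does not specify how the multi-objective criteria are combined, so a simple weighted sum is assumed here purely for illustration; it is not the framework's actual method. The class `FeatureSignals`, its fields, the weights, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureSignals:
    """Hypothetical per-feature scores in [0, 1], one per upstream module."""
    name: str
    quality: float       # assumed output of a quality assessment step
    negativity: float    # assumed share of negative sentiment about the feature
    severity: float      # assumed output of a severity detection step
    topic_weight: float  # assumed prominence of the feature's topic


# Hypothetical weights; a weighted sum is only one way to combine several criteria.
WEIGHTS = {"quality": 0.2, "negativity": 0.3, "severity": 0.3, "topic_weight": 0.2}


def priority_score(feature, weights=WEIGHTS):
    """Combine the per-module scores into a single priority value."""
    return sum(w * getattr(feature, key) for key, w in weights.items())


def prioritize(features, weights=WEIGHTS):
    """Return the features sorted from highest to lowest priority."""
    return sorted(features, key=lambda f: priority_score(f, weights), reverse=True)


if __name__ == "__main__":
    # Made-up candidates, not taken from the study's datasets.
    candidates = [
        FeatureSignals("dark mode", 0.5, 0.2, 0.1, 0.4),
        FeatureSignals("message sync fails", 0.8, 0.9, 0.9, 0.6),
    ]
    for f in prioritize(candidates):
        print(f"{f.name}: {priority_score(f):.2f}")
```

In this sketch a feature that attracts severe, negative, high-quality feedback rises to the top of the list; other combination schemes (for example Pareto ranking) would fit the same interface.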