Abstract:
The Software Development Life Cycle (SDLC) is a systematic approach that comprises software
requirements engineering, software design, development, implementation, and deployment.
Software design is an important phase that helps turn the requirements into working code.
Recently, the Unified Modeling Language (UML) has become an important tool for developing
software designs. It provides various modeling structures to depict both the static and dynamic
behavior of a system. For static structure, the class diagram is an important model that shows
the different classes and their relationships. It is therefore important to develop tools that
automatically generate class diagrams from natural language requirements.
Various techniques have been proposed in the literature to automatically generate class diagrams
from natural language requirements, but these techniques fail to deal with redundant information
present in the form of synonyms. Requirements written in compound and complex sentences are
also a problem for these techniques. Furthermore, the generated class diagram may not be
optimized in terms of the coupling between classes. These factors make the automation of
class diagram generation from natural language requirements a highly challenging task.
Natural Language Processing (NLP) is a well-known approach from computational linguistics used
to extract structured information by automatically processing unstructured text. The technique
has been applied in a number of fields, such as sentiment analysis, newspaper analysis, and
the biomedical domain. With advances in computing, improving software development
methodology has also gained considerable attention from researchers as a way to speed up
software development and meet market needs.
In this research study, we comprehensively investigate the application of NLP to the generation
of class diagrams. A Systematic Literature Review (SLR) is carried out, selecting
29 articles published during 2014-2021. After quality evaluation, only 17 articles are considered
to fully meet the objective of our research. Subsequently, 14 combinations of main NLP activities
(i.e., tokenization, POS tagging, chunking, and parsing) and 12 NLP algorithms are identified.
Furthermore, 23 existing tools are identified and divided into two categories: 11 tools
utilized by researchers and 12 tools proposed by researchers. Finally, a comprehensive
analysis is performed to investigate the automation level of NLP applications for the generation
of class diagrams and test cases from early plain-text requirements.
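To make the NLP activities named above concrete, the following is a minimal sketch of how tokenization, POS tagging, and chunking can be chained to propose candidate class names from a requirement. It is an illustration under stated assumptions only: the toy lexicon, the function names, and the rule that the last noun of a chunk becomes a class name are all hypothetical, and the sketch does not represent SD-LINGO or any of the reviewed tools. Parsing is omitted, and a real pipeline would use a trained tagger rather than a hand-written lexicon.

```python
import re

# Hypothetical toy lexicon (assumption); a real pipeline would use a trained POS tagger.
LEXICON = {
    "a": "DET", "an": "DET", "the": "DET",
    "customer": "NOUN", "order": "NOUN", "product": "NOUN",
    "places": "VERB", "contains": "VERB",
}

def tokenize(text):
    """Split text into lowercase word tokens and sentence-ending periods."""
    return re.findall(r"[a-z]+|\.", text.lower())

def pos_tag(tokens):
    """Tag each token from the lexicon; unknown tokens (e.g. '.') get 'X'."""
    return [(token, LEXICON.get(token, "X")) for token in tokens]

def chunk_noun_phrases(tagged):
    """Group maximal runs of DET/NOUN tokens that contain at least one noun."""
    chunks, current = [], []
    # The sentinel entry flushes a chunk that ends at the last token.
    for token, tag in tagged + [("", "END")]:
        if tag in {"DET", "NOUN"}:
            current.append((token, tag))
        else:
            if any(t == "NOUN" for _, t in current):
                chunks.append(current)
            current = []
    return chunks

def candidate_classes(chunks):
    """Take the last noun of each chunk as a candidate class, without duplicates."""
    names = []
    for chunk in chunks:
        head = [token for token, tag in chunk if tag == "NOUN"][-1]
        name = head.capitalize()
        if name not in names:
            names.append(name)
    return names

requirement = "A customer places an order. The order contains a product."
tagged = pos_tag(tokenize(requirement))
classes = candidate_classes(chunk_noun_phrases(tagged))
print(classes)
```

In this sketch the verbs between noun chunks ("places", "contains") are discarded; an approach that also derives relationships between classes would need to keep them.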
Moreover, this research proposes a model to automatically generate a more accurate and
optimized class diagram from natural language requirements using natural language processing.
A tool named SD-LINGO is developed in this research. The effectiveness of the proposed model
is analyzed by comparing the generated design with other state-of-the-art approaches. The
validation is performed through six benchmark case studies and six different sets of requirements.
The experimental results show that the proposed NLP approach is fully automated and
considerably improved compared to the other state-of-the-art approaches.