dc.description.abstract |
Clone detection plays a fundamental role in ensuring the quality and maintainability of software
systems. Developers often reuse code components, and the code review or refactoring needed to
identify and consolidate copied code is often neglected, resulting in code clones. These clones can
cause consistency, bug propagation, maintainability, and quality issues. UML models are essential
artifacts, typically created in the initial phases of software development to specify and visualize
the software design. These models serve as a blueprint that guides all subsequent phases of software
development. Therefore, clones in UML models induce clones in later stages of software development
as well.
These clones propagate and amplify clone-related issues from the initial to the final stages of
software development. For this reason, it is as essential to identify, track, and remove duplicates
in UML models as it is in code. Furthermore, a key goal of Model Driven Software Engineering (MDSE)
is to generate code from models such as UML models, which further increases the importance of model
clone detection.
This study focuses on the application of Natural Language Processing (NLP) to detect clones within
UML models. Initially, a UML model is created and clones are introduced into the diagram. The model
is exported in Extensible Markup Language (XML) format to represent it in textual form. Next, the
XML is parsed to extract the features of the model that are relevant to clone detection. Because the
XML of UML diagrams carries a lot of structural information that is irrelevant to clone detection
and is also not balanced, the extracted features are further preprocessed into a suitable format.
The extracted data is then labeled as clone and non-clone pairs. NLP techniques are used for
detection because the names and properties of UML model elements are mostly represented as text.
NLP techniques can therefore detect clones in UML models efficiently.
The proposed framework is applied to several case studies, which validate the effectiveness of our
approach to model clone detection. |
en_US |