Abstract:
In the age of technology, terabytes of data are generated every day. This huge flow of data can bring many benefits to business organizations, but it also poses various challenges due to its large volume, velocity, variety, variability, and complexity. These properties of large datasets make it difficult to perform effective analysis with traditional techniques. For effective decision making and accurate results, it is important to analyze data in depth. Rough set theory (RST) is one of the prominent mathematically grounded techniques for analyzing huge datasets, and it is well suited to dealing with uncertain, vague, and incomplete data. Moreover, RST requires no additional information to analyze data, unlike, for example, Bayesian decision theory, which requires a probability distribution. However, RST does not account for cases in which attribute values maintain a preference order over one another.
The Dominance-based Rough Set Approach (DRSA), a generalization of RST, studies the dominance aspect of attributes and defines a dominance relation. DRSA is an excellent data analysis tool for multicriteria decision making because it considers the preference order of attributes.
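For reference, the standard DRSA definitions (due to Greco, Matarazzo, and Słowiński) of the dominance relation, the dominance cones, and the approximations discussed throughout this work can be stated as follows, where $P$ is a set of criteria, $U$ the universe of objects, and $Cl_t^{\geq}$ the upward union of decision classes from class $t$:

```latex
% Standard DRSA definitions: dominance relation, dominance cones,
% and approximations of the upward union of classes Cl_t^{>=}.
\begin{align*}
  x \, D_P \, y &\iff f(x,q) \succeq_q f(y,q) \quad \forall q \in P,\\
  D_P^{+}(x) &= \{\, y \in U : y \, D_P \, x \,\}, \qquad
  D_P^{-}(x) = \{\, y \in U : x \, D_P \, y \,\},\\
  \underline{P}\bigl(Cl_t^{\geq}\bigr) &= \{\, x \in U : D_P^{+}(x) \subseteq Cl_t^{\geq} \,\},\\
  \overline{P}\bigl(Cl_t^{\geq}\bigr) &= \{\, x \in U : D_P^{-}(x) \cap Cl_t^{\geq} \neq \emptyset \,\}.
\end{align*}
```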
To handle large-volume datasets efficiently, data mining tools need to be computationally efficient. In DRSA, data analysis depends mainly on the calculation of the lower and upper approximations, and computing these two measures consumes substantial resources, i.e., time and memory, because the preference order must be taken into account. This research proposes a new concept of heuristic rules that compute the lower and upper approximations without calculating the dominance positive/negative relations. By exploiting the properties and logical structure of the approximations, our mathematical implications assign an object to all relevant approximation sets at once, and likewise assign the remaining objects without evaluating the intersection and subset relations at all.
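For context, a minimal Python sketch of the conventional computation that the proposed rules avoid is given below; the decision table, the gain-type criteria, and the function names are illustrative assumptions (the thesis implementation is in Visual Basic, and the heuristic rules themselves are not reproduced here):

```python
# Minimal sketch of the conventional DRSA approximation computation that the
# proposed heuristic rules replace. The decision table, gain-type criteria,
# and function names are illustrative assumptions, not the thesis code.

def dominates(x, y):
    """x dominates y if x is at least as good as y on every criterion
    (all criteria are assumed to be gain-type for simplicity)."""
    return all(xq >= yq for xq, yq in zip(x, y))

def drsa_approximations(objects, labels, t):
    """P-lower/P-upper approximations of the upward union Cl_t>= over all
    criteria P, via explicit dominance cones plus subset/intersection tests."""
    U = range(len(objects))
    cl_up = {i for i in U if labels[i] >= t}                # Cl_t>=
    # Dominance cones: D+(x) = objects dominating x, D-(x) = objects dominated
    # by x -- the O(|U|^2) step the heuristic rules are designed to avoid.
    d_plus  = {i: {j for j in U if dominates(objects[j], objects[i])} for i in U}
    d_minus = {i: {j for j in U if dominates(objects[i], objects[j])} for i in U}
    lower = {i for i in U if d_plus[i] <= cl_up}            # subset test
    upper = {i for i in U if d_minus[i] & cl_up}            # intersection test
    return lower, upper

# Hypothetical decision table: two gain-type criteria, classes 1..3; objects
# 0 and 4 are inconsistent (identical evaluations, different classes).
objs   = [(3, 2), (2, 2), (1, 1), (3, 3), (3, 2)]
labels = [2, 2, 1, 3, 1]
low, up = drsa_approximations(objs, labels, t=2)
print("lower:", sorted(low), "upper:", sorted(up))   # lower: [3] upper: [0, 1, 3, 4]
```

In this baseline, every object requires both a cone construction over the whole universe and a set comparison; this is the redundancy the proposed rules are intended to eliminate.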
By avoiding these heavy and redundant computations, the proposed rules can serve as an efficient replacement for the conventional method of computing approximations, especially for large datasets. An experimental framework was devised to measure the performance of the proposed method against other techniques in terms of efficiency and accuracy, using benchmark datasets from the UCI repository. The performance of the proposed algorithm and of other similar approaches is measured in terms of execution time, memory requirements, and structural complexity. The algorithms were implemented on the Visual Basic platform, and the experiments were conducted on a Windows 10 operating system with an x64-based Intel Core i7 processor and 16 GB of memory. The results show that the proposed rules significantly reduce execution time by avoiding redundant and lengthy computations, with 100% accuracy, which ultimately reduces the structural complexity and runtime memory requirements. By applying the proposed rules in parallel threads, a further reduction in the time needed to compute the DRSA approximations has been achieved.
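As an illustration only: the per-object membership tests are independent, so they partition naturally across workers. The sketch below uses Python worker processes with hypothetical helper names; it uses processes rather than threads because CPython threads do not parallelize CPU-bound work, whereas the thesis applies its rules in genuinely parallel threads:

```python
# Hedged sketch of parallelizing the independent per-object membership tests;
# the chunking, helper names, and use of processes are assumptions and do not
# reproduce the thesis's threading scheme.
from concurrent.futures import ProcessPoolExecutor

def dominates(x, y):
    return all(a >= b for a, b in zip(x, y))   # gain-type criteria

def lower_members(chunk, objects, cl_up):
    """Objects in this chunk that belong to the P-lower approximation of
    Cl_t>=: every object dominating them must itself lie in Cl_t>=."""
    U = range(len(objects))
    return [i for i in chunk
            if all(j in cl_up for j in U if dominates(objects[j], objects[i]))]

if __name__ == "__main__":
    objs   = [(3, 2), (2, 2), (1, 1), (3, 3), (3, 2)]
    labels = [2, 2, 1, 3, 1]
    cl_up  = {i for i, c in enumerate(labels) if c >= 2}
    chunks = [[0, 1], [2, 3], [4]]             # one chunk per worker
    with ProcessPoolExecutor() as pool:
        parts = pool.map(lower_members, chunks,
                         [objs] * len(chunks), [cl_up] * len(chunks))
    print("lower:", sorted(i for part in parts for i in part))   # lower: [3]
```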
However, these two approaches are designed for static datasets. To broaden the scope of application, two methods that accommodate dynamic datasets have been proposed; they handle single- and multi-dimensional variations efficiently and accurately. In comparison with other similar approaches, the proposed methods successfully update the approximation sets using fewer computational resources.