Abstract:
Extensive research has been conducted in the field of Association Rules Mining (ARM) for natural
languages. Both academia and researchers have conceived many applications of ARM for several
domains of the Urdu language (e.g. education, publications and web development). Many of these
applications require accuracy, which is computation- and communication-intensive. A severe lack of
resources and limited capability have made providing accuracy a challenging task in ARM. Therefore,
techniques need to be devised that provide accuracy without compromising the limited resources
available to ARM.
This thesis focuses on the design, implementation and analysis of an Urdu Mining Model (UMM) based
upon Hybrid Apriori (HA), Enhanced Multipass with Inverted Hashing and Pruning (EMIHP), and Enhanced
Parallel Multipass with Inverted Hashing and Pruning (EPMIHP) that fulfills the requirements of
accuracy and efficiency without compromising resources. The algorithms are optimized by using hash
tables and by avoiding the mining of redundant rules.
The proposed UMM, HA, EMIHP and EPMIHP have been implemented using C# and Microsoft SQL Server
2005 in the .NET Framework. To evaluate the system, several experiments have been carried out
on different numbers of words using different thresholds. UMM, HA, EMIHP and EPMIHP have been
evaluated through a comprehensive efficiency analysis. Furthermore, the execution times of HA,
EMIHP and EPMIHP have been compared with those of other algorithms. The reduction in execution
time and the increase in accurate association rules show that HA, EMIHP and EPMIHP are viable
choices for mining association rules in the Urdu language. Since HA, EMIHP and EPMIHP provide
efficiency and accuracy while operating within the available resources, they are promising
candidates for wide adoption.