High Performance grid enabled data mining

Aftab, Shehroz; Akhtar, Saeed; Waqar, Momina; Farooq, Omar Mukhtar; Supervised by Naveed Sarfraz Khattak

URI: http://10.250.8.41:8080/xmlui/handle/123456789/10439

Date: 2008-03

Abstract:

Data mining algorithms in many application areas assume centralized data, yet in practice related information is often acquired and stored at geographically distributed locations because of organizational or operational constraints. Centralizing such data before analysis may be neither desirable nor feasible for most practical applications, given efficiency concerns and resource limitations such as network bandwidth. Moreover, data preprocessing and data mining algorithms are known to be both compute- and data-intensive. The Grid computing community promises infrastructures that allow on-demand access to distributed resources [1]. The proposed and implemented solution uses Grid infrastructure to mine the given data sets. In this technique, data is mined locally at each site and suitable representatives are extracted. These representative models are then sent to a global server site, where global models are formed from the local representatives. This approach increases efficiency by reducing computational cost and the bandwidth required for transmission. The experimental results support this hypothesis by showing the efficiency difference between centralized data mining and distributed mining with the proposed approach on the same data sets.
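The local-then-global flow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the mining algorithm or the form of the representatives, so everything concrete here is an assumption made for illustration: clustering is used as the mining task, each representative is a cluster centroid plus the number of records it summarizes, and the names `Representative`, `weighted_kmeans`, `mine_local`, and `build_global` are hypothetical. Grid middleware and network transfer are not modelled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Representative:
    """Hypothetical local summary: a centroid and the record count behind it."""
    centroid: Point
    weight: float


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iterations: int = 20) -> List[Representative]:
    """Plain weighted k-means with a deterministic initialisation."""
    order = sorted(points)
    n = len(order)
    # Initial centroids: k points spread evenly over the sorted input.
    centroids = [order[round(i * (n - 1) / max(k - 1, 1))] for i in range(k)]
    totals = [0.0] * k
    for _ in range(iterations):
        sums = [[0.0, 0.0] for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            sums[j][0] += w * p[0]
            sums[j][1] += w * p[1]
            totals[j] += w
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            (sums[j][0] / totals[j], sums[j][1] / totals[j]) if totals[j] else centroids[j]
            for j in range(k)
        ]
    return [Representative(centroids[j], totals[j]) for j in range(k) if totals[j]]


def mine_local(records: Sequence[Point], k: int) -> List[Representative]:
    """Runs at each site: cluster raw records, emit only the representatives."""
    return weighted_kmeans(records, [1.0] * len(records), k)


def build_global(site_models: Sequence[Sequence[Representative]], k: int) -> List[Representative]:
    """Runs at the global server: cluster the representatives, weighted by record count."""
    reps = [r for model in site_models for r in model]
    return weighted_kmeans([r.centroid for r in reps], [r.weight for r in reps], k)


if __name__ == "__main__":
    site_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    site_b = [(0.5, 0.5), (11.0, 10.0), (10.5, 10.5), (9.0, 9.0)]
    local_models = [mine_local(site_a, k=2), mine_local(site_b, k=2)]
    for rep in build_global(local_models, k=2):
        print(rep)
```

In this toy example the two sites send four representatives to the server instead of nine raw records, which is the kind of bandwidth saving the abstract refers to. Weighting each centroid by its record count is one simple way to keep large local clusters from being under-represented in the global model; the thesis itself may use a different scheme.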
