Parallel Architectures for Data Mining and Machine Learning Algorithms

Amna Tehreem


dc.contributor.author	Amna Tehreem
dc.date.accessioned	2020-12-31T06:47:03Z
dc.date.available	2020-12-31T06:47:03Z
dc.date.issued	2016
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/20148
dc.description	Supervisor Dr. Shoab Ahmad Khan	en_US
dc.description.abstract	Data mining and machine learning algorithms deal with large amounts of data, and the volume of that data has grown massively with the invention of cost-efficient devices. Many algorithms in these domains are not used in real-time systems because of their computational complexity and the large volumes of data they must process. Many algorithms are therefore being implemented on parallel processing platforms such as GPUs and FPGAs to achieve the desired speed. The purpose of this thesis is to provide parallel processing models of the mean shift clustering and frequent pattern growth (FP-growth) algorithms, targeted to run on an FPGA. The general model consists of multiple homogeneous processing entities (PEs) connected through a bus. The PEs work collaboratively: each PE works independently and also communicates with its peers as the algorithm requires. Two architectures for the mean shift clustering algorithm are proposed. The first is a general architecture that reduces the computational load in each successive iteration by decreasing the number of windows to be processed. The second architecture is proposed and implemented on an FPGA for one-dimensional data. The algorithm is tested on 20 images from the segmentation evaluation database for different numbers of PEs and different numbers of fractional bits used to represent the mean. At a clock frequency of approximately 120 MHz, the algorithm segments an image in between 2.47 ms (1 PE, 7 fractional bits) and 0.114 ms (16 PEs, 0 fractional bits), compared with 6.44 minutes per image for the conventional mean shift algorithm. The simplicity of the algorithm results in very low utilization of Spartan-6 FPGA resources. A parallel architecture for the FP-growth algorithm is also proposed, which divides the task efficiently among the PEs. The parallel algorithm is tested on databases from the UCI Machine Learning Repository and the Frequent Itemset Mining Dataset Repository.
The speedup achieved with 2 PEs is approximately 1.99; increasing the number of PEs to 16 raises the speedup to approximately 15.5. The processing requirements of both algorithms indicate that they can be used in real-time systems.	en_US
dc.publisher	CEME, National University of Sciences and Technology, Islamabad	en_US
dc.subject	Computer Engineering	en_US
dc.title	Parallel Architectures for Data Mining and Machine Learning Algorithms	en_US
dc.type	Thesis	en_US
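The abstract describes a one-dimensional mean shift in which converged windows are dropped in later iterations and the mean is held with a limited number of fractional bits. The following is a minimal software sketch of that idea, not the thesis hardware. It assumes 8-bit grayscale pixels, a flat kernel applied to the intensity histogram, one window started at every occupied intensity level, round-robin assignment of windows to simulated PEs, and round-half-up quantization of the mean. The names `mean_shift_1d`, `quantize`, `bandwidth` and `max_iter` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (round half up)."""
    scale = 1 << frac_bits
    return math.floor(x * scale + 0.5) / scale


def mean_shift_1d(
    pixels: Sequence[int],
    bandwidth: int = 8,
    frac_bits: int = 0,
    num_pes: int = 4,
    max_iter: int = 100,
) -> List[float]:
    """Map each 8-bit pixel to the mode its intensity converges to.

    Hypothetical model: a flat-kernel window starts at every occupied
    intensity level, windows are dealt round-robin to `num_pes` simulated
    PEs, and a window is retired once its quantized mean stops moving,
    so later iterations have fewer windows to process.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    # Active windows: starting intensity -> current quantized mean.
    active: Dict[int, float] = {v: float(v) for v in range(256) if hist[v]}
    modes: Dict[int, float] = {}

    for _ in range(max_iter):
        if not active:
            break
        starts = sorted(active)
        next_active: Dict[int, float] = {}
        for pe in range(num_pes):  # in hardware the PEs would run concurrently
            for s in starts[pe::num_pes]:
                m = active[s]
                lo = max(0, math.ceil(m - bandwidth))
                hi = min(255, math.floor(m + bandwidth))
                weight = sum(hist[lo:hi + 1])
                if weight == 0:  # empty window: retire it where it is
                    modes[s] = m
                    continue
                total = sum(v * hist[v] for v in range(lo, hi + 1))
                new_m = quantize(total / weight, frac_bits)
                if new_m == m:
                    modes[s] = m  # converged: dropped from later iterations
                else:
                    next_active[s] = new_m
        active = next_active

    modes.update(active)  # windows still moving after max_iter
    return [modes[p] for p in pixels]


pixels = [10] * 5 + [12] * 5 + [200] * 5 + [202] * 5
print(sorted(set(mean_shift_1d(pixels))))  # two segments: [11.0, 201.0]
```

Because every window is updated from the same histogram, changing `num_pes` only changes which simulated PE does the work, not the segmentation. The shrinking `active` dictionary mirrors the abstract's point that each iteration processes fewer windows.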
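The abstract does not specify how FP-growth is divided among the PEs. One common decomposition, assumed here purely for illustration, gives each PE a disjoint set of frequent items. Each PE then mines every itemset whose last item, in frequency order, belongs to its set, working from that item's conditional pattern base. As a simplification, this sketch keeps conditional pattern bases as plain lists of (prefix, count) pairs instead of building an FP-tree, so it shows the work split rather than the memory layout. The names `parallel_fp_growth`, `grow` and `expand`, and the round-robin item assignment, are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Base = List[Tuple[Tuple[str, ...], int]]  # (ordered prefix, count) pairs


def grow(db: Base, suffix: Tuple[str, ...], min_sup: int,
         out: Dict[FrozenSet[str], int]) -> None:
    """Mine every frequent extension of `suffix` from a conditional base."""
    counts: Counter = Counter()
    for items, cnt in db:
        for item in items:
            counts[item] += cnt
    for item, sup in counts.items():
        if sup >= min_sup:
            expand(db, item, sup, suffix, min_sup, out)


def expand(db: Base, item: str, sup: int, suffix: Tuple[str, ...],
           min_sup: int, out: Dict[FrozenSet[str], int]) -> None:
    """Record item + suffix, then recurse on item's conditional pattern base."""
    pattern = (item,) + suffix
    out[frozenset(pattern)] = sup
    # Prefix path of every transaction containing `item`: the items before it.
    cond = [(items[: items.index(item)], cnt) for items, cnt in db if item in items]
    grow([(p, c) for p, c in cond if p], pattern, min_sup, out)


def parallel_fp_growth(transactions: Sequence[Sequence[str]], min_sup: int,
                       num_pes: int) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for all itemsets with support >= min_sup."""
    counts = Counter(item for t in transactions for item in set(t))
    # Global order: descending support, ties broken by name.
    frequent = sorted((i for i, c in counts.items() if c >= min_sup),
                      key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(frequent)}
    db: Base = []
    for t in transactions:
        items = tuple(sorted((i for i in set(t) if i in rank), key=rank.__getitem__))
        if items:
            db.append((items, 1))

    merged: Dict[FrozenSet[str], int] = {}
    for pe in range(num_pes):  # each PE's loop is independent of the others
        local: Dict[FrozenSet[str], int] = {}
        for item in frequent[pe::num_pes]:
            expand(db, item, counts[item], (), min_sup, local)
        merged.update(local)  # different PEs never produce the same itemset
    return merged
```

Changing `num_pes` only changes how the top-level suffix items are dealt out. The merged result stays the same because each itemset is owned by exactly one suffix item, which is why a split of this kind needs no coordination between PEs until the final merge.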