Generalized Churn Classification Across Multiple Business Domains

Naveed, Maryam

dc.contributor.author	Naveed, Maryam
dc.date.accessioned	2023-08-07T10:30:14Z
dc.date.available	2023-08-07T10:30:14Z
dc.date.issued	2022
dc.identifier.other	277111
dc.identifier.uri	http://10.250.8.41:8080/xmlui/handle/123456789/35746
dc.description	Supervisor: Dr. Arslan Shaukat	en_US
dc.description.abstract	For any organization, customers are the basis of its success, which makes Customer Relationship Management (CRM) an integral department. CRM research shows that retaining existing customers yields a higher return than acquiring new ones, which costs five times as much. Organizations therefore aim to minimize churn. Churn is defined as a customer ending a subscription or ceasing to use a service provided by an entity. Customer churn occurs across many business domains and has a considerable impact on revenue. To retain their key customers, companies must identify those at risk of churning in time, so that retention strategies can be applied to them. Once the characteristics associated with churn are identified, it is also much easier to target a specific group of customers for retention than the entire customer base. This makes churn identification and classification very important for the growth of a business. This research aims to provide a generalized system, including pre-processing and feature selection, that can be applied across different business domains and business rules to identify customers on the verge of churning. A centralized hybrid algorithm has been devised to identify potentially at-risk customers. It addresses the gap created when researchers have to rely on trial and error to find the best algorithm for their problem. Because telecommunications data is widely available, it has been used as the benchmark to test the proposed methodology, using the publicly available IBM Watson and Cell2Cell datasets and a locally sourced dataset. Classifiers including a Support Vector Machine (SVM) with an RBF kernel, GP-AdaBoost, and Random Forest are combined with SMOTE-ENN resampling, Recursive Feature Elimination (RFE), and normalization. A combination of classification evaluation metrics is used for thorough testing, supported by 10-fold cross-validation. 
Experiments were performed with varying parameters and components. The proposed system achieves an accuracy of 0.984 on IBM Watson and 0.994 on Cell2Cell. Because the locally sourced dataset has not been used in previous research, it served as scoring data, on which an accuracy greater than 0.990 was achieved. The results obtained on the two benchmark datasets with the proposed system are competitive with those reported in previous literature.	en_US
dc.language.iso	en	en_US
dc.publisher	College of Electrical & Mechanical Engineering (CEME), NUST	en_US
dc.subject	Keywords: Customer Churn, SVM, Random Forest, GP-AdaBoost, SMOTE-ENN	en_US
dc.title	Generalized Churn Classification Across Multiple Business Domains	en_US
dc.type	Thesis	en_US
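The abstract names the pipeline components but not their exact configuration. What follows is a minimal sketch under stated assumptions, not the thesis's implementation. It uses min-max normalization, a hand-written SMOTE followed by Edited Nearest Neighbours (the imbalanced-learn library ships this combination as `SMOTEENN`), and RFE driven by a logistic-regression estimator. That estimator is an assumption: an RBF-kernel SVM exposes no feature weights for RFE to rank. The pipeline ends with an RBF SVM or a Random Forest scored by stratified 10-fold cross-validation. Several further pieces are assumptions too. The data come from `make_classification` as a hypothetical stand-in for the IBM Watson or Cell2Cell tables. The metric set (accuracy, precision, recall, F1) is assumed, and GP-AdaBoost is omitted. The helper names `smote`, `enn`, and `evaluate` are illustrative.

```python
"""Hypothetical sketch of a churn pipeline: scaling -> SMOTE-ENN -> RFE -> classifier,
scored with stratified 10-fold cross-validation. Not the thesis's code."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def smote(X, y, minority=1, k=5, rng=None):
    """SMOTE: add synthetic minority samples on segments between minority neighbours."""
    rng = np.random.default_rng(0) if rng is None else rng
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)  # enough to balance both classes
    if n_new <= 0:
        return X, y
    # k + 1 neighbours because each point's nearest neighbour is itself
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)          # random minority seed points
    neigh = idx[base, rng.integers(1, k + 1, n_new)]   # one random neighbour of each
    gap = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    synth = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees with most of their k neighbours."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    agree = (y[idx[:, 1:]] == y[:, None]).sum(axis=1)  # skip column 0 (the point itself)
    keep = agree > k // 2
    return X[keep], y[keep]


def evaluate(make_clf, X, y, n_features=10, n_splits=10, seed=0):
    """Mean accuracy/precision/recall/F1 over stratified folds; the churn class is label 1."""
    rng = np.random.default_rng(seed)
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X, y):
        scaler = MinMaxScaler().fit(X[train])  # normalization fitted on the training fold only
        X_tr, X_te = scaler.transform(X[train]), scaler.transform(X[test])
        X_tr, y_tr = enn(*smote(X_tr, y[train], rng=rng))  # SMOTE-ENN on the training fold only
        # RFE needs coef_ or feature_importances_, so a linear model ranks the features here
        rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features).fit(X_tr, y_tr)
        clf = make_clf().fit(rfe.transform(X_tr), y_tr)
        pred = clf.predict(rfe.transform(X_te))
        scores["accuracy"].append(accuracy_score(y[test], pred))
        scores["precision"].append(precision_score(y[test], pred, zero_division=0))
        scores["recall"].append(recall_score(y[test], pred, zero_division=0))
        scores["f1"].append(f1_score(y[test], pred, zero_division=0))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


if __name__ == "__main__":
    # Hypothetical stand-in for a churn table: about 20% churners (label 1)
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                               weights=[0.8, 0.2], random_state=0)
    models = {
        "SVM (RBF)": lambda: SVC(kernel="rbf", gamma="scale"),
        "Random Forest": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    }
    for name, make_clf in models.items():
        print(name, evaluate(make_clf, X, y))
```

Running it as a script prints the mean fold scores for each classifier on the synthetic data. These numbers are unrelated to the accuracies reported in the abstract. The scaler and SMOTE-ENN are fitted inside each fold rather than once on the full table. This keeps synthetic points derived from test customers out of training, which would otherwise inflate the cross-validated scores.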