Abstract:
Background: Unusual growth of glandular tissue at the boundary of the thyroid gland is an indication of thyroid disease. Thyroid disease is characterised by an unusually high or low level of hormones produced by the thyroid gland; the two most prevalent kinds are hypothyroidism (underactive thyroid gland) and hyperthyroidism (overactive thyroid gland). The main aim of this project was to introduce an efficient multi-stage ensemble, i.e., a voting ensemble of homogeneous ensembles, that can be used with a variety of feature-selection algorithms to improve the diagnosis of thyroid diseases. The dataset utilised in this study was built from real-time thyroid data obtained from the District Head Quarter (DHQ) teaching hospital in DG Khan, Pakistan. Following the appropriate pre-processing steps, three attribute-selection strategies were used: Select From Model (SFM), Select K-Best (SKB) and Recursive Feature Elimination (RFE). SFM is an attribute-selection strategy that uses a fitted model to select attributes. The Decision Tree (DT), Logistic Regression (LR), Gradient Boosting (GB) and Random Forest (RF) classifiers were employed as the candidate feature estimators. The homogeneous ensembles consisted of bagging- and boosting-based learners, whose outputs were then combined by the voting ensemble, which employed both soft and hard voting to categorise the data. Performance was assessed using hamming loss, accuracy, mean square error, sensitivity and other criteria. The experimental results indicate that the proposed approach is effective for improving the detection of thyroid disease.
On dataset 1, all of the algorithms tested obtained 100% accuracy using only a subset of the total number of features in each case, while on dataset 2 more than 98% accuracy was reached in every case. In terms of accuracy and computational cost, the proposed approach outperformed comparable benchmark models.
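
To illustrate the difference between the hard and soft voting mentioned above, the following is a minimal sketch in plain Python. It is not the implementation used in the study: the function names `hard_vote` and `soft_vote`, the class encoding and the probability values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the class label predicted by the most base learners."""
    counts = Counter(predictions)
    # most_common orders by count; equal counts keep first-seen order
    return counts.most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the per-class probabilities and return the best class index."""
    n_learners = len(probabilities)
    n_classes = len(probabilities[0])
    avg = [
        sum(p[c] for p in probabilities) / n_learners
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=avg.__getitem__)


# Hypothetical class probabilities from three base learners for one record.
# Assumed encoding: 0 = normal, 1 = hypothyroid, 2 = hyperthyroid.
probs = [
    [0.45, 0.55, 0.00],
    [0.40, 0.60, 0.00],
    [0.90, 0.10, 0.00],
]

# Each learner's hard prediction is its most probable class.
labels = [max(range(3), key=p.__getitem__) for p in probs]

print(hard_vote(labels))  # majority of labels
print(soft_vote(probs))   # class with the highest mean probability
```

In this made-up example two of the three learners narrowly favour class 1, so hard voting selects class 1, whereas the third learner's high confidence in class 0 raises that class's mean probability and soft voting selects class 0. The two schemes can therefore disagree on the same record, which is one reason to evaluate both.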