Abstract:
Crowdsourced software development (CSSD) serves as an evolutionary problem-solving platform that operates within a distributed environment, combining human-intensive work with machine computation. CSSD is receiving a great deal of attention from software practitioners and researchers due to its promising functionality and flexible working mechanism. Despite being a favorable environment, CSSD platforms pose challenges including ineffective decision making on task selection, unjustified price allocation, and task completion uncertainty. In practice, no intelligent mechanism exists for the task selection and award money assignment process on any CSSD platform; rather, rule-of-thumb or intuition-driven strategies are employed, leading to bias and subjectivity. Effort considerations on crowdsourced tasks can offer a sound foundation to combat these challenges but have not been investigated in depth. Conversely, software development effort estimation (SDEE) is a well-established field in traditional software engineering practice, aiding efficient resource and cost management. While SDEE is a prevalent domain in software engineering, its exploration is largely limited to in-house development, and open-source or crowdsourced platforms require further consideration.
The SDEE domain is largely facilitated by ensemble effort estimation (EEE), which provides comparatively stable results across diverse contexts. EEE performance is significantly influenced by the hyperparameter composition of the solo ML learners and the weights of their individual predictions. Hence, hyperparameter optimization and weight assignment are crucial in ensemble learning. The EEE literature, however, has rarely investigated the impact of optimization from both perspectives (i.e., hyperparameter tuning and weight assignment of single ML techniques). Additionally, the selection of an appropriate search space is an essential but often overlooked aspect of hyperparameter optimization. Considering the need for effort estimation in the field of CSSD, this study aims to conjoin the accuracy of optimized EEE with a typical CSSD platform to estimate crowdsourced task effort for a justified task selection and pricing mechanism.
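As a minimal illustration of these two optimization levers (symbols introduced here for illustration, not taken from the dissertation), a weighted EEE prediction can be written as

\[
\hat{y}(x) \;=\; \sum_{k=1}^{K} w_k \, f_k(x;\theta_k), \qquad \sum_{k=1}^{K} w_k = 1, \quad w_k \ge 0,
\]

where $f_k$ is the $k$-th solo learner with hyperparameters $\theta_k$ and $w_k$ is its prediction weight; tuning the $\theta_k$ and learning the $w_k$ are exactly the two perspectives noted above.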
In this dissertation, the CSSD platform is investigated from the perspective of three major software crowdsourcing phases: Design, Development, and Testing. Datasets are defined with highly relevant, phase-centric features encompassing the crowdsourced design, development, and testing perspectives. TopCoder is selected as the target CSSD platform, given its rising popularity among both software engineers and the research community.
An improved ensemble effort estimation framework is established for the CSSD platform through the proposed Metaheuristic-optimized Multi-dimensional bagging scheme and Weighted Ensemble (MoMdbWE) approach. The proposed scheme combines the effectiveness of optimization from two perspectives: hyperparameter optimization and optimized weight learning in the ensemble. The firefly algorithm (FFA) is selected as the optimization algorithm in this study for its promising fitness results in terms of MAE.
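To make the optimizer concrete, the following is a minimal, textbook-style firefly algorithm for minimizing a fitness function (here, validation MAE); the function name, parameters, and defaults are illustrative assumptions, not the dissertation's exact configuration:

```python
import numpy as np

def firefly_minimize(fitness, bounds, n_fireflies=20, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal firefly algorithm sketch: minimize `fitness` over the box `bounds`."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    # Scatter fireflies uniformly over the search box.
    pos = rng.uniform(lo, hi, size=(n_fireflies, len(bounds)))
    fit = np.array([fitness(p) for p in pos])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] < fit[i]:  # j is "brighter" (lower MAE), so i moves toward j
                    r2 = float(np.sum((pos[i] - pos[j]) ** 2))
                    beta = beta0 * np.exp(-gamma * r2)  # attractiveness decays with distance
                    step = alpha * (rng.random(len(bounds)) - 0.5) * (hi - lo)
                    pos[i] = np.clip(pos[i] + beta * (pos[j] - pos[i]) + step, lo, hi)
                    fit[i] = fitness(pos[i])
    best = int(np.argmin(fit))
    return pos[best], float(fit[best])
```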
The study employs three base algorithms, Random Forest (RF), Support Vector Regression (SVR), and Neural Network (NeuralNet), owing to their recurrent application in the SDEE literature and their superior performance.
The MoMdbWE approach is achieved through a novel method named Multi-dimensional Bagging (Mdb), which divides the search space of each of the three base algorithms. FFA is employed to determine the optimal hyperparameters for each division, followed by FFA weight optimization to construct a Metaheuristic-optimized Weighted Ensemble (MoWE) from the individual multi-dimensional bagging schemes.
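A condensed sketch of how the Mdb and weight-learning steps could fit together is given below. It is a simplified assumption-laden illustration, not the dissertation's implementation: only the RF learner and a single hyperparameter axis are used for brevity (the full scheme spans RF, SVR, and NeuralNet), a small grid stands in for the per-division FFA run, synthetic data replaces the TopCoder datasets, the division ranges are invented, and the firefly_minimize sketch above is reused for the weight search.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Stand-in data; the thesis uses phase-centric TopCoder task features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def tune_in_division(division):
    """Tune one base learner inside one search-space division (the Mdb idea).
    A small grid stands in for the per-division FFA run; fitness is validation MAE."""
    best_model, best_mae = None, np.inf
    for n in range(division[0], division[1] + 1, 25):
        m = RandomForestRegressor(n_estimators=n, random_state=0).fit(X_tr, y_tr)
        mae = mean_absolute_error(y_val, m.predict(X_val))
        if mae < best_mae:
            best_model, best_mae = m, mae
    return best_model

# Illustrative division of one hyperparameter axis into two sub-ranges.
divisions = [(25, 100), (125, 300)]
members = [tune_in_division(d) for d in divisions]
preds = np.column_stack([m.predict(X_val) for m in members])

def weight_fitness(w):
    """Weight-learning fitness: MAE of the convex-weighted ensemble."""
    w = np.abs(w) / (np.abs(w).sum() + 1e-12)
    return mean_absolute_error(y_val, preds @ w)

# FFA learns the member weights (the MoWE step), reusing the sketch above.
w_best, _ = firefly_minimize(weight_fitness, [(0.0, 1.0)] * len(members))
w_best = np.abs(w_best) / (np.abs(w_best).sum() + 1e-12)
ensemble_pred = preds @ w_best  # final MoMdbWE-style effort estimate
```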
The proposed MoMdbWE framework is implemented on the aforementioned TopCoder datasets and evaluated using error metrics (MAE, RMSE, MMRE, MdMRE, Pred), standardized accuracy (SA), effect size, and the Wilcoxon statistical test.
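For reference, the standard SDEE definitions of the main metrics are, with $y_i$ the actual and $\hat{y}_i$ the estimated effort over $n$ tasks,

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad
\mathrm{MRE}_i = \frac{\lvert y_i - \hat{y}_i\rvert}{y_i}, \qquad
\mathrm{MMRE} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{MRE}_i,
\]
\[
\mathrm{Pred}(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\!\left[\mathrm{MRE}_i \le \tfrac{x}{100}\right], \qquad
\mathrm{SA} = 1 - \frac{\mathrm{MAE}}{\mathrm{MAE}_{p_0}},
\]

where MdMRE is the median of the $\mathrm{MRE}_i$, $\mathrm{Pred}(25)$ is the customary threshold, and $\mathrm{MAE}_{p_0}$ is the MAE of random guessing, the baseline used in standardized accuracy.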
The results are compared against the solo base algorithms and against state-of-the-art homogeneous and heterogeneous ensembles. Notably, the MoMdbWE scheme outperforms the solo learners by 80%, 97%, and 90% on the Design, Development, and Testing datasets, respectively. Similarly, it surpasses homogeneous ensembles by 75%, 97%, and 91% across the datasets, while improvements of 70%, 20%, and 47% are observed over heterogeneous ensembles.
This performance assessment reveals the effectiveness of the proposed approach: it validates the significance of the crowdsourced datasets applied in the study, along with the selected features and their suitability for training an ML-based EEE model for the TopCoder platform. The approach holds potential for practical application in the crowdsourced context for task selection and pricing.