Abstract:
Biological wastewater treatment is an established technique for treating industrial and
municipal wastewater in which microorganisms degrade pollutants. The primary
challenge with current biological wastewater treatment is the need for
external aeration or supply of O₂, which is required for the oxidation of organic matter
and nitrification processes. Oxygenic photogranulation (OPG) is an aeration-free
biological wastewater treatment process that forms dense photogranules
characterized by high settling velocities. However, the scale-up of OPG-based
wastewater treatment systems poses significant challenges because the system variables
are dynamic, complex, and interact non-linearly, which makes troubleshooting
expensive. Machine learning models offer an effective way to simulate the
wastewater treatment process, since mechanistic models are computationally
expensive and the non-linear interactions between input and output features are not
well understood. This study investigates a two-stage feature selection method
to enhance the prediction of SVI30 (the sludge volume index after 30 minutes of
settling), an operational parameter used to monitor the settleability of biomass and
minimize the loss of photogranules. The two-stage feature selection method identifies
the subset of input features relevant to predicting SVI30, thereby improving the
accuracy of machine learning models. The optimal feature subsets produced by the
two-stage method are evaluated with four regression models:
decision tree, random forest, gradient boosting, and XGBoost. The performance
of all regression models is assessed with a set of evaluation metrics. The regression
models with optimal subsets of features identified by two-stage feature selection
demonstrate a prediction efficiency of 85%. This research provides a comprehensive
machine learning-based approach that can improve the predictability and control of
operational parameters for an efficient OPG wastewater treatment process. Advanced feature
selection methods can significantly enhance the performance of machine learning
models in OPG-based systems, leading to more sustainable wastewater management
solutions.