Investigating the reliability of machine learning algorithms as an advanced tool for ozone concentration prediction

Ayman Mohammed Shaher Yafouz, Mr.
Air quality monitoring is an important task in environmental science and engineering. Predictive air quality models are useful because they can give early warning when pollution is rising toward unsatisfactory levels. Such levels can seriously affect the planet, fauna, flora, and human health in both the short and the long term. It is therefore necessary to examine the performance of various artificial intelligence techniques, including standalone and hybrid algorithms, for accurate prediction of tropospheric ozone concentration. This ozone (O3) forecasting study used a diverse set of air pollutant data, namely sulfur dioxide (SO2), carbon monoxide (CO), nitrogen oxides (NOx), and nitrogen dioxide (NO2), together with meteorological parameters, namely humidity (Hum) and wind speed (WS). The hourly historical dataset, collected by the Department of Environment, Malaysia, covers 3 years of observations at 3 air quality monitoring stations in Peninsular Malaysia. The methodology follows 2 paths: standalone and hybrid techniques. The standalone techniques comprise the machine learning algorithms Multi-Layer Perceptron (MLP), Support Vector Regression (SVR), and Extreme Gradient Boosting decision tree (XGBoost), and the deep learning algorithms Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). The hybrid technique was developed by combining deep learning algorithms in a structure of multiple CNN and LSTM layers (with several neurons). All tropospheric ozone prediction techniques were applied to the gas concentration and meteorological data under 5 time-horizon scenarios (1 hr, 3 hr, 6 hr, 12 hr, and 24 hr), using different numbers of the input parameters listed above.
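The time-horizon scenarios can be illustrated with a minimal sketch of how hourly records might be turned into input/target pairs. This is an assumption about the general approach, not the author's actual preprocessing: the function name `make_supervised`, the parameter `n_lags`, the dictionary keys, and the synthetic values are all hypothetical.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of n_lags hourly rows with the O3 value
    observed `horizon` hours after the last row of that window."""
    inputs, targets = [], []
    # Last valid window start so that the target index stays in range.
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window = series[start:start + n_lags]
        target_row = series[start + n_lags - 1 + horizon]
        inputs.append(window)
        targets.append(target_row["O3"])
    return inputs, targets


# Synthetic hourly rows (NOT real monitoring data); keys follow the
# parameter abbreviations used in the abstract.
toy_series = [
    {"O3": float(i), "SO2": 0.1, "CO": 0.5, "NOx": 0.2,
     "NO2": 0.1, "Hum": 80.0, "WS": 2.0}
    for i in range(30)
]

# One dataset per forecast horizon named in the abstract.
datasets = {
    h: make_supervised(toy_series, n_lags=3, horizon=h)
    for h in (1, 3, 6, 12, 24)
}
```

Longer horizons leave fewer usable pairs from the same record, which is one reason each horizon is treated as a separate scenario.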
The standalone and hybrid models were evaluated using 5 performance metrics, namely the Coefficient of Determination (R2), Normalized Root Mean Square Error (NRMSE), Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE), together with a Modified Taylor Diagram. A sensitivity analysis of the input parameters was also carried out. In addition, an uncertainty analysis was performed on all models to evaluate their level of confidence, in association with the Willmott Index. In terms of accuracy as measured by R2, the results indicate that XGBoost is more accurate than SVR and MLP, while CNN and LSTM outperform XGBoost. Based on efficiency, accuracy, and robustness, the developed hybrid deep learning technique achieved the highest reliability and accuracy of all the models mentioned above. Compared with all investigated standalone models, the developed model shows exceptional performance, with the highest R2 accuracy, a narrower confidence interval width, and the highest degree of 95% confidence, with values of 93.48%, 98.16%, and 0.0014195, respectively.
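The evaluation metrics named above can be sketched with their standard textbook definitions. This is a minimal illustration, not the author's code: the function name `evaluate` and the sample values are hypothetical, NRMSE is assumed here to be normalized by the range of the observations (other conventions divide by the mean), and the Willmott Index uses the original index-of-agreement form.

```python
import math


def evaluate(observed, predicted):
    """Return R2, NRMSE, RMSE, MSE, MAE and the Willmott index of agreement."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares

    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: normalize RMSE by the observed range.
    nrmse = rmse / (max(observed) - min(observed))
    # Willmott index: 1 - SS_res / potential error.
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    willmott = 1.0 - ss_res / potential

    return {"R2": r2, "NRMSE": nrmse, "RMSE": rmse,
            "MSE": mse, "MAE": mae, "Willmott": willmott}


# Made-up values for illustration only (not results from the study).
scores = evaluate([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
```

For R2 and the Willmott index, values closer to 1 indicate better agreement; for the error metrics, lower is better.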
Machine learning algorithms