Classification of Air Pollution Risk Levels Using a Soft Voting Ensemble Model Based on Real-World Air Quality Monitoring Data
Abstract
Air pollution is one of the major environmental and public health challenges in rapidly urbanizing regions around the world. Elevated concentrations of airborne pollutants such as particulate matter and gaseous emissions threaten both human health and environmental sustainability. Accurate recognition of pollution risk levels is therefore vital for developing environmental monitoring systems and supporting decision-making in urban environmental management. We present an ensemble machine learning framework that classifies air pollution risk levels using environmental, meteorological, and temporal indicators extracted from real-world air quality monitoring data collected across different urban locations. The dataset comprises major atmospheric pollutants and meteorological variables that capture air pollution processes in urban areas across seasons. We first evaluated several machine learning algorithms, including Random Forest, Extra Trees, Support Vector Machine, Logistic Regression, and Extreme Gradient Boosting. A Soft Voting ensemble model was then designed to combine the predictive strengths of the best-performing classifiers. The proposed model attained an accuracy of about 82.7% and a weighted F1-score of 0.828, outperforming every individual model. Cross-validation confirmed the framework's robustness and stability, and feature importance analysis identified PM2.5 as the most important determinant of pollution risk levels. The findings highlight the utility of ensemble machine learning for environmental monitoring, providing greater insight into pollution exposure and informing data-driven decision-making to promote sustainable air quality management.
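As an illustration of the soft voting idea named in the abstract, the following minimal sketch averages the class probabilities produced by several classifiers and picks the class with the highest mean. It is not the authors' implementation: the function name `soft_vote`, the three risk labels, and the probability values are hypothetical and chosen only to show how soft voting can differ from a simple majority vote. In practice a library implementation such as scikit-learn's `VotingClassifier` with `voting="soft"` would likely be used instead.

```python
from typing import List, Optional, Sequence


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> int:
    """Return the index of the class with the highest weighted mean probability.

    probas[m][c] is the probability that model m assigns to class c.
    """
    n_models = len(probas)
    n_classes = len(probas[0])
    if weights is None:
        weights = [1.0] * n_models  # equal weight for every model
    total = sum(weights)
    averaged: List[float] = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)


# Hypothetical labels and per-model probabilities for a single sample.
RISK_LEVELS = ["low", "moderate", "high"]
sample_probas = [
    [0.90, 0.08, 0.02],  # model A is very confident in "low"
    [0.40, 0.55, 0.05],  # model B leans slightly toward "moderate"
    [0.42, 0.50, 0.08],  # model C leans slightly toward "moderate"
]

predicted = RISK_LEVELS[soft_vote(sample_probas)]
print(predicted)
```

In this made-up sample, two of the three models individually prefer "moderate", so a majority (hard) vote would choose it, while the averaged probabilities favor "low" because one model is far more confident. This sensitivity to confidence is the usual motivation for soft voting over hard voting.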
License
Copyright (c) 2026 Sahel Almarifah Journal for Humanities and Applied Sciences

This work is licensed under a Creative Commons Attribution 4.0 International License.