A novel cross-city video-based ensemble deep learning model for air quality estimation
Ahmed M., Zhang X., Shen Y., Hong V.T., Abbas H., Ali S., Ahmed T., Ali A., Gulakhmadov A., Nam W.-H., Chen N.
1 January 2026, Elsevier Ltd
Sustainable Cities and Society
2026#136
Air pollution poses a significant threat to sustainable urban development, public health, and environmental quality in cities worldwide. Accurate PM2.5 estimation is essential for effective urban planning and pollution control. However, existing methods often rely on single static images, which fail to capture the temporal dynamics of pollution, and they typically require city-specific training, which limits their transferability to new cities. To address these issues, this study proposes the Cross-City Air Quality Estimation Network (CCAQE-Net), a novel video-based hybrid deep learning model designed to estimate PM2.5 concentrations in unseen cities. The model employs the Video Vision Transformer (ViViT) with a self-attention mechanism to extract spatiotemporal pollution features from videos, and then applies a Random Forest Regressor (RFR) to map complex non-linear relationships and estimate PM2.5. We evaluated the model using the PCCAQV dataset, which contains 8640 hourly outdoor air quality videos from six major Pakistani cities: Karachi, Multan, Faisalabad, Lahore, Rawalpindi, and Islamabad. The inclusion of data from multiple cities enables the model to learn comprehensive spatiotemporal air pollution dynamics under varying meteorological conditions, atmospheric interactions, pollution sources, and topographical features, thus enhancing its robustness and generalizability. To rigorously assess the model's ability to generalize to unseen cities, we adopted a Leave-One-City-Out (LOCO) approach: training on five cities and testing on the sixth, and repeating this process until each of the six cities had served as the test city. The experimental results demonstrate that the proposed model achieves superior performance, with an average R² of 0.97, RMSE of 1.42 μg/m³, MAE of 1.14 μg/m³, and MAPE of 10.87 %, outperforming current state-of-the-art models across all evaluated cities.
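The Leave-One-City-Out protocol and the four reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the ViViT feature extractor and the Random Forest Regressor are not reproduced here, and the names `loco_splits`, `regression_metrics`, `evaluate_loco`, and the `fit_predict` callback are hypothetical. Averaging the per-city metrics is also an assumption about how the reported averages might be obtained.

```python
import math
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def loco_splits(cities: Sequence[str]) -> Iterator[Tuple[List[int], List[int], str]]:
    """Yield (train_idx, test_idx, held_out_city), holding out one city per fold."""
    for held_out in sorted(set(cities)):
        train = [i for i, c in enumerate(cities) if c != held_out]
        test = [i for i, c in enumerate(cities) if c == held_out]
        yield train, test, held_out


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R^2, RMSE, MAE, and MAPE (in percent).

    Assumes y_true is non-constant (for R^2) and contains no zeros (for MAPE).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def evaluate_loco(
    targets: Sequence[float],
    cities: Sequence[str],
    fit_predict: Callable[[List[int], List[int]], List[float]],
) -> Dict[str, Dict[str, float]]:
    """Run every LOCO fold; fit_predict(train_idx, test_idx) returns test predictions."""
    results = {}
    for train, test, city in loco_splits(cities):
        preds = fit_predict(train, test)
        results[city] = regression_metrics([targets[i] for i in test], preds)
    return results


if __name__ == "__main__":
    # Toy data only: two made-up PM2.5 readings for each of three cities.
    cities = ["Karachi", "Karachi", "Lahore", "Lahore", "Multan", "Multan"]
    pm25 = [40.0, 60.0, 80.0, 120.0, 50.0, 70.0]

    def mean_baseline(train: List[int], test: List[int]) -> List[float]:
        # Stand-in for a trained model: predict the mean of the training targets.
        mean = sum(pm25[i] for i in train) / len(train)
        return [mean] * len(test)

    per_city = evaluate_loco(pm25, cities, mean_baseline)
    for city, scores in per_city.items():
        print(city, {k: round(v, 3) for k, v in scores.items()})
```

In a real pipeline, `fit_predict` would train the regressor on video features from the five training cities and return predictions for the held-out city; the mean baseline above only demonstrates the data flow.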
Cross-city estimation, Deep learning, PM2.5, RFR, ViViT
National Engineering Research Center of Geographic Information System, School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China
School of Economics and Management, China University of Geosciences, Wuhan, 430074, China
School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China
Software College, Northeastern University, Shenyang, 110169, China
Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
Department of Cybernetics, Nanotechnology and Data Processing, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland
Research Center of Ecology and Environment in Central Asia, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China
Institute of Water Problems, Hydropower and Ecology of the National Academy of Sciences of Tajikistan, Dushanbe, 734042, Tajikistan
School of Social Safety and Systems Engineering, Institute of Agricultural Environmental Science, National Agricultural Water Research Center, Hankyong National University, Anseong, South Korea