Asian Journal of atmospheric environment


Asian Journal of Atmospheric Environment - Vol. 15, No. 1

[ Research Article ]
Abbreviation: Asian J. Atmos. Environ
ISSN: 1976-6912 (Print) 2287-1160 (Online)
Print publication date 31 Mar 2021
Received 13 Dec 2020 Revised 02 Mar 2021 Accepted 08 Mar 2021
DOI: https://doi.org/10.5572/ajae.2020.131

Air Pollution in Indian Cities and Comparison of MLR, ANN and CART Models for Predicting PM10 Concentrations in Guwahati, India
Abhishek Dutta* ; Wanida Jinsart
Department of Environmental Science, Faculty of Science, Chulalongkorn University, 254 Phayathai Road, Pathumwan, Bangkok 10330, Thailand

Correspondence to : * Tel: +66880441556 E-mail: duttabob@gmail.com


Copyright © 2021 by Asian Association for Atmospheric Environment
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Indian cities are becoming increasingly susceptible to PM10-induced health hazards, which is a growing concern for the country’s policymakers. Air pollution is now engulfing comparatively smaller cities as well, as the rapid pace of urbanization and economic development shows no sign of losing steam. A review of air pollution in 28 Indian cities, covering tier-I, II, and III cities, found that they grossly violated both the WHO (World Health Organization) guideline and the NAAQS (National Ambient Air Quality Standards of India) for acceptable daily average PM10 (particulate matter less than 10 μm in aerodynamic diameter) concentrations by a wide margin. Predicting city-level PM10 concentrations in advance and initiating prior actions accordingly is one way to protect city dwellers from PM10-induced health hazards. The predictive ability of three models, the linear Multiple Linear Regression (MLR), the nonlinear Multi-Layer Perceptron class of Artificial Neural Network (MLP ANN), and the nonlinear Classification and Regression Tree (CART), was tested for one-day-ahead PM10 concentration forecasting in the tier-II city of Guwahati using 2016-2018 daily average observed climate data, PM10, and gaseous pollutants. The results show that the nonlinear MLP algorithm, with feedforward backpropagation network topologies of the ANN class, gives the best predictions compared with the linear MLR and nonlinear CART models. Therefore, the ANN (MLP) approach may be useful for deriving a predictive understanding of one-day-ahead PM10 concentration levels, thus providing policymakers with a tool for initiating in situ measures to curb air pollution and improve public health.


Keywords: Air pollution, Prediction, Artificial neural network, Multi-variate linear regression, Small city

1. INTRODUCTION

Over the past years, airborne particulate matter (PM) concentrations in Indian cities have been rising and have become a matter of concern for policymakers in India. Improving air quality is not easy for a country like India, as its policymakers cannot forgo the objective of faster economic development needed to sustain its vast population. Different sources continuously pour pollutants into city air; notable among them are fuel burning, industrial establishments, infrastructure-related construction, government and privately operated power plants, stubble burning of agricultural biomass residue in neighboring areas, and vehicular movement (Guttikunda, 2017). However, their proportional contributions vary across the cities of India. Taking India as a whole, about 53,929 automobiles hit the country’s roads every day (Dutta and Dutta, 2018). All of this has led to the dismal state of air pollution across the cities of India. Table 1 and Table S1 summarize studies conducted by different researchers on 28 Indian cities, giving the reported PM10 concentrations and the times they violated the air quality standards of the World Health Organization (WHO) and the National Ambient Air Quality Standards (NAAQS) of India, respectively. Kolkata, a tier-1 city* in India, even clocked a PM10 concentration as high as 445±21 μg m-3 during the wintertime (Das et al., 2015). Annual PM10 concentrations in New Delhi were reported to be 222±14 μg m-3, while an earlier study reported summer and winter mean concentrations of 95.1±22.2 μg m-3 and 182±32.5 μg m-3, respectively (Tiwari et al., 2014; Singh et al., 2011). Bengaluru city also registered a high annual mean PM10 concentration of 349.8±205.8 μg m-3 during the year 2015 (Guttikunda et al., 2019). In comparison, lower concentrations have been reported for Hyderabad and Mumbai, where mean PM10 concentrations for the period 2005-2012 were 174.4±86.6 μg m-3 and 54.4±25.2 μg m-3, respectively (Dholakia et al., 2014).

Table 1. 
Summary of ambient PM10 concentrations from several cities across India (mass concentrations in μg m-3).
| City | City type | Sampling year | Type | PM10 (μg m-3) | References |
|---|---|---|---|---|---|
| Kolkata | I | Dec, 2013-Jan, 2014 | Winter mean | 445±210 | Das et al., 2015 |
| Pune | I | Jan-Dec, 2016 | Mean | 62.5 | Gawhane, 2019 |
| | | June, 2011-May, 2012 | Mean | 113.8 | Yadav and Satsangi, 2013 |
| Hyderabad | I | 2005-2012 | Mean | 174.4±86.6 | Dholakia et al., 2014 |
| | | June, 2004-May, 2005 | Mean | 135.1±37.92 | Gummeneni et al., 2011 |
| Mumbai | I | 2005-2012 | Mean | 54.4±25.2 | Dholakia et al., 2014 |
| Ahmedabad | I | 2005-2012 | Mean | 108.3±69.8 | Dholakia et al., 2014 |
| Delhi | I | April to June, 2008 | Summer mean | 95.1±22.2 | Singh et al., 2011 |
| | | Nov, 2007-Jan, 2008 | Winter mean | 182±32.5 | |
| | | Sept, 2010-Aug, 2012 | Mean | 222±142 | Tiwari et al., 2014 |
| Bengaluru | I | 2011 | Annual mean | 221.4±187.5 | Guttikunda et al., 2019 |
| | | 2012 | Annual mean | 275.6±180.8 | |
| | | 2013 | Annual mean | 314.3±213.4 | |
| | | 2014 | Annual mean | 333.6±216.3 | |
| | | 2015 | Annual mean | 349.8±205.8 | |
| | | 2011-2015 | Mean (5 year) | 298.94±200.76 | |
| | | 2005-2012 | Mean | 80.4±21.9 | Dholakia et al., 2014 |
| Jodhpur | II | Aug-Sept, 2011 | Mean monsoon | 180 | Sudheer et al., 2016 |
| Varanasi | II | Mar, 2013 to Feb, 2014 | Annual mean | 176.1±85 | Murari et al., 2015 |
| Agra | II | 2000-2016 | Range | 175 to 295 | De, 2019 |
| | | April, 2010 to Jan, 2011 | Mean | 230.5 | Pipal, 2014 |
| | | April, 2010 to Jan, 2011 | Mean | 242 | |
| Guwahati | II | July, 2013-30 June, 2014 | Annual mean | 90.7±59.7 | Tiwari et al., 2017 |
| Raipur | II | Oct, 2008 to Sept, 2009 | Annual mean | 387.29±76.85 | Deshmukh et al., 2013 |
| Mangalore | II | Jan, 2013-Oct, 2016 | Mean | 101.8 | Kalaiarasan et al., 2018 |
| Simla | II | 2005-2012 | Mean | 93.9±58.7 | Dholakia et al., 2014 |
| Amritsar | II | 9 Nov-15 Nov, 2016 | Winter mean | 252.22±108.14 | Ravindra et al., 2019 |
| Rourkella | II | Jan, 2011-Dec, 2011 | Mean (Four seasons) | 127.755 | Kavuri, 2013 |
| Dhanbad | II | Mar, 2014-Feb, 2015 | Mean (Summer, post monsoon & Winter) | 216±82 | Jena and Singh, 2017 |
| Lucknow | II | Mar-June, 2012 | Summer mean | 123±13 | Lawrence and Fatima, 2014 |
| Kanpur | II | Oct, 2002-Feb, 2003 | Mean | 80 | Sharma and Maloo, 2005 |
| | | Oct, 2002-Feb, 2003 | Mean | 277±117.61 | |
| Chandigarh | II | 27 Oct-3 Nov, 2016 | Winter mean | 151.45±106.40 | Ravindra et al., 2019 |
| Fatehgarh Sahib | III | 3 Nov-9 Nov, 2016 | Winter mean | 197.07±61.35 | Ravindra et al., 2019 |
| Bathinda | III | 16 Nov-21 Nov, 2016 | Winter mean | 204.04±70.80 | Ravindra et al., 2019 |
| Sirsa | III | 21 Nov-26 Nov, 2016 | Winter mean | 203.12±83.28 | Ravindra et al., 2019 |
| Rohtak | III | 26 Nov-3 Dec, 2016 | Winter mean | 186.09±78.33 | Ravindra et al., 2019 |
| Sonipat | III | 3 Dec-6 Dec, 2016 | Winter mean | 213.67±151.49 | Ravindra et al., 2019 |
| Jharia | III | Mar, 2011-Feb, 2012 | Mean | 333.7±17.86 | Roy et al., 2019 |
| Udaypur | III | July, 2017-June, 2018 | Mean | 128.34 | Yadav et al., 2019 |
| Adityapur | III | 1 July, 2013-30 June, 2014 | Mean | 165±43.93 | Shubhankar and Ambade, 2016 |

The tier-II cities do not lag far behind India’s tier-I cities in terms of PM10 pollution. Raipur had a mean PM10 concentration of 387.29±76.9 μg m-3 from October 2008 to September 2009, while Kanpur recorded a mean PM10 concentration of 277±117.6 μg m-3 from October 2002 to February 2003 (Deshmukh et al., 2011; Sharma and Maloo, 2005). Among the tier-III cities, the reported mean PM10 concentrations of some cities, such as Jharia and Sonipat, were also on the higher side, at 333.7±17.9 μg m-3 and 213.7±151.5 μg m-3 during March 2011 to February 2012 and 03 December to 06 December 2016, respectively (Ravindra et al., 2019; Roy et al., 2019).

One option for Indian policymakers to mitigate critical PM concentrations in cities, and the associated health effects, may therefore be to predict the concentrations correctly at least one to two days in advance and initiate prior actions accordingly, such as planned regulation of traffic. However, predicting air quality is not straightforward because of the complex interactions of different nonlinear parameters (Hooyberghs et al., 2005). Shahraiyni and Sodoudi (2016) reviewed 36 research studies carried out in different cities of the world in the quest for accurate PM10 forecasts. Of these studies, 50% (18 studies) employed a multi-layer perceptron (MLP) with Feedforward Backpropagation Network (FFBN) topologies, a class of Artificial Neural Network (ANN) model. Around 28% (10 studies) depended on the widely used Multiple Linear Regression (MLR) technique for PM10 forecasting in urban areas. Three studies (about 8%) used the Radial Basis Function (RBF) network of the ANN class to forecast city-level PM10. The other five studies (14%) relied on other techniques such as PNN (Pruned Neural Networks), LL (Lazy Learning), an MLP and MLR combination, the Elman class of Recurrent Neural Networks (RNN), and PCRA (Principal Component Regression Analysis). The ANN technique appears to provide useful results when dealing with the nonlinear independent variables involved in environmental pollution prediction. Hence, more practitioners are resorting to data-driven approaches such as ANN modeling as alternatives to traditional deterministic or nonlinear models (Cabaneros et al., 2019; Jiang et al., 2017). Pollution researchers in China and elsewhere have used ANN techniques extensively to forecast airborne PM concentrations in the past. In different urban areas of India, MLR with stepwise inclusion of input variables has been the most used tool for temporal prediction of PM2.5 and PM10. MLR, however, is limited by its linear representation of nonlinear systems. Moreover, only to a limited extent have researchers applied different data-driven predictive techniques to PM forecasting in the Indian context and compared their performance (Table S2).

Against the above background, this paper’s primary objective is to assess the predictive ability of three contemporary statistical techniques, namely MLR, ANN, and CART (Classification and Regression Tree) analyses, for one-day-ahead PM10 concentration prediction in an Indian city. The best-performing technique will be a useful tool for city authorities and air quality managers for initiating in situ measures to curb pollution. Unlike previous modeling efforts (Table S2), this is the first instance of applying CART analysis as a statistical procedure for the prediction of PM10 in a comparative setup for an Indian city. In the recent past, Gocheva-Ilieva and Stoimenova (2018) employed CART to predict PM10 for the city of Pleven, Bulgaria, and reported very accurate model performance. CART, as a method for the analysis and forecasting of PM10, has also been reported to perform better than MLR (Slini et al., 2006).


2. LOCATION OF THE STUDY

The model development for forecasting PM10 was carried out for the north-eastern Indian tier-II city of Guwahati, the capital city of the state of Assam, India. For the last 10-12 years, Guwahati has been recognized as one of India’s most rapidly growing cities. Rapid urbanization and its contribution to air pollution have made smaller Indian cities like Guwahati vulnerable too. Vehicular growth (both light and heavy vehicles) in the city was notable in the past decade, with a reported sharp rise of about 87%. A recent study conducted in Guwahati, which computed the Hazard Quotient (HQ) based on NAAQS and WHO standards, indicated quite a high degree of health risk for the city dwellers (Dutta and Jinsart, 2020). Black carbon pollution is present in the city air due to rapid urbanization and poor environmental quality control (Barman and Gokhale, 2019). Guwahati has a humid subtropical climate. The four major seasons of the city are winter (December to February), spring (March to May), summer (June to August), and autumn (September to November), with differing meteorological conditions. Guwahati has six ambient air monitoring stations, set up under the National Air Quality Monitoring Programme (NAMP), to measure key pollutants (Pant et al., 2019). Only one of the NAMP stations can measure PM2.5, while the newly developed CAAQM (Continuous Ambient Air Quality Monitoring) station started functioning only in mid-2019. The locations of the six NAMP stations within Guwahati city and their monitoring types can be seen in Fig. 1 and Table S3, respectively.


Fig. 1. 
Study location and monitoring stations.


3. METHODS

Daily average concentration data (1096 data points) for PM10 (μg m-3), CO (ppm), NO2 (ppb), and SO2 (ppb) were collected for all six air quality monitoring stations for the three years 2016-2018 from the State Pollution Control Board (SPCB) office located at Guwahati. Daily climate data for the same three years (1096 data points) for ambient temperature (AT, °C), relative humidity (RH, %), wind speed (WS, m s-1), and rainfall (RF, mm) were acquired from the Regional Meteorological Department, located at Guwahati.

3.1 Data Treatment

A few missing values were observed in the daily average concentration data for PM10, CO, NO2, and SO2 in the 2016-2018 time series. As the observed values vary significantly from day to day, those few days were removed from the data set rather than filled in by linear interpolation. The modified data set contained 1092 observations. The climate data (1096 data points) had no missing values but were adjusted for parity with the pollutant data by removing the values for the corresponding days.
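The data treatment described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function name `drop_incomplete_days`, the dictionary layout, and the three sample days are all hypothetical.

```python
def drop_incomplete_days(pollutants, climate):
    """Drop days with any missing pollutant value and align climate data to them."""
    # Keep a day only if every pollutant reading is present.
    complete = {
        day: values
        for day, values in pollutants.items()
        if all(v is not None for v in values.values())
    }
    # Restrict both tables to the dates they have in common.
    common = sorted(set(complete) & set(climate))
    return ({d: complete[d] for d in common}, {d: climate[d] for d in common})


# Hypothetical three-day sample; None marks a missing observation.
pollutants = {
    "2016-01-01": {"PM10": 120.0, "CO": 1.2, "NO2": 17.0, "SO2": 7.5},
    "2016-01-02": {"PM10": None, "CO": 1.1, "NO2": 16.0, "SO2": 7.1},
    "2016-01-03": {"PM10": 98.0, "CO": 1.4, "NO2": 18.0, "SO2": 7.9},
}
climate = {
    "2016-01-01": {"AT": 17.0, "RH": 80.0, "WS": 1.1, "RF": 0.0},
    "2016-01-02": {"AT": 18.0, "RH": 78.0, "WS": 1.3, "RF": 0.0},
    "2016-01-03": {"AT": 16.5, "RH": 82.0, "WS": 0.9, "RF": 2.0},
}
clean_pollutants, clean_climate = drop_incomplete_days(pollutants, climate)
```

In this sample the second day is dropped from both tables because its PM10 value is missing, mirroring how the 1096 daily records were reduced to 1092.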

3.2 Descriptive Statistics and Analysis of Time Series

Descriptive statistics of the climate data, PM10, and gaseous pollutants for the period 2016-2018 (1092 data points) were computed, and a time series analysis was carried out for air quality monitoring station 6 to understand the characteristics and correlation of the different variables over the study period. Station 6 was found to be representative of the six air quality monitoring stations of the city because of the completeness of its data sets and because it reflects the common land-use patterns of the city. Multiple time series charts were produced with time on the horizontal axis and PM10 concentrations, climate variables, and gaseous variables (AT, RH, RF, WS, SO2, CO, NO2) on the vertical axes.

3.3 Predictive Models Development and Validation

We have used MLR analysis, MLP class of ANN, and CART for forecasting of one day ahead PM10 concentration for all the six air quality monitoring stations of Guwahati city.
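As a rough illustration of what "one day ahead" means for the data layout, the sketch below pairs the predictors of day t with the PM10 value of day t+1. The function name `make_one_day_ahead_pairs` and the sample rows are hypothetical, and the rows are assumed to be in chronological order.

```python
def make_one_day_ahead_pairs(rows, target="PM10"):
    """Pair the predictors observed on day t with the target observed on day t+1."""
    features = rows[:-1]                          # days 0 .. n-2
    targets = [row[target] for row in rows[1:]]   # days 1 .. n-1
    return features, targets


# Hypothetical consecutive daily records for one station.
rows = [
    {"PM10": 110.0, "AT": 18.0, "RH": 80.0},
    {"PM10": 125.0, "AT": 17.0, "RH": 78.0},
    {"PM10": 140.0, "AT": 16.0, "RH": 75.0},
]
X, y = make_one_day_ahead_pairs(rows)
```

Note that the last day has no next-day target and therefore contributes no training pair.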

3.3.1 Multiple Linear Regression (MLR)

In MLR analysis, a mathematical model was built to forecast the dependent variable, i.e., next-day PM10, from independent variables comprising climate variables and gaseous pollutants. In MLR, the coefficient of determination (R2) indicates the overall capability of the model to explain the variance in the data. The regression model was composed following Equation 1 (Abdullah et al., 2019; Vlachogianni et al., 2011).

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_n X_{ni} + \varepsilon_i \tag{1}$$

where Y is the dependent variable, the βi are the regression coefficients, the Xi are the independent variables, and ε is a stochastic error associated with the regression. This relationship was used in this study to develop an equation for predicting the next-day PM10 concentrations at the six ambient air monitoring stations of Guwahati, with meteorological parameters, PM10, and gaseous pollutants as input variables. MLR assumes that the residuals are normally distributed with zero mean, are uncorrelated, and have constant variance. The stepwise multiple linear regression procedure was used here to derive the equation (Abdullah et al., 2019). The variance inflation factor (VIF) was used in this study to evaluate the effect of multicollinearity on the variance of the estimated regression coefficients. The equation for VIF (Equation 2) is as follows:

$$VIF = \frac{1}{1 - R^2} \tag{2}$$
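Equation 2 can be illustrated with a short sketch. It assumes the simplest case of two predictors, where the R2 obtained by regressing one predictor on the other equals their squared Pearson correlation; the helper names and the sample values are hypothetical.

```python
def r_squared_simple(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif(r2):
    """Variance inflation factor from Equation 2: VIF = 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r2)


# Hypothetical predictor values (ambient temperature and relative humidity).
temperature = [14.0, 20.0, 25.0, 30.0, 34.0]
humidity = [85.0, 80.0, 79.0, 76.0, 70.0]
vif_temperature = vif(r_squared_simple(humidity, temperature))
```

An R2 of 0 gives VIF = 1 (no inflation), while an R2 of 0.9 gives VIF = 10, the threshold used later in the paper to rule out multicollinearity.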
3.3.2 Multi-Layer Perceptron (MLP) Model

ANN is a robust data modeling technique capable of handling nonlinear relationships between variables and is therefore suitable for predicting PM10, which requires exploring the complex relationship between particulate matter, meteorological variables, and gaseous pollutants present in the atmosphere (Feng et al., 2015). We used MLP in this study to create predictive models for each of the six ambient monitoring stations of Guwahati, using nonlinear combinations of the input variables (meteorological parameters, PM10, PM2.5, and gaseous pollutants) to predict the next-day PM10 concentrations. MLP forms a network of functionally interconnected neurons, also known as perceptrons (Vemuri, 1988). ANN is often preferred over MLR because of its ability to predict the dependent variable of a built-up model more accurately (Gardner and Dorling, 1998). MLP has a simple structure consisting of three layers: the input layer, the hidden layer, and the output layer. One hidden layer was used in our study, as it has been suggested to be sufficient to achieve the optimum model capacity (Bishop, 1995). The number of neurons, or nodes, in the input layer was equal to the number of input variables introduced in the model. The relevant input variables, i.e., observed meteorological parameters, PM10, and gaseous pollutants, are fed as signals to the input layer, which passes them on to the hidden layer. The neurons carry out the computations that detect features of the input variables, which enter with their requisite weights. The weights are assigned to the input variables based on their relative importance. The hidden layer performs the critical function of nonlinearly transforming the inputs entering the network through a predefined activation function. Each neuron in the hidden layer sums up the incoming information, including a bias. The bias provides every neuron with a trainable constant value in addition to its weighted inputs. The mathematical formulation of the MLP model is shown in Equation 3:

$$Y = F\left(\sum_{j=1}^{m} W_{kj}\, F\left(\sum_{i=1}^{n} W_{ji} X_i + B_j\right) + B_k\right) \tag{3}$$

where Y=output, F=transfer function, Wkj=weights between the hidden and output layers, Wji=weights between the input and hidden layers, Xi=input variables, m=number of neurons in the hidden layer, n=number of neurons in the input layer, Bj=bias values of the neurons in the hidden layer, and Bk=bias values of the neurons in the output layer. Fig. 2 depicts the basic structure of the MLP framework.
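The forward pass in Equation 3 can be written out as a small sketch for a single output neuron. The weights below are arbitrary illustrative numbers, not fitted values, and the function allows different transfer functions for the hidden and output layers, which Equation 3 writes with a single symbol F.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def mlp_forward(x, w_ji, b_j, w_kj, b_k, hidden_fn=sigmoid, output_fn=identity):
    """Evaluate Equation 3 for one hidden layer and a single output neuron."""
    # Inner sum: each hidden neuron j combines the n inputs with weights W_ji and bias B_j.
    hidden = [
        hidden_fn(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_ji, b_j)
    ]
    # Outer sum: the output neuron combines the m hidden activations with W_kj and B_k.
    return output_fn(sum(w * h for w, h in zip(w_kj, hidden)) + b_k)


# Illustrative network: n = 2 inputs, m = 2 hidden neurons (all numbers hypothetical).
y = mlp_forward(
    x=[0.0, 0.0],
    w_ji=[[0.3, -0.2], [0.1, 0.4]],
    b_j=[0.0, 0.0],
    w_kj=[1.0, 1.0],
    b_k=0.5,
)
```

With all inputs and hidden biases at zero, each sigmoid neuron outputs 0.5, so the result is 0.5 + 0.5 + 0.5 = 1.5.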


Fig. 2. 
The architecture of the MLP network.

3.3.3 Classification and Regression Trees (CART)

CART is a non-parametric regression technique that can be employed to predict a dependent variable when the distribution of the independent variables is not known. The CART method tries to ascertain the distribution pattern of the outcome (dependent) variable using the independent variables through their linear or nonlinear relationship with the outcome variable. CART builds up a decision tree through a hierarchy of binary decisions. Each binary decision splits the observations of the target variable into two alternative and mutually exclusive branches (groups) according to the values of an explanatory variable, choosing the split that leads to the largest possible reduction in the post-split variation of the target variable. Splitting stops when no additional gain can be achieved by further splitting (Mckenney and Pedlar, 2003; Moisen and Frescino, 2002). Predictive CART models were built in this study for each of the ambient air quality monitoring stations, with observed independent predictor variables such as meteorological parameters, PM10, and gaseous pollutants of the respective stations used to predict the respective dependent variable, i.e., next-day PM10 concentration.
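A single binary decision of the kind described above can be sketched as a search for the threshold that minimises the post-split sum of squared errors of the target. This is a simplified one-predictor illustration, not the SPSS Modeler implementation; the function names and sample values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_after) for the split of x that minimises the SSE of y."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # no split: SSE of the undivided node
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal predictor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: next-day PM10 is high on dry days and low on humid days.
relative_humidity = [60.0, 65.0, 70.0, 85.0, 90.0, 95.0]
next_day_pm10 = [150.0, 150.0, 150.0, 80.0, 80.0, 80.0]
threshold, remaining_sse = best_split(relative_humidity, next_day_pm10)
```

A full tree repeats this search over all predictors and then recursively within each branch until no further reduction is gained.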

3.3.4 Model Validation

The MLR, MLP, and CART models were validated by computing the net absolute error (NAE), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), index of agreement (IA), and coefficient of determination (R2) (Grzesiak and Zaborski, 2012; Jinsart et al., 2010; Willmott et al., 2009). Table 2 provides the performance indicators for model validation.

Table 2. 
Performance indicators for model validation.
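| Sl. No. | Performance indicator | Equation |
|---|---|---|
| 1 | Net absolute error | $NAE = \dfrac{\sum_{i=1}^{n} \lvert P_i - O_i \rvert}{\sum_{i=1}^{n} O_i}$ |
| 2 | Mean absolute error | $MAE = \dfrac{1}{n} \sum_{i=1}^{n} \lvert P_i - O_i \rvert$ |
| 3 | Mean square error | $MSE = \dfrac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}$ |
| 4 | Root mean square error | $RMSE = \sqrt{\dfrac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}}$ |
| 5 | Index of agreement | $IA = 1 - \dfrac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} \left(\lvert P_i - \bar{O} \rvert + \lvert O_i - \bar{O} \rvert\right)^2}$ |
| 6 | Coefficient of determination | $R^2 = 1 - \dfrac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$ |

Here Pi and Oi denote the predicted and observed values, Ō is the mean of the observed values, and n is the number of observations.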
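The six indicators of Table 2 can be computed together as in the following sketch, which assumes that P and O stand for predicted and observed values. The function name and the three-point sample are hypothetical.

```python
import math


def performance_indicators(observed, predicted):
    """Compute NAE, MAE, MSE, RMSE, IA, and R2 as defined in Table 2."""
    n = len(observed)
    mean_o = sum(observed) / n
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    sq_err = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    mse = sum(sq_err) / n
    # Denominator of IA: potential error around the observed mean.
    ia_den = sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2
        for p, o in zip(predicted, observed)
    )
    # Denominator of R2: total variation of the observations.
    r2_den = sum((o - mean_o) ** 2 for o in observed)
    return {
        "NAE": sum(abs_err) / sum(observed),
        "MAE": sum(abs_err) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "IA": 1 - sum(sq_err) / ia_den,
        "R2": 1 - sum(sq_err) / r2_den,
    }


# Tiny hand-checkable sample: only the last prediction is off, by 1.
scores = performance_indicators([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

For this sample the single unit error gives MAE = MSE = 1/3, NAE = 1/6, R2 = 0.5, and IA = 1 - 1/13.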

SPSS 25 was used for the MLR and MLP computations, while SPSS Modeler 18 was used for the CART computations in this study.


4. RESULTS AND DISCUSSION
4.1 Descriptive Statistics and PM10 Concentration of Guwahati
4.1.1 Descriptive Statistics

The mean values and standard deviations of the meteorological parameters, PM10, and gaseous pollutants at the respective air quality monitoring stations of the city are provided in Table 3. High variability was observed in the daily average PM10 level during 2016-2018. Across the six air quality monitoring stations, the maximum and minimum mean PM10 concentrations were 133.32 μg m-3 and 51.41 μg m-3, respectively. The highest daily average PM10 recorded during 2016-2018 was 259.39 μg m-3, while the lowest was 40.67 μg m-3. The average RH level of the city was on the higher side, while the wind speed was on the lower side. The time-series data show a maximum temperature of 34°C recorded during the summer season and a minimum of 14°C during the winter season. Guwahati received rainfall from the southwest monsoon, and the highest rainfall occurred from June to August.

Table 3. 
Descriptive statistics.
Monitoring stations: mean and standard deviation (SD)

| Parameter | S1_603 Mean | S1_603 SD | S2_596 Mean | S2_596 SD | S3_519 Mean | S3_519 SD | S4_541 Mean | S4_541 SD | S5_602 Mean | S5_602 SD | S6_193 Mean | S6_193 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RF | 4.11 | 11.67 | 4.11 | 11.67 | 4.11 | 11.67 | 4.11 | 11.67 | 4.11 | 11.67 | 4.11 | 11.67 |
| Temp | 25.33 | 4.40 | 25.33 | 4.40 | 25.33 | 4.40 | 25.33 | 4.40 | 25.33 | 4.40 | 25.33 | 4.40 |
| Dew PT | 21.27 | 4.94 | 21.27 | 4.94 | 21.26 | 4.94 | 21.26 | 4.94 | 21.27 | 4.94 | 21.26 | 4.94 |
| RH | 78.91 | 9.74 | 78.91 | 9.74 | 78.91 | 9.74 | 78.91 | 9.74 | 78.91 | 9.74 | 78.91 | 9.74 |
| WS | 1.33 | 0.90 | 1.33 | 0.90 | 1.33 | 0.90 | 1.33 | 0.90 | 1.33 | 0.90 | 1.33 | 0.90 |
| PM10 | 104.57 | 57.28 | 108.62 | 57.10 | 133.22 | 66.99 | 105.97 | 51.41 | 108.62 | 57.10 | 109.82 | 51.41 |
| NO2 | 17.46 | 4.96 | 21.66 | 5.45 | 17.01 | 4.06 | 14.19 | 5.08 | 21.66 | 5.45 | 16.75 | 5.08 |
| SO2 | 7.62 | 2.42 | 7.55 | 2.65 | 7.52 | 2.27 | 5.63 | 2.27 | 7.55 | 2.65 | 7.61 | 2.27 |
| CO | 1.64 | 0.72 | 3.25 | 2.38 | 2.85 | 0.71 | 2.88 | 0.71 | 3.25 | 2.38 | 3.43 | 0.72 |

4.1.2 Correlation of PM10 Concentration, Climate Variables, and Gaseous Variables

Fig. 3(A)-3(G) show the time series of the observed meteorological parameters, PM10, and gaseous pollutants for air quality monitoring station 6 of Guwahati city. It can be observed from Fig. 3(A) that the site is characterized by relatively high humidity throughout the year. The time series considered in this study shows that the concentration of PM10 has maintained a largely negative correlation with relative humidity. The PM10 concentration of the city follows an annual cycle, with high concentrations during winter (December to February), possibly due to a lower planetary boundary layer height; higher concentrations appear to persist into March-April as well, i.e., beyond winter.


Fig. 3. 
The time series 2016-18 measured in Guwahati: (A) PM10 vs RH, (B) PM10 vs SO2, (C) PM10 vs CO, (D) PM10 vs NO2, (E) PM10 vs Temperature, (F) PM10 vs Rainfall, (G) PM10 vs Wind Speed.

Another peculiarity of the site is that CO, NO2, and SO2 are positively correlated with PM10 concentrations, suggesting a common source for these compounds; the correlation with SO2 is the stronger one, as shown in Fig. 3(B)-3(D) and Table S4. Fig. 3(E) indicates a negative correlation of PM10 with temperature, while Fig. 3(F) and 3(G) show a very mild positive correlation with rainfall and wind speed, respectively.

4.2 Multiple Linear Regression Model for PM10 Forecasting

The summary of the MLR models developed for all six ambient air quality monitoring stations of Guwahati is given in Table 4. The Variance Inflation Factors (VIF) of the independent variables in all six MLR models are acceptable, as they are below 10, showing that the models have no multicollinearity issues. The Durbin-Watson (D-W) statistics were in the range of 2.103-2.239, i.e., close to 2, indicating that the residuals are not seriously autocorrelated. The residual (error) is critical in assessing the robustness of the fitted model, as linear regression is sensitive to outlier effects. The histogram plots in Fig. 4(A)-4(F) indicate that the residuals are approximately normally distributed with zero mean and constant variance. Fig. S1(A)-S1(F) show the observations and predictions of the MLR models in scatter plots.
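The paper does not spell out how the D-W statistic is computed; the sketch below uses the standard definition, the sum of squared successive residual differences divided by the sum of squared residuals. The residual sequences are hypothetical.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences illustrating the two extremes.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation
```

The statistic ranges from 0 to 4; values near 2, such as the 2.103-2.239 reported in Table 4, are conventionally read as little first-order autocorrelation.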

Table 4. 
Summary of the Multiple Linear Regression (MLR) models for PM10 forecasting.
| Monitoring station | Model | R2 | Range of VIF | D-W statistics |
|---|---|---|---|---|
| S1_603 | PM10, (t+1) concentration = 204.565 + 0.551 (PM10) - 3.509 (Ambient temperature) - 0.870 (Relative humidity) | 0.628 | 1.277-1.895 | 2.216 |
| S2_596 | PM10, (t+1) concentration = 222.67 + 0.492 (PM10) - 3.340 (Ambient temperature) - 1.131 (Relative humidity) + 1.994 (CO) | 0.624 | 0.249-1.925 | 2.226 |
| S3_519 | PM10, (t+1) concentration = 214.67 + 0.618 (PM10) - 3.382 (Ambient temperature) - 0.988 (Relative humidity) | 0.674 | 1.330-1.904 | 2.180 |
| S4_541 | PM10, (t+1) concentration = 254.38 + 0.417 (PM10) - 4.934 (Ambient temperature) - 0.782 (Relative humidity) - 1.026 (SO2) | 0.622 | 1.164-2.183 | 2.103 |
| S5_602 | PM10, (t+1) concentration = 161.91 + 0.564 (PM10) - 2.725 (Ambient temperature) - 0.652 (Relative humidity) | 0.61 | 1.204-1.754 | 2.239 |
| S6_193 | PM10, (t+1) concentration = 216.23 + 0.572 (PM10) - 3.667 (Ambient temperature) - 0.868 (Relative humidity) - 0.915 (SO2) | 0.682 | 1.067-2.082 | 2.137 |
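As a worked example of how a Table 4 equation is applied, the sketch below evaluates the station S1_603 model. The coefficients are taken from Table 4; the function name is hypothetical, and the inputs are simply the S1 mean PM10, temperature, and relative humidity from Table 3, chosen for illustration.

```python
def predict_next_day_pm10_s1(pm10, ambient_temp, rel_humidity):
    """Station S1_603 equation from Table 4 (inputs in ug m-3, deg C, and %)."""
    return 204.565 + 0.551 * pm10 - 3.509 * ambient_temp - 0.870 * rel_humidity


# Inputs set to the S1 station means reported in Table 3.
forecast = predict_next_day_pm10_s1(pm10=104.57, ambient_temp=25.33, rel_humidity=78.91)
```

At these mean inputs the equation returns about 104.6 μg m-3, close to the station's mean PM10 of 104.57 μg m-3. The signs of the coefficients show the forecast rising with today's PM10 and falling with temperature and humidity.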


Fig. 4. 
Histogram plots (A) station 1, (B) station 2, (C) station 3, (D) station 4, (E) station 5 and (F) station 6.

4.3 Multi-Layer Perceptron Model

The input variables PM10, RF, T, RH, WS, NO2, SO2, and CO of the respective air monitoring stations were normalized using the data conversion facility of the ANN module of SPSS and fed into six different ANN models. For each model, 70% of the data set was used for training and 30% for testing. In the forward phase, the training data are propagated through the hidden layer to the output layer. The error, i.e., the difference between the output values and the actual target values, is then propagated back toward the hidden layer, and the weights are adjusted so that the error is reduced in successive cycles (Ul-Saufie et al., 2013). In this way, the neural network learns by changing its weights during the forward and backward phases. We tried different combinations of hidden/output transfer functions, namely sigmoid/hyperbolic tangent, sigmoid/linear, sigmoid/sigmoid, and hyperbolic tangent/linear, for each of the six monitoring stations and picked the combination giving the optimum R2 value. The network structure, transfer functions, and performance indicators (IA, R2, NAE, MAE, MSE, and RMSE) of each model are given in Table 5. The optimum R2 values (0.651 for station S1, 0.637 for station S2, 0.688 for station S3, 0.636 for station S4, 0.641 for station S5, and 0.693 for station S6) are marked ‘bold’ in Table 5, which also lists the values of IA, NAE, MAE, MSE, and RMSE corresponding to the optimum R2 for each of the six monitoring stations. Fig. S2(A)-S2(F) show the observations and predictions of the ANN models in scatter plots.

Table 5. 
Predictive MLP models with network structure, transfer functions and performance indicators.
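| Target | Stn. | Network structure (Input : Neurons : Output) | Transfer function (hidden/output layer) | R2 | NAE | MAE | MSE | RMSE | IA |
|---|---|---|---|---|---|---|---|---|---|
| PM10, (t+1) | S1 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | 0.626 | 0.15 | 16.02 | 497.86 | 22.31 | 0.95 |
| | | | Sigmoid/Linear | **0.651** | | | | | |
| | | | Sigmoid/Sigmoid | 0.640 | | | | | |
| | | | Hyperbolic tangent/Linear | 0.646 | | | | | |
| PM10, (t+1) | S2 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | **0.637** | 0.22 | 23.80 | 1200.21 | 34.64 | 0.88 |
| | | | Sigmoid/Linear | 0.634 | | | | | |
| | | | Sigmoid/Sigmoid | 0.630 | | | | | |
| | | | Hyperbolic tangent/Linear | 0.629 | | | | | |
| PM10, (t+1) | S3 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | 0.674 | 0.20 | 26.11 | 1408.45 | 37.53 | 0.90 |
| | | | Sigmoid/Linear | **0.688** | | | | | |
| | | | Sigmoid/Sigmoid | 0.679 | | | | | |
| | | | Hyperbolic tangent/Linear | 0.672 | | | | | |
| PM10, (t+1) | S4 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | 0.626 | 0.21 | 22.57 | 962.49 | 31.02 | 0.88 |
| | | | Sigmoid/Linear | 0.621 | | | | | |
| | | | Sigmoid/Sigmoid | 0.627 | | | | | |
| | | | Hyperbolic tangent/Linear | **0.636** | | | | | |
| PM10, (t+1) | S5 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | 0.635 | 0.22 | 23.40 | 1158.79 | 34.04 | 0.88 |
| | | | Sigmoid/Identity | 0.630 | | | | | |
| | | | Sigmoid/Sigmoid | 0.623 | | | | | |
| | | | Hyperbolic tangent/Identity | **0.641** | | | | | |
| PM10, (t+1) | S6 | 08 : 07 : 01 | Sigmoid/Hyperbolic tangent | **0.693** | 0.22 | 21.87 | 1007.72 | 31.74 | 0.90 |
| | | | Sigmoid/Identity | 0.687 | | | | | |
| | | | Sigmoid/Sigmoid | 0.686 | | | | | |
| | | | Hyperbolic tangent/Identity | 0.686 | | | | | |

NAE, MAE, MSE, RMSE, and IA are listed once per station and correspond to the model with the optimum (bold) R2 value.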
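The normalization and 70/30 partition used for the MLP models can be sketched as below. Two assumptions are made that the paper does not confirm: that "normalized" means min-max rescaling to [0, 1], and that the partition is chronological rather than random. Both function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series carries no information
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(rows, train_percent=70):
    """Chronological split (an assumption): first 70% for training, rest for testing."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


# Hypothetical PM10 values and a placeholder index for the 1092 daily records.
scaled_pm10 = min_max_normalize([40.0, 100.0, 160.0])
train_rows, test_rows = split_train_test(list(range(1092)))
```

SPSS can also assign cases to partitions at random, so the actual training and testing sets in the study may differ from this chronological sketch.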

4.4 Predictive CART Model

Using CART analysis, several decision trees were developed based on different combinations of observed meteorological parameters, PM10, and gaseous pollutants for the three years (2016-2018). As is typical in machine learning, 70% of the total data points of the respective independent and dependent variables were used as the training set and 30% as the test set. For each of the six air quality monitoring stations of Guwahati, the optimum model was the one with the least relative error, given by Equation 4 below.

$$\text{Relative error of CART} = \frac{S(K)}{S(O)} \tag{4}$$

where S(K) is the sum of the squared residuals at the terminal nodes and S(O) is the sum of squared errors of the dependent variable around its mean in the root node. The predictive CART models and performance indicators (R2, IA, NAE, MAE, MSE, and RMSE) are given in Table 6. Fig. S3(A)-S3(F) show the decision trees of the CART models.
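Equation 4 can be illustrated with a short sketch. It assumes that S(K) is accumulated over all terminal nodes, with residuals taken around each node's own mean; the function names, target values, and leaf labels are hypothetical.

```python
def sum_squared_deviation(values):
    """Sum of squared deviations of values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def cart_relative_error(targets, leaf_ids):
    """Equation 4: S(K) over terminal nodes divided by S(O) at the root node."""
    leaves = {}
    for target, leaf in zip(targets, leaf_ids):
        leaves.setdefault(leaf, []).append(target)
    s_k = sum(sum_squared_deviation(vals) for vals in leaves.values())
    s_o = sum_squared_deviation(targets)
    return s_k / s_o


# Hypothetical tree with two terminal nodes separating low and high PM10 days.
relative_error = cart_relative_error([80.0, 90.0, 150.0, 160.0], [0, 0, 1, 1])
```

Here S(K) = 100 and S(O) = 5000, so the relative error is 0.02; a tree with a single terminal node would give 1.0.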

Table 6. 
Predictive CART models with input variables PM10, RF, T, RH, WS, NO2, SO2, CO and performance indicators.
| Target | Stn. | Set | R2 | NAE | MAE | MSE | RMSE | IA |
|---|---|---|---|---|---|---|---|---|
| PM10, (t+1) | S1 | Training | 0.660 | 0.22 | 23.10 | 1132.68 | 33.66 | 0.89 |
| | | Testing | 0.614 | 0.26 | 26.39 | 1292 | 35.94 | 0.88 |
| PM10, (t+1) | S2 | Training | 0.519 | 0.24 | 26.34 | 1598.04 | 39.98 | 0.82 |
| | | Testing | 0.568 | 0.24 | 25.95 | 1357.10 | 36.84 | 0.85 |
| PM10, (t+1) | S3 | Training | 0.637 | 0.21 | 28.47 | 1609.12 | 40.11 | 0.88 |
| | | Testing | 0.625 | 0.21 | 27.35 | 1700.95 | 41.24 | 0.88 |
| PM10, (t+1) | S4 | Training | 0.606 | 0.22 | 23.58 | 1011.40 | 31.80 | 0.87 |
| | | Testing | 0.522 | 0.26 | 27.05 | 1339.12 | 36.59 | 0.84 |
| PM10, (t+1) | S5 | Training | 0.628 | 0.22 | 20.59 | 824.04 | 28.71 | 0.88 |
| | | Testing | 0.577 | 0.24 | 21.97 | 922.12 | 30.37 | 0.85 |
| PM10, (t+1) | S6 | Training | 0.656 | 0.21 | 23.03 | 1066.67 | 32.66 | 0.89 |
| | | Testing | 0.628 | 0.21 | 23.13 | 1282.33 | 35.81 | 0.88 |

4.5 Model Comparison

All six performance indicators were used to compare the one-day-ahead PM10 prediction performance of the three methods, i.e., MLR, ANN (MLP), and CART, and to isolate the best model, as shown in Table 7. NAE, MAE, MSE, and RMSE measure the error of a model, where a value closer to 0 indicates a better model. The other two performance indicators, IA and R2, measure the accuracy of the model result, where a value closer to 1 indicates higher accuracy. The values of the performance indicators provide specific information regarding predictive performance efficiencies (Singh et al., 2013). An RMSE-based comparison between models is preferable when the objective is to avoid large prediction errors.

Table 7. 
PM10 prediction model performance statistics: NAE, MAE, MSE, RMSE, IA, and R2 between measured and estimated values for six air quality monitoring stations.
Station 1 Station 2
MLR MLP CART MLR MLP CART
NAE 0.23 0.15 0.26 0.22 0.22 0.24
MAE 24.37 16.00 26.39 24.02 23.80 25.95
MSE 1219.39 497.86 1292 1226.49 1200.21 1357
RMSE 34.92 22.31 35.94 35.02 34.64 36.84
R2 0.63 0.65 0.61 0.62 0.64 0.57
IA 0.87 0.95 0.88 0.87 0.88 0.85
Station 3 Station 4
MLR MLP CART MLR MLP CART
NAE 0.20 0.20 0.21 0.22 0.21 0.26
MAE 26.26 26.11 27.35 23.05 22.57 27.05
MSE 1461.48 1408.45 1700.95 998.95 962.49 1339.12
RMSE 38.23 37.53 41.24 31.61 31.02 36.59
R2 0.67 0.69 0.63 0.62 0.64 0.52
IA 0.89 0.90 0.88 0.87 0.88 0.84
Station 5 Station 6
MLR MLP CART MLR MLP CART
NAE 0.23 0.22 0.24 0.20 0.22 0.21
MAE 21.37 23.40 21.97 22.20 21.84 23.13
MSE 859.33 1158.79 922.12 1023.64 1007.12 1282.33
RMSE 29.31 34.04 30.37 31.99 31.74 35.81
R2 0.61 0.64 0.58 0.68 0.69 0.63
IA 0.87 0.88 0.85 0.90 0.90 0.88

In contrast to RMSE, MAE reflects the average magnitude of the errors without considering their direction. The advantage of the linear score of MAE is that all individual differences between predictions and corresponding observed values are given equal weight in the average. However, amongst all six performance indicators, R2 can be regarded as the single most important measure in deciding the prediction accuracy (Yoo et al., 2018).
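The six indicators can be sketched in a few lines of Python. This is an illustration only: the formulas for NAE (absolute error normalised by the sum of observations), IA (Willmott's index of agreement), and R2 (squared Pearson correlation) are the commonly used forms and are assumed here, since this section does not restate them. The function name and sample values are hypothetical.

```python
import math


def indicators(observed, predicted):
    """Return NAE, MAE, MSE, RMSE, IA and R2 for paired series."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    abs_err = sum(abs(p - o) for o, p in pairs)
    sq_err = sum((p - o) ** 2 for o, p in pairs)

    mae = abs_err / n                 # every error weighted equally
    mse = sq_err / n                  # large errors weighted more heavily
    rmse = math.sqrt(mse)
    nae = abs_err / sum(observed)     # assumed normalisation

    # Index of agreement, Willmott form (assumed).
    ia = 1.0 - sq_err / sum(
        (abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs
    )

    # R2 taken as the squared Pearson correlation (assumed).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)

    return {"NAE": nae, "MAE": mae, "MSE": mse,
            "RMSE": rmse, "IA": ia, "R2": r2}


# Hypothetical observed and predicted PM10 values (μg m-3).
obs = [100.0, 120.0, 80.0, 150.0]
pred = [110.0, 115.0, 90.0, 140.0]
scores = indicators(obs, pred)
```

Because RMSE is the square root of MSE, the two columns of Table 7 can be cross-checked: for MLR at Station 1, the square root of 1219.39 is approximately 34.92.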

In this study, the one-day-ahead PM10 predictions for all six air quality monitoring stations showed relatively good fits with the MLP method (R2=0.64-0.69; IA=0.88-0.95) and the smallest errors (NAE=0.15-0.22; MAE=16-26.11; MSE=497.86-1408.45; and RMSE=22.31-37.53) in comparison to its closest performer, the MLR method (R2=0.61-0.68; IA=0.87-0.90; NAE=0.20-0.23; MAE=21.37-26.26; MSE=859.33-1461.48; and RMSE=29.31-38.23). CART as a predictive method for one-day-ahead PM10 was close to MLR but not equal in terms of the model evaluation indicators, with R2=0.52-0.63; IA=0.84-0.88; NAE=0.21-0.26; MAE=21.97-27.35; MSE=922.12-1700.95; and RMSE=30.37-41.24 in the test set results (Table 7).
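The ranges quoted above are the minimum and maximum of each indicator across the six stations. A short sketch of that arithmetic for RMSE, using the test-set values copied from Table 7 (the variable names are illustrative):

```python
# Test-set RMSE per station (Station 1 to Station 6), copied from Table 7.
rmse = {
    "MLR": [34.92, 35.02, 38.23, 31.61, 29.31, 31.99],
    "MLP": [22.31, 34.64, 37.53, 31.02, 34.04, 31.74],
    "CART": [35.94, 36.84, 41.24, 36.59, 30.37, 35.81],
}

# Range of each model across stations as a (minimum, maximum) pair.
ranges = {model: (min(vals), max(vals)) for model, vals in rmse.items()}

for model, (low, high) in ranges.items():
    print(f"{model}: RMSE = {low}-{high}")
```

The same min/max reduction applied to the other rows of Table 7 gives the remaining ranges.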

Taken together, the accuracy measures R2 (0.64-0.69) and IA (0.88-0.95) show that the ANN (MLP) models gave the best correlations between predicted and observed PM10 concentrations during the three years 2016-2018, compared with the MLR models (R2: 0.61-0.68, IA: 0.87-0.90) and the CART models (R2: 0.52-0.63, IA: 0.84-0.88). In terms of prediction error, ANN (MLP) also gave the lowest minimum errors (NAE: 0.15, MAE: 16, RMSE: 22.31 and MSE: 497.86) compared with MLR (NAE: 0.20, MAE: 21.37, RMSE: 29.31 and MSE: 859.33) and CART (NAE: 0.21, MAE: 21.97, RMSE: 30.37 and MSE: 922.12). Hence, the results obtained from the ANN (MLP) models were more suitable for Guwahati than those of the constructed MLR and CART models.


5. CONCLUSIONS AND RECOMMENDATIONS

The comprehensive and comparative review of the PM10 concentration status of 28 Indian cities of different categories (tier-I, tier-II, and tier-III) and the alarming PM10 levels found indicate an urgent need to improve city-level air quality. Kolkata, a tier-I city, recorded a PM10 concentration as high as 445±21 μg m-3 during wintertime. Tier-III cities like Raipur and Kanpur were not far behind the tier-I cities in terms of ambient PM10 concentration. Interestingly, tier-III cities like Jharia and Sonipat also recorded PM10 concentrations as high as 333.7±17.86 μg m-3 and 213.67±151.49 μg m-3, respectively. The PM10 concentration levels in all 28 Indian cities grossly violated both WHO and NAAQS standards by a wide margin. Kolkata topped the list at 22.25 times the WHO standard and 7.42 times the NAAQS, followed by Bengaluru (17.49 times the WHO standard and 5.83 times the NAAQS) and Delhi (11.1 times the WHO standard and 3.7 times the NAAQS). Therefore, it is high time to initiate actions to diminish or prevent the build-up of high ambient PM10 concentrations in the cities. One way out is abatement through short-term traffic reduction in cities, based on PM10 concentration levels predicted in advance. This requires local air quality managers to predict city-level PM10 concentrations correctly at least one or two days in advance, through analysis and predictive modeling of data routinely gathered by city authorities.

The tier-II city Guwahati recorded high variability in observed PM10 levels due to rapid urbanization. The highest daily average PM10 recorded was 259.39 μg m-3, while the lowest was 40.67 μg m-3 during the period 2016-2018. The mean PM10 concentration for the city, 133.22 μg m-3 as found in this study, exceeded the WHO standard by 6.66 times and the NAAQS by 2.22 times; the corresponding figures during 2013-2014 were 4.54 times and 1.51 times, respectively (Table 1, Table S1). The average daily NO2, CO, and SO2 concentrations of Guwahati were found to correlate with PM10 concentrations during 2016-2018, suggesting a common source for these compounds.
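The violation factors for Guwahati are simple ratios of the mean concentration to the applicable limit. The sketch below assumes limits of 20 μg m-3 (WHO annual PM10 guideline) and 60 μg m-3 (NAAQS annual PM10 standard); these values are not restated in this section and are inferred from the reported ratios. The function and constant names are hypothetical.

```python
# Assumed annual PM10 limits in μg m-3 (inferred, not stated in this section).
WHO_ANNUAL_PM10 = 20.0
NAAQS_ANNUAL_PM10 = 60.0


def exceedance_factor(concentration, standard):
    """Number of times a concentration exceeds a given standard."""
    return concentration / standard


guwahati_mean = 133.22  # μg m-3, mean PM10 for 2016-2018 from this study
who_factor = exceedance_factor(guwahati_mean, WHO_ANNUAL_PM10)
naaqs_factor = exceedance_factor(guwahati_mean, NAAQS_ANNUAL_PM10)

print(f"{who_factor:.2f} x WHO, {naaqs_factor:.2f} x NAAQS")
# prints "6.66 x WHO, 2.22 x NAAQS"
```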

In different cities of the world, different predictive modeling techniques have been used to predict PM10 in advance. However, MLR with stepwise inclusion of input variables was found to be the most widely used tool for temporal prediction of PM10 in urban areas of India, and it has mostly been applied in the bigger cities of the country (Table S2). This study found that the next day’s PM10 concentrations in the tier-II city Guwahati can be better forecasted using the non-linear MLP algorithm with FFBN topologies of the ANN class than with the linear MLR and non-linear CART models. The three models were critically assessed through a comparative evaluation of performance indicators, with the end goal of choosing the best-fitted model for accurate forecasting of PM10 at the city level. The results reveal that, for all six air quality monitoring stations of Guwahati, one-day-ahead PM10 prediction was relatively better using the MLP method (R2=0.64-0.69; IA=0.88-0.95), with the smallest errors (NAE=0.15-0.22; MAE=16-26.11; MSE=497.86-1408.45; and RMSE=22.31-37.53), in comparison to its closest performer, the MLR method (R2=0.61-0.68; IA=0.87-0.90; NAE=0.20-0.23; MAE=21.37-26.26; MSE=859.33-1461.48; and RMSE=29.31-38.23). It is interesting to note that CART as a predictive method for one-day-ahead PM10 was close to MLR but not equal in terms of the model evaluation indicators, with R2=0.52-0.63; IA=0.84-0.88; NAE=0.21-0.26; MAE=21.97-27.35; MSE=922.12-1700.95; and RMSE=30.37-41.24. Relatively low R2 values are quite common for time-series-dependent nonlinear atmospheric variables with their known confounding effects.

An attempt was made to further validate the predictive performance of the MLP model against the observed PM10 data of the NAMP monitoring station STN6_193 of Guwahati for a period (January-March 2019) beyond the data used in developing the MLP model. The predicted PM10 concentrations obtained using the MLP model were matched with the same period’s actual data. Fig. S4 shows that the MLP model performed well for the post-study period as well, and the model performance indicators (R2=0.69, IA=0.89, NAE=0.07, MAE=13.02, MSE=287.95 and RMSE=16.97) were also in line with the original model (Table S5).

Against the backdrop of CPCB’s acknowledgment that comparatively smaller tier-II cities are also facing severe air pollution, city authorities are contemplating several steps to curtail air pollution and the resulting health hazards. We recommend that local authorities use the non-linear MLP (ANN) algorithm with FFBN topologies to forecast PM10 concentrations in smaller Indian cities like Guwahati as well, to avoid PM-induced health hazards to a great extent. ‘Predict pollution and defeat concentration’ could be another approach to fighting the air pollution menace, in addition to the odd-even rule that a few Indian cities presently enforce to rein in air pollution by curtailing vehicular emissions. The advance prediction approach seems particularly applicable to Guwahati, as this study found that PM10 build-up had a positive correlation with gaseous pollutants and is hence likely to share a common source, i.e., vehicular pollution. Moreover, with this model, the local SPCB authorities can caution city dwellers of impending dangerous levels of PM10, so that they can reduce their outdoor activities on those days and thereby avoid exposure to unhealthy air quality.


Notes
* Indian government classification of cities based on their population as tier-I, tier-II, and tier-III.

Acknowledgments

This study was supported by the Graduate School Thesis Grant, Chulalongkorn University, Bangkok, Thailand. The authors also thank the State Pollution Control Board, Assam, and Regional Meteorological Department, Guwahati, for air pollution and meteorological information, respectively.


References
1. Abdullah, S., Ismail, M., Ahmed, A.N., Abdullah, A.M. (2019) Forecasting Particulate Matter Concentration Using Linear and Non-Linear Approaches for Air Quality Decision Support. Atmosphere, 10, 667.
2. Agarwala, S., Sharma, S., Suresh, R., Rahman, M.H., Vranckx, S., Maiheu, B., Blyth, L., Janssen, S., Gargava, P., Shukla, V.K., Batra, S. (2020) Air quality forecasting using artificial neural networks with real time dynamic error correction in highly polluted regions. Science of the Total Environment, 735, 139454.
3. Apte, J.S., Marshall, J.D., Cohen, A.J., Brauer, M. (2015) Addressing global mortality from ambient PM2.5. Environmental Science and Technology, 49(13), 8057-8066.
4. Barman, N., Gokhale, S. (2019) Urban black carbon-source apportionment, emissions and long-range transport over the Brahmaputra River Valley. Science of the Total Environment, 693, 133577.
5. Bhardwaj, R., Pruthi, D. (2020) Evolutionary techniques for optimizing air quality model. Procedia Computer Science, 167, 1872-1879.
6. Bishop, C.M. (1995) Neural Networks for Pattern Recognition, Oxford Univ. Press: Oxford, NY, USA, 1995; ISBN 978-0-19-853864-6.
7. Cabaneros, S.M., Calautit, J.K., Hughes, B.R. (2019) A review of artificial neural network models for ambient air pollution prediction. Environmental Modelling and Software, 119, 285-304.
8. Carnevale, C., Pisoni, E., Volta, M. (2010) A non-linear analysis to detect the origin of PM10 concentrations in Northern Italy. Science of the Total Environment, 409(1), 182-191.
9. Chelani, A.B., Gajghate, D.G., Hasan, M.Z. (2002) Prediction of ambient PM10 and toxic metals using artificial neural networks. Journal of the Air and Waste Management Association, 52(7), 805-810.
10. Chen, K., Glonek, G., Hansen, A., Williams, S., Tuke, J., Salter, A., Bi, P. (2016) The effects of air pollution on asthma hospital admissions in Adelaide, South Australia, 2003-2013: time-series and case-crossover analyses. Clinical and Experimental Allergy, 46(11), 1416-1430.
11. CPCB (2016) Central Pollution Control Board, Delhi. July, 2016. Available online: https://www.cpcb.nic.in/openpdffile.php?id=TGF0ZXN0RmlsZS9MYXRlc3RfMTIzX1NVTU1BUllfQk9PS19GUy5wZGY= (accessed on 8 January 2020).
12. Czernecki, B., Półrolniczak, M., Kolendowicz, L., Maros, M., Kendzierski, S., Pilguj, N. (2017) Influence of the atmospheric conditions on PM10 concentrations in Poznań, Poland. Journal of Atmospheric Chemistry, 74(1), 115-139.
13. Das, R., Khezri, B., Srivastava, B., Datta, S., Sikdar, P.K., Webster, R.D. (2015) Trace Element Composition of PM2.5 and PM10 from Kolkata - A Heavily Polluted Indian Metropolis. Atmospheric Pollution Research, 6(5), 742-747.
14. De, S. (2019) Long-term ambient air pollution exposure and respiratory impedance in children: A cross-sectional study. Respiratory Medicine, 170, 105795.
15. Deshmukh, D.K., Deb, M.K., Tsai, Y.I., Mkoma, S.L. (2011) Water Soluble Ions in PM2.5 and PM1 Aerosols in Durg City, Chhattisgarh, India. Aerosol and Air Quality Research, 11, 696-708.
16. Deshmukh, D.K., Deb, M.K., Mkoma, S.L. (2013) Size distribution and seasonal variation of size-segregated particulate matter in the ambient air of Raipur city, India. Air Quality Atmosphere and Health, 6, 259-276.
17. Dholakia, H.H., Bhadra, D., Garg, A. (2014) Short term association between ambient air pollution and mortality and modification by temperature in five Indian cities. Atmospheric Environment, 99, 168-174.
18. Dutta, A., Dutta, G. (2018) Indian Growth Story of Automobile Sector and Atmospheric Emission Projection. Pollution Research, 37(1), 131-143.
19. Dutta, A., Jinsart, W. (2020) Risks to health from ambient particulate matter (PM2.5) to the residents of Guwahati city, India: An analysis of prediction model. Human and Ecological Risk Assessment: An International Journal.
20. Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., Wang, J. (2015) Artificial neural networks forecasting of PM2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmospheric Environment, 107, 118-128.
21. Ferreira, T.M., Forti, M.C., de Freitas, C.U., Nascimento, F.P., Junger, W.L., Gouveia, N. (2016) Effects of particulate matter and its chemical constituents on elderly hospital admissions due to circulatory and respiratory diseases. International Journal of Environmental Research and Public Health, 13(10), 947.
22. Gardner, M.W., Dorling, S.R. (1998) Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14-15), 2627-2636.
23. Gawhane, R.D., Rao, P.S.P., Budhavant, K., Meshram, D.C., Safai, P.D. (2019) Anthropogenic fine aerosols dominate over the Pune region, Southwest India. Meteorology and Atmospheric Physics, 131, 1497-1508.
24. Gocheva-Ilieva, S.G., Stoimenova, M.P. (2018) PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria. World Academy of Science, Engineering and Technology. International Journal of Environmental and Ecological Engineering, 12(9), 572-577.
25. Gogikar, P., Tyagi, B., Gorai, A.K. (2019) Seasonal prediction of particulate matter over the steel city of India using neural network models. Modeling Earth System and Environment, 5, 227-243.
26. Goyal, P., Chan, A.T., Jaiswal, N. (2006) Statistical models for the prediction of respirable suspended particulate matter in urban cities. Atmospheric Environment, 40(11), 2068-2077.
27. Grzesiak, W., Zaborski, D. (2012) Examples of the use of data mining methods in animal breeding. Data mining applications in engineering and medicine. Adem Karahoca, IntechOpen, Croatia. 2012; pp. 303-324. Available online: https://www.intechopen.com/books/data-mining-applications-in-engineering-andmedicine/examples-of-the-use-ofdata-mining-methods-in-animal-breeding (accessed on 21 July, 2020).
28. Gummeneni, S., Yusup, Y.B., Chavali, M., Samadi, S.Z. (2011) Source apportionment of particulate matter in the ambient air of Hyderabad city, India. Atmospheric Research, 101(3), 752-764.
29. Gurjar, B.R., Jain, A., Sharma, A., Agarwal, A., Gupta, P., Nagpure, A.S., Lelieveld, J. (2010) Human health risks in megacities due to air pollution. Atmospheric Environment, 44(36), 4606-4613.
30. Guttikunda, S.K. (2017) Clearing the Air Seminar Series, ‘Filling the Knowledge Gap on Air Quality in Indian Cities’ Initiative on Climate, Energy and Environment (ICEE) at the Centre for Policy Research (CPR). Delhi, 4 December 2017.
31. Guttikunda, S.K., Nishadh, K.A., Gota, S., Singh, P., Chanda, A., Jawahar, P., Asundi, J. (2019) Air quality, emissions, and source contributions analysis for the Greater Bengaluru region of India. Atmospheric Pollution Research, 10(3), 941-953.
32. Hooyberghs, J., Mensink, C., Dumont, G., Fierens, F., Brasseur, O. (2005) A neural network forecast for daily average PM10 concentrations in Belgium. Atmospheric Environment, 39(18), 3279-3289.
33. Jena, S., Singh, G. (2017) Human health risk assessment of airborne trace elements in Dhanbad, India. Atmospheric Pollution Research, 8(3), 490-502.
34. Jiang, P., Dong, Q., Li, P. (2017) A novel hybrid strategy for PM2.5 concentration analysis and prediction. Journal of Environmental Management, 196, 443-457.
35. Jinsart, W., Sripraparkorn, C., Siems, S.T., Hurley, P.J., Thepanondh, S. (2010) Application of the air pollution model (TAPM) to the urban air shed of Bangkok, Thailand. International Journal of Environment and Pollution (IJEP), 42(1/2/3), 68-84.
36. Kalaiarasan, G., Balakrishnan, R.M., Sethunath, N.A., Manoharan, S. (2018) Source apportionment studies on particulate matter (PM10 and PM2.5) in ambient air of urban Mangalore, India. Journal of Environmental Management, 217, 815-824.
37. Kavuri, N.C., Paul, K.K. (2013) Chemical Characterization of Ambient PM10 Aerosol in a Steel City, Rourkela, India. Research Journal of Recent Sciences, 2(1), 32-38.
38. Kaur, M., Mandal, A. (2020) PM2.5 Concentration Forecasting using Neural Networks for Hotspots of Delhi, 2020. International Conference on Contemporary Computing and Applications (IC3A), Lucknow, India, 5-7 February, pp. 40-43.
39. Kottur, S.V., Mantha, S.S. (2015) An integrated model using artificial neural network (ANN) and kriging for forecasting air pollutants using meteorological data. International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), 4(1), 146-152.
40. Kumari, P.R., Avisetty, R.V.S.D.S.P., Akkala, P., Subash, K.V.V., Manideep, K.S., Bojja, P., Aruna, B. (2019) Prediction and Estimation of PM10 and SO2 Concentrations in the Ambient Air At Vijayawada Station using Artificial Neural Networks Computing. International Journal of Recent Technology and Engineering, 7(6C2), 790-793.
41. Lawrence, A., Fatima, N. (2014) Urban air pollution & its assessment in Lucknow City - The second largest city of North India. Science of the Total Environment, 488-489, 447-455.
42. Masood, A., Ahmad, K. (2020) A model for particulate matter (PM2.5) prediction for Delhi based on machine learning approaches. Procedia Computer Science, 167, 2101-2110.
43. Mckenney, D.W., Pedlar, J.H. (2003) Spatial models of site index based on climate and soil properties for two boreal tree species in Ontario, Canada. Forest Ecology and Management, 175, 497-507.
44. Mishra, D., Goyal, P., Upadhyay, A. (2015) Artificial intelligence-based approach to forecast PM2.5 during haze episodes: A case study of Delhi, India. Atmospheric Environment, 102, 239-248.
45. Moisen, G.G., Frescino, T.S. (2002) Comparing five modelling techniques for predicting forest characteristics. Ecological Modelling, 157(2-3), 209-225.
46. Murari, V., Kumar, M., Barman, S.C., Banerjee, T. (2015) Temporal variability of MODIS aerosol optical depth and chemical characterization of airborne particulates in Varanasi, India. Environmental Science and Pollution Research, 22, 1329-1343.
47. Myllyvirta, L., Dahiya, S., Sivalingam, N. (2016) Out of sight: how coal burning advances India’s air pollution crisis. Greenpeace Environment Trust, Bengaluru; Available online: http://www.greenpeace.org/india/Global/india/cleanairnation/Reports/Out%20of%20Sight.pdf (accessed on 26, February 2020).
48. Nadeem, I., Ilyas, A.M., Uduman, P.S.S. (2020) Analyzing and Forecasting Ambient Air Quality Of Chennai City In India. Geography Environment Sustainability, 13(3).
49. Nagendra, S.M.S., Khare, M. (2006) Artificial neural network approach for modelling nitrogen dioxide dispersion from vehicular exhaust emissions. Ecological Modelling, 190(1-2), 99-115.
50. Ostro, B., Chestnut, L., Vichit-Vadakan, N., Laixuthai, A. (1999) The impact of particulate matter on daily mortality in Bangkok, Thailand. Journal of the Air and Waste Management Association, 49(9), 100-107.
51. Pant, P., Lal, R.M., Guttikunda, S.K., Russell, A.G., Nagpure, A.S., Ramaswami, A., Peltier, R.E. (2019) Monitoring particulate matter in India: recent trends and future outlook. Air Quality Atmosphere and Health, 12(1), 45-58.
52. Pipal, A.S., Jan, R., Satsangi, P., Tiwari, S., Taneja, A. (2014) Study of Surface Morphology, Elemental Composition and Origin of Atmospheric Aerosols (PM2.5 and PM10) over Agra, India. Aerosol and Air Quality Research, 14, 1685-1700.
53. Prakash, A., Kumar, U., Kumar, K., Jain, V.K. (2011) A wavelet-based neural network model to predict ambient air pollutants’ concentration. Environmental Modeling and Assessment, 16(5), 503-517.
54. Ravindra, K., Rattan, P., Mor, S., Aggarwal, A.N. (2019) Generalized additive models: Building evidence of air pollution, climate change and human health. Environment International, 132, 104987.
55. Roy, D., Singh, G., Seo, Y.C. (2019) Carcinogenic and non-carcinogenic risks from PM10 and PM2.5-bound metals in a critically polluted coal mining area. Atmospheric Pollution Research, 10(6), 1964-1975.
56. Shahraiyni, H.T., Sodoudi, S. (2016) Statistical Modeling Approaches for PM10 Prediction in Urban Areas; A Review of 21st-Century Studies. Atmosphere, 7, 15.
57. Sharma, M., Maloo, S. (2005) Assessment of ambient air PM10 and PM2.5 and characterization of PM10 in the city of Kanpur, India. Atmospheric Environment, 39(33), 6015-6026.
58. Sharma, S., Nayak, H., Lal, P. (2015) Post-Diwali morbidity survey in a resettlement colony of Delhi. Indian Journal of Burns, 23(1), 76-80.
59. Shubhankar, B., Ambade, B. (2016) Chemical characterization of carbonaceous carbon from industrial and semi urban site of eastern India. Springer Plus, 5, 837.
60. Singh, D.P., Gadi, R., Mandal, T.K. (2011) Characterization of particulate-bound polycyclic aromatic hydrocarbons and trace metals composition of urban air in Delhi, India. Atmospheric Environment, 45, 7653-7663.
61. Singh, K.P., Gupta, S., Kumar, A., Shukla, S.P. (2012) Linear and nonlinear modeling approaches for urban air quality prediction. Science of the Total Environment, 426, 244-255.
62. Singh, K.P., Gupta, S., Rai, P. (2013) Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426-437.
63. Slini, T., Kaprara, A., Karatzas, K., Moussiopoulos, N. (2006) PM10 forecasting for Thessaloniki, Greece. Environmental Modelling and Software, 21(4), 559-565.
64. Sudheer, A.K., Aslam, M.Y., Upadhyay, M., Rengarajan, R., Bhushan, R., Rathore, J.S., Singh, S.K., Kumar, S. (2016) Carbonaceous aerosol over semi-arid region of western India: Heterogeneity in sources and characteristics. Atmospheric Research, 178-179, 268-278.
65. Tikhe Shruti, S., Khare, K.C., Londhe, S.N. (2013) Forecasting criteria air pollutants using data driven approaches; An Indian case study. Journal Of Environmental Science, Toxicology And Food Technology (IOSR-JESTFT), 3(5), 1-8.
66. Tiwari, S., Bisht, D.S., Srivastava, A.K., Pipal, A.S., Taneja, A., Srivastava, M.K., Attri, S.D. (2014) Variability in atmospheric particulates and meteorological effects on their mass concentrations over Delhi, India. Atmospheric Research, 145-146, 45-56.
67. Tiwari, S., Dumka, U.C., Gautam, A.S., Kaskaoutis, D.G., Srivastava, A.K., Bisht, D.S., Chakrabarty, R.K., Sumlin, B.J., Solm, F. (2017) Assessment of PM2.5 and PM10 over Guwahati in Brahmaputra River Valley: Temporal evolution, source apportionment and meteorological dependence. Atmospheric Pollution Research, 8, 13-28.
68. Ul-Saufie, A.Z., Yahaya, A.S., Ramli, N.A., Rosaida, N., Hamid, H.A. (2013) Future daily PM10 concentrations forecasting by combining regression models and feedforward backpropagation models with principal component analysis (PCA). Atmospheric Environment, 77, 621-630.
69. Vemuri, V. (1988) Artificial neural networks: theoretical concepts; IEEE Computer Society Press Washington DC, United States, pp. 145; ISBN: 978-0-8186-0855-1.
70. Vlachogianni, A., Karppinen, A., Kassomenos, P., Karakitsios, S., Kukkonen, J. (2011) Evaluation of a multiple regression model for the forecasting of the concentrations of NOx and PM10 in Athens and Helsinki. Science of the Total Environment, 409(8), 1559-1571.
71. Wang, W. (2016) Progress in the impact of polluted meteorological conditions on the incidence of asthma. Journal of Thoracic Disease, 8(1), E57-E61.
72. WHO (2018) Concentration occurrence or they should stay away from the high-risk areas. WHO, Geneva. Available online: http://www.who.int/phe/health_topics/outdoorair/? (accessed on 10 March 2020).
73. Willmott, C.J., Matsuura, K., Robeson, S.M. (2009) Ambiguities inherent in sums-of-squares-based error statistics. Atmospheric Environment, 43(3), 749-752.
74. Yadav, M., Soni, K., Soni, B.K., Singh, N.K., Bamniya, B.R. (2019) Source apportionment of particulate matter, gaseous pollutants, and volatile organic compounds in a future smart city of India. Urban Climate, 28, 100470.
75. Yadav, S., Satsangi, P.G. (2013) Characterization of particulate matter and its related metal toxicity in an urban location in southwest India. Environmental Monitoring and Assessment, 185, 7365-7379.
76. Yadav, V., Nath, S. (2019) Novel hybrid model for daily prediction of PM10 using principal component analysis and artificial neural network. International Journal of Environmental Science and Technology, 16(6), 2839-2848.
77. Yoo, K., Yoo, H., Lee, J.M., Shukla, S.K., Park, J. (2018) Classification and regression tree approach for prediction of potential hazards of urban airborne bacteria during Asian dust events. Scientific Reports, 8(11823).

Table S1. 
Number of times by which PM10 concentrations in Indian cities exceed the WHO and NAAQS standards.
Serial no. Cities PM10 (in μg m-3) Violation (WHO) Violation (NAAQS)
1 Adityapur 165 8.25 2.75
2 Agra 295 8.75 2.92
3 Ahmedabad 108.3 5.42 1.81
4 Amritsar 252.22 5.04 2.52
5 Bengaluru 349.8 17.49 5.83
6 Bathinda 204 4.08 2.04
7 Chandigarh 151 3.03 1.51
8 Delhi 182 3.64 1.82
9 Dhanbad 216 10.8 3.60
10 Fatehgarh 197 3.94 1.97
11 Guwahati 90.7 4.54 1.51
12 Hyderabad 174.4 8.72 2.91
13 Jharia 333.7 16.69 5.56
14 Jodhpur 180 3.6 1.8
15 Kanpur 277 2.28 2.77
16 Kolkata 445 22.25 7.42
17 Lucknow 123 2.46 1.23
18 Mangalore 101.8 5.09 1.07
19 Mumbai 54.4 2.72 0.91
20 Pune 113 3.38 1.14
21 Raipur 387.29 19.56 6.45
22 Rohtak 186.09 3.72 1.86
23 Rourkela 127.26 6.38 2.13
24 Shimla 93.9 4.7 1.57
25 Sirsa 203 4.06 2.03
26 Sonipat 213.67 4.27 2.14
27 Udaipur 128.34 2.57 2.14
28 Varanasi 176.1 8.81 2.94

Table S2. 
Different data driven predictive techniques used for PM forecasting in Indian context.
Author (year) | Location (Type) | Method | Predictor variables | Target | Remarks
Prakash et al. (2011) | Delhi (Tier I city) | Wavelet and RNN (Recurrent Neural Network) combination | CO, NO2, NO, O3, SO2 & PM2.5 | CO, O3, NO2, NO, SO2 & PM2.5 | Forecast performance was reasonably good.
Singh et al. (2012) | Lucknow (Tier II city) | Partial least squares regression (PLSR), multivariate polynomial regression (MPR) and ANN | T, RH, WS, SPM, NO2, SO2 | RSPM, SO2, & NO | MPR and ANNs performed better.
Singh et al. (2013) | Lucknow (Tier II city) | Single Decision Tree (SDT), Decision Tree Forest (DTF) and Decision Tree Boost (DTB) vs. Support Vector Machine (SVM) | Air quality & meteorological parameters | AQI and Combined AQI | DTF and DTB outperformed the SVM.
Kottur and Mantha (2015) | Mumbai (Tier I city) | ANN and Kriging combination | T, RH, WS, WD, AP, NOx, SOx, RSPM | NOx, SOx and RSPM | ANN and Kriging performed satisfactorily.
Mishra et al. (2015) | Delhi (Tier I city) | Artificial intelligence-based Neuro-Fuzzy (NF) techniques compared with MLR and ANN | CO, O3, NO2, SO2, PM2.5, AP, T, WS, WD, RH, V, DP | PM2.5 | NF model is better than ANN and MLR models.
Gogikar et al. (2018) | Rourkela (Tier II city) | WMLPNN (wavelet-based MLP), WRNN (wavelet-based RNN), multi-layer perceptron feed forward neural network (MLPNN) and RNN | T, RH, BLH, SP, WD, WS | PM2.5, PM10 | WMLPNN model performed better.
Yadav and Nath (2019) | Varanasi (Tier II city) | PCA-ANN (MLP) and MLR | PM2.5, NO, Benzene and VWS for PCA for ANN. SR, WS & AP for MLR | PM10 | Hybrid PCA-ANN model gives a better prediction.
Masood and Ahmad (2020) | Delhi (Tier I city) | SVM and ANN | PM2.5, SO2, CO, NO, NOx, C7H8, NO2, VWS, WS, WD, T, RH, SR | PM2.5 | ANN exhibited better result.
Agarwala et al. (2020) | Delhi (Tier I city) | ANN | Meteorological variables | PM10, PM2.5, NO2, and O3 | O3 predictions are better than PM.
Nadeem et al. (2020) | Chennai (Tier I city) | ARMA/ARIMA modelling | PM10, SO2 & NO2 | PM10, SO2 and NO2 | Forecasting efficiency can be improved.
Kaur and Mandal (2020) | Delhi (Tier I city) | ANN of four types: FFBP, RNN, Elman and NARX (non-linear autoregressive network with exogenous input) | PM2.5, T, WS, WD, RH, SR | PM2.5 | NARX model outperforms others.
Bhardwaj and Pruthi (2020) | Delhi (Tier I city) | ANFIS (Adaptive-Neuro Fuzzy Inference System), WANFIS (wavelet ANFIS), WANFIS-GA (WANFIS genetic algorithm), WANFIS-PSO (WANFIS particle swarm optimization) | PM2.5 | PM2.5 | WANFIS-PSO performed better.
This study | Guwahati (Tier II city) | MLR, ANN (MLP) and CART | PM10, RF, T, RH, WS, NO2, SO2, CO | PM10 | ANN performed better.
Abbreviations: Temperature (T), Relative humidity (RH), Wind speed (WS), Wind direction (WD), Atmospheric pressure (AP), Nitrogen dioxide (NO2), Sulfur dioxide (SO2), Respirable suspended particulates matter (RSPM), Nitrogen oxides (NOx) and Sulfur oxides (SOx), Ozone(O3), Visibility (V), Dew point (DP), Boundary layer height (BLH), Surface pressure (SP), Solar radiation (SR)

Table S3. 
Monitoring stations and their geographic coordinates (latitude and longitude in decimal degrees).
Code Location Monitoring type Latitude Longitude Area type
STN1_603 Boragaon, IASST Campus Non PM2.5 26.11635 91.68338 Residential
STN2_596 Khanapara, Central Dairy Non PM2.5 26.0831 91.8171 Residential
STN3_519 Gopinath Nagar, ITI Building Non PM2.5 26.160962 91.752542 Residential
STN4_541 Santipur, Prajyotish College Non PM2.5 26.165391 91.7276 Residential
STN5_602 Guwahati University campus Non PM2.5 26.15793 91.66312 Residential
STN6_193 Bamunimaidan, PCBA HQ PM2.5 26.185165 91.788334 Residential

Table S4. 
Correlation coefficient for PM10 and gaseous variables and relative humidity (RH).
PM10 RH NO2 SO2 CO
PM10 1
RH -.355** 1
NO2 .231** -0.045 1
SO2 .128** -.121** .477** 1
CO .061* -0.024 -0.011 0.027 1
**p<0.01; *p<0.05

Table S5. 
Performance indicators for ANN MLP validation model, station 6 (Jan-Mar, 2019).
Performance indicators Values
NAE (Normalized Absolute Error) 0.07
MAE (Mean Absolute Error) 13.02
MSE (Mean Squared Error) 287.95
RMSE (Root Mean Square Error) 16.97
IA (Index of Agreement) 0.89
R2 (Coefficient of Determination) 0.69


Fig. S1. 
Observations and prediction of the MLR models in scatter plots.


Fig. S2. 
Observations and prediction of the ANN models in scatter plots.


Fig. S3. 
Decision tree structures from CART for station 1 to 6.


Fig. S4. 
PM10 prediction using ANN MLP for station 6 (Jan-Mar, 2019), Guwahati city.