Asian Journal of Atmospheric Environment


[ Research Article ]
Asian Journal of Atmospheric Environment - Vol. 15, No. 1
Abbreviation: Asian J. Atmos. Environ
ISSN: 1976-6912 (Print) 2287-1160 (Online)
Print publication date 31 Mar 2021
Received 13 Dec 2020 Revised 02 Mar 2021 Accepted 08 Mar 2021
DOI: https://doi.org/10.5572/ajae.2020.131
Air Pollution in Indian Cities and Comparison of MLR, ANN and CART Models for Predicting PM₁₀ Concentrations in Guwahati, India
Abhishek Dutta^* ; Wanida Jinsart
Department of Environmental Science, Faculty of Science, Chulalongkorn University, 254 Phayathai Road, Pathumwan, Bangkok 10330, Thailand


Correspondence to : * Tel: +66880441556 E-mail: duttabob@gmail.com
Copyright © 2021 by Asian Association for Atmospheric Environment This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Funding Information: Chulalongkorn University

Abstract

Indian cities are becoming increasingly susceptible to PM₁₀-induced health hazards, creating concern for the country’s policymakers. Air pollution is now engulfing comparatively smaller cities as well, as the rapid pace of urbanization and economic development shows no sign of losing steam. A review of air pollution in 28 Indian cities, covering tier-I, II, and III cities, found that they grossly violated both the WHO (World Health Organization) and NAAQS (National Ambient Air Quality Standards of India) standards for acceptable daily average PM₁₀ (particulate matter less than 10 μm in aerodynamic diameter) concentrations by a wide margin. Predicting city-level PM₁₀ concentrations in advance and initiating prior actions accordingly is an acceptable way to protect city dwellers from PM₁₀-induced health hazards. The predictive ability of three models, linear Multiple Linear Regression (MLR), the nonlinear Multi-Layer Perceptron class of Artificial Neural Network (MLP ANN), and the nonlinear Classification and Regression Tree (CART), for one-day-ahead PM₁₀ concentration forecasting in the tier-II city of Guwahati was tested with 2016-2018 daily average observed climate data, PM₁₀, and gaseous pollutants. The results show that the nonlinear MLP algorithm, with feedforward backpropagation network topologies of the ANN class, gives the best prediction compared with the linear MLR and nonlinear CART models. Therefore, the ANN (MLP) approach may be useful for deriving a predictive understanding of the one-day-ahead PM₁₀ concentration level, and may thus provide policymakers with a tool for initiating in situ measures to curb air pollution and improve public health.


Keywords: Air pollution, Prediction, Artificial neural network, Multi-variate linear regression, Small city

1. INTRODUCTION

Over the past years, airborne particulate matter (PM) concentrations in Indian cities have been rising and have become a matter of concern for policymakers in India. Improving air quality is not easy for a country like India, as its policymakers cannot forgo the objective of faster economic development needed to sustain its vast population. Many sources continuously release pollutants into city air; notable among them are the burning of fuels, industrial establishments, infrastructure-related construction, government and privately operated power plants, stubble burning of agricultural biomass residue in neighboring areas, and vehicular movement (Guttikunda, 2017). However, their proportional contributions vary across the cities of India. Across India as a whole, about 53,929 automobiles hit the roads every day (Dutta and Dutta, 2018). All of these have led to the dismal air pollution situation across Indian cities. Table 1 and Table S1 summarize studies conducted by different researchers on 28 Indian cities, with the reported PM₁₀ concentrations and the times they violated the air quality standards of both the World Health Organization (WHO) and the National Ambient Air Quality Standards (NAAQS) of India, respectively. Kolkata, a tier-I city^* in India, even clocked a PM₁₀ concentration as high as 445±21 μg m^-3 during wintertime (Das et al., 2015). Annual PM₁₀ concentrations in New Delhi were reported to be 222±14 μg m^-3, while an earlier study reported summer and winter mean concentrations of 95.1±22.2 μg m^-3 and 182±32.5 μg m^-3, respectively (Tiwari et al., 2014; Singh et al., 2011). Bengaluru also registered a high annual mean PM₁₀ concentration of 349.8±205.8 μg m^-3 during 2015 (Guttikunda et al., 2019). In comparison, lower concentrations have been reported for Hyderabad and Mumbai, where mean PM₁₀ concentrations for the period 2005-2012 were 174.4±86.6 μg m^-3 and 54.4±25.2 μg m^-3, respectively (Dholakia et al., 2014).

Table 1.
Summary of ambient PM₁₀ concentrations from several cities across India (mass concentrations in μg m^-3).

City	City type	Sampling year	Type	PM₁₀ (μg m^-3)	References
Kolkata	I	Dec, 2013-Jan, 2014	Winter mean	445±210	Das et al., 2015
Pune	I	Jan-Dec, 2016	Mean	62.5	Gawhane, 2019
		June, 2011-May, 2012	Mean	113.8	Yadav and Satsangi, 2013
Hyderabad	I	2005-2012	Mean	174.4±86.6	Dholakia et al., 2014
		June, 2004-May, 2005	Mean	135.1±37.92	Gummeneni et al., 2011
Mumbai	I	2005-2012	Mean	54.4±25.2	Dholakia et al., 2014
Ahmedabad	I	2005-2012	Mean	108.3±69.8	Dholakia et al., 2014
Delhi	I	April to June, 2008	Summer mean	95.1±22.2	Singh et al., 2011
		Nov, 2007-Jan, 2008	Winter mean	182±32.5
		Sept, 2010-Aug, 2012	Mean	222±142	Tiwari et al., 2014
Bengaluru	I	2011	Annual mean	221.4±187.5	Guttikunda et al., 2019
		2012	Annual mean	275.6±180.8
		2013	Annual mean	314.3±213.4
		2014	Annual mean	333.6±216.3
		2015	Annual mean	349.8±205.8
		2011-2015	Mean (5 year)	298.94±200.76
		2005-2012	Mean	80.4±21.9	Dholakia et al., 2014
Jodhpur	II	Aug-Sept, 2011	Mean monsoon	180	Sudheer et al., 2016
Varanasi	II	Mar, 2013 to Feb, 2014	Annual mean	176.1±85	Murari et al., 2015
Agra	II	2000-2016	Range	175 to 295	De, 2019
		April, 2010 to Jan, 2011	Mean	230.5	Pipal, 2014
		April, 2010 to Jan, 2011	Mean	242
Guwahati	II	July, 2013-30 June, 2014	Annual mean	90.7±59.7	Tiwari et al., 2017
Raipur	II	Oct, 2008 to Sept, 2009	Annual mean	387.29±76.85	Deshmukh et al., 2013
Mangalore	II	Jan, 2013-Oct, 2016	Mean	101.8	Kalaiarasan et al., 2018
Simla	II	2005-2012	Mean	93.9±58.7	Dholakia et al., 2014
Amritsar	II	9 Nov-15 Nov, 2016	Winter mean	252.22±108.14	Ravindra et al., 2019
Rourkela	II	Jan, 2011-Dec, 2011	Mean (Four seasons)	127.755	Kavuri, 2013
Dhanbad	II	Mar, 2014-Feb, 2015	Mean (Summer, post monsoon & Winter)	216±82	Jena and Singh, 2017
Lucknow	II	Mar-June, 2012	Summer mean	123±13	Lawrence and Fatima, 2014
Kanpur	II	Oct, 2002-Feb, 2003	Mean	80	Sharma and Maloo, 2005
		Oct, 2002-Feb, 2003	Mean	277±117.61
Chandigarh	II	27 Oct-3 Nov, 2016	Winter mean	151.45±106.40	Ravindra et al., 2019
Fatehgarh Sahib	III	3 Nov-9 Nov, 2016	Winter mean	197.07±61.35	Ravindra et al., 2019
Bathinda	III	16 Nov-21 Nov, 2016	Winter mean	204.04±70.80	Ravindra et al., 2019
Sirsa	III	21 Nov-26 Nov, 2016	Winter mean	203.12±83.28	Ravindra et al., 2019
Rohtak	III	26 Nov-3 Dec, 2016	Winter mean	186.09±78.33	Ravindra et al., 2019
Sonipat	III	3 Dec-6 Dec, 2016	Winter mean	213.67±151.49	Ravindra et al., 2019
Jharia	III	Mar, 2011-Feb, 2012	Mean	333.7±17.86	Roy et al., 2019
Udaypur	III	July, 2017-June, 2018	Mean	128.34	Yadav et al., 2019
Adityapur	III	1 July, 2013-30 June, 2014	Mean	165±43.93	Shubhankar and Ambade, 2016

Tier-II cities are not lagging far behind India’s tier-I cities in terms of PM₁₀ pollution. Raipur had a mean PM₁₀ concentration of 387.29±76.9 μg m^-3 from October 2008 to September 2009, while Kanpur recorded a mean PM₁₀ concentration of 277±117.6 μg m^-3 from October 2002 to February 2003 (Deshmukh et al., 2011; Sharma and Maloo, 2005). Among the tier-III cities, the reported mean PM₁₀ concentrations of cities such as Jharia and Sonipat were also on the higher side, at 333.7±17.9 μg m^-3 and 213.7±151.5 μg m^-3 during March 2011 to February 2012 and 03 to 06 December 2016, respectively (Roy et al., 2019; Ravindra et al., 2019).

One option for Indian policymakers to mitigate critical PM concentrations in the cities, vis-à-vis their health effects, may therefore be to correctly predict the concentrations at least one to two days in advance and initiate prior actions accordingly, such as regulating traffic in a planned way. However, predicting air quality is not a straightforward job because of the complex interactions of different nonlinear parameters (Hooyberghs et al., 2005). Shahraiyni and Sodoudi (2016) reviewed 36 research studies carried out in different cities of the world in the quest for accuracy in forecasting PM₁₀. In 50% of these studies (18 studies), researchers employed a multi-layer perceptron (MLP) with Feedforward Backpropagation Network (FFBN) topologies, a class of Artificial Neural Network (ANN) model. Around 28% (10 studies) depended on the widely used Multiple Linear Regression (MLR) technique for PM₁₀ forecasting in urban areas. Three studies (about 8%) used the Radial Basis Function (RBF) network of the ANN class to forecast city-level PM₁₀. The other five studies (14%) depended on other techniques such as PNN (Pruned Neural Networks), LL (Lazy Learning), an MLP and MLR combination, the Elman class of Recurrent Neural Networks (RNN), and PCRA (Principal Component Regression Analysis). The ANN technique appears to provide useful results when dealing with the nonlinear independent variables involved in environmental pollution prediction. Hence, more practitioners are resorting to data-driven approaches such as ANN modeling as alternatives to traditional deterministic or nonlinear models (Cabaneros et al., 2019; Jiang et al., 2017). Pollution researchers in China and elsewhere have used ANN techniques extensively to forecast airborne PM concentrations in the past. MLR with stepwise inclusion of input variables has been the most used tool for temporal prediction of PM_2.5 and PM₁₀ in different urban areas of India, although MLR is limited by its linear representation of nonlinear systems. Researchers have, however, only to a limited extent adopted other data-driven predictive techniques for PM forecasting in the Indian context or compared their performance (Table S2).

Against the above background, this paper’s primary objective is to assess the predictive ability of three contemporary statistical techniques, namely MLR, ANN, and CART (Classification and Regression Tree) analyses, for one-day-ahead PM₁₀ concentration prediction in an Indian city. The best-performing technique will be a useful tool for city authorities and air quality managers for initiating in situ measures to curb pollution. Unlike previous modeling efforts (Table S2), this is the first instance of applying CART analysis as a statistical procedure for predicting PM₁₀ in a comparative setup for an Indian city. In the recent past, Gocheva-Ilieva and Stoimenova (2018) employed CART to predict PM₁₀ for the city of Pleven, Bulgaria, and claimed very accurate model performance. CART, as a method for the analysis and forecasting of PM₁₀, has also been reported to perform better than MLR (Slini et al., 2006).

2. LOCATION OF THE STUDY

The model development for forecasting PM₁₀ was attempted for the north-eastern Indian tier-II city of Guwahati, the capital city of the state of Assam, India. For the last 10-12 years, Guwahati has been recognized as one of India’s most rapidly growing cities. Rapid urbanization and its contribution to air pollution have made smaller Indian cities like Guwahati vulnerable too. Vehicular growth (both light and heavy vehicles) in the city was notable in the past decade, with a reported sharp rise of about 87%. A recent study conducted in Guwahati, which computed the Hazard Quotient (HQ) based on the NAAQS and WHO standards, indicated a high degree of health risk for the city dwellers (Dutta and Jinsart, 2020). There is black carbon pollution in the city air due to rapid urbanization and poor environmental quality control (Barman and Gokhale, 2019). Guwahati has a humid subtropical climate. The four major seasons of the city are winter (December to February), spring (March to May), summer (June to August), and autumn (September to November), with differing meteorological conditions. Guwahati has six ambient air monitoring stations, set up under the National Air Quality Monitoring Programme (NAMP), to measure key pollutants (Pant et al., 2019). Only one of the NAMP stations can measure PM_2.5, while the newly developed CAAQM (Continuous Ambient Air Quality Monitoring) station started functioning only in mid-2019. The locations of the six NAMP stations and their monitoring types are shown in Fig. 1 and Table S3, respectively.

Fig. 1.
Study location and monitoring stations.

3. METHODS

Daily average concentration data (1096 data points) for PM₁₀ (μg m^-3), CO (ppm), NO₂ (ppb), and SO₂ (ppb) were collected for all six air quality monitoring stations for the three years 2016-2018 from the State Pollution Control Board (SPCB) office located at Guwahati. Daily climate data for the same three years (1096 data points) for ambient temperature (AT, °C), relative humidity (RH, %), wind speed (WS, m s^-1), and rainfall (RF, mm) were acquired from the Regional Meteorological Department, located at Guwahati.

3. 1 Data Treatment

A few missing values were observed in the daily average concentration data for PM₁₀, CO, NO₂, and SO₂ in the 2016-2018 time series. As the observed values vary significantly, those few days were removed from the data set rather than being filled by linear interpolation. The modified data set contained 1092 observations. The climate data (1096 data points) had no missing values but were adjusted for parity with the pollutant data by removing the corresponding days.
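The data treatment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function name `drop_incomplete_days`, the dictionary layout, and the toy values are all hypothetical.

```python
from datetime import date, timedelta

def drop_incomplete_days(pollutants, climate):
    """Keep only the days on which every pollutant reading is present.

    pollutants: dict mapping date -> dict of readings (None marks a missing value)
    climate:    dict mapping date -> dict of climate readings
    Returns both series restricted to the same complete days.
    """
    complete_days = [
        day for day, row in sorted(pollutants.items())
        if all(value is not None for value in row.values())
    ]
    clean_pollutants = {day: pollutants[day] for day in complete_days}
    # Remove the matching climate rows so both series stay aligned.
    clean_climate = {day: climate[day] for day in complete_days if day in climate}
    return clean_pollutants, clean_climate

# Toy data (made-up values) for four consecutive days.
start = date(2016, 1, 1)
days = [start + timedelta(days=k) for k in range(4)]
pollutants = {
    days[0]: {"PM10": 120.0, "CO": 1.5, "NO2": 17.0, "SO2": 7.5},
    days[1]: {"PM10": None, "CO": 1.6, "NO2": 18.0, "SO2": 7.4},
    days[2]: {"PM10": 98.0, "CO": 1.4, "NO2": 16.0, "SO2": 7.7},
    days[3]: {"PM10": 105.0, "CO": 1.7, "NO2": None, "SO2": 7.9},
}
climate = {
    day: {"AT": 18.0 + k, "RH": 80.0, "WS": 1.2, "RF": 0.0}
    for k, day in enumerate(days)
}
clean_pollutants, clean_climate = drop_incomplete_days(pollutants, climate)
```

Dropping rather than interpolating keeps only measured values, at the cost of small gaps in the daily sequence.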

3. 2 Descriptive Statistics and Analysis of Time Series

Descriptive statistics of the climate data, PM₁₀, and gaseous pollutants for the period 2016-2018 (1092 data points) and a time series analysis were worked out for air quality monitoring station 6 to understand the characteristics and correlation of the different variables over the study period. Station 6 was found to be representative of the six air quality monitoring stations of the city, owing to the completeness of its data sets and its reflection of the common land-use patterns of the city. Multiple time series charts were produced with time on the horizontal axis and PM₁₀ concentrations, climate variables, and gaseous variables (AT, RH, RF, WS, SO₂, CO, NO₂) on the vertical axes.

3. 3 Predictive Models Development and Validation

We have used MLR analysis, MLP class of ANN, and CART for forecasting of one day ahead PM₁₀ concentration for all the six air quality monitoring stations of Guwahati city.
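As a rough sketch of what "one day ahead" means for the data layout, each day's inputs can be paired with the following day's PM₁₀. The helper `make_one_day_ahead_pairs` and the toy rows are hypothetical. The sketch also assumes the rows are in date order and treats adjacent rows as consecutive days, which is only approximately true once days with missing values have been removed.

```python
def make_one_day_ahead_pairs(rows, input_names, target_name="PM10"):
    """Pair the inputs of day t with the target of day t+1.

    rows: list of dicts, one per day, in date order.
    Returns (X, y): X[t] holds the inputs of day t, y[t] the PM10 of day t+1.
    """
    X = [[row[name] for name in input_names] for row in rows[:-1]]
    y = [row[target_name] for row in rows[1:]]
    return X, y

# Toy rows with made-up values.
rows = [
    {"PM10": 100.0, "AT": 20.0, "RH": 80.0},
    {"PM10": 110.0, "AT": 19.0, "RH": 82.0},
    {"PM10": 95.0, "AT": 21.0, "RH": 78.0},
]
X, y = make_one_day_ahead_pairs(rows, ["PM10", "AT", "RH"])
```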

3. 3. 1 Multiple Linear Regression (MLR)

In MLR analysis, a mathematical model was built to forecast the dependent variable, i.e., next-day PM₁₀, from independent variables comprising climate variables and gaseous pollutants. In MLR, the coefficient of determination (R²) indicates the overall capability of the model to explain the variance in the data. The regression model follows Equation 1 (Abdullah et al., 2019; Vlachogianni et al., 2011).

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_n X_{ni} + \varepsilon_i \tag{1}$$

where Y_i is the dependent variable, β_0, ..., β_n are the regression coefficients, X_1i, ..., X_ni are the independent variables, and ε_i is the stochastic error associated with the regression. This relationship was used in this study to develop an equation for predicting the next-day PM₁₀ concentrations at the six ambient air monitoring stations of Guwahati, with input variables such as meteorological parameters, PM₁₀, and gaseous pollutants. MLR assumes that the residuals are normally distributed with zero mean, are uncorrelated, and have constant variance. The stepwise multiple linear regression procedure was used to derive the equation (Abdullah et al., 2019). The variance inflation factor (VIF) was used to evaluate the effect of multicollinearity on the variance of the estimated regression coefficients. The equation for VIF (Equation 2) is as follows:

$$\mathrm{VIF} = \frac{1}{1 - R^2} \tag{2}$$
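Equations 1 and 2 can be illustrated with a small pure-Python sketch. It makes two assumptions that the text does not state: the coefficients are obtained by ordinary least squares via the normal equations (the study itself used the stepwise procedure in SPSS), and the R² in Equation 2 is taken, as is conventional, from regressing one predictor on the remaining predictors. All function names and the toy data are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_ols(X, y):
    """Return [b0, b1, ..., bn] of Equation 1 from the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # leading 1.0 carries the intercept
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

def r_squared(X, y, beta):
    """Coefficient of determination of the fitted model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((y_i - p_i) ** 2 for y_i, p_i in zip(y, preds))
    ss_tot = sum((y_i - mean_y) ** 2 for y_i in y)
    return 1.0 - ss_res / ss_tot

def vif(X, j):
    """Equation 2 for predictor j, regressing column j on the other columns."""
    target = [row[j] for row in X]
    others = [[v for k, v in enumerate(row) if k != j] for row in X]
    r2 = r_squared(others, target, fit_ols(others, target))
    return 1.0 / (1.0 - r2)

# Toy data: y is an exact linear function of two predictors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in X]
beta = fit_ols(X, y)
vif_first = vif(X, 0)
```

A VIF of 1 means the predictor is uncorrelated with the others; larger values indicate that its coefficient variance is inflated by collinearity.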

3. 3. 2 Multi-Layer Perceptron (MLP) Model

ANN is a robust data modeling technique capable of handling nonlinear relationships between variables, and is hence suitable for predicting PM₁₀, which requires exploring the complex relationship between particulate matter, meteorological variables, and gaseous pollutants present in the atmosphere (Feng et al., 2015). We used MLP in this study to create predictive models for each of the six ambient monitoring stations of Guwahati, using nonlinear combinations of the input variables (meteorological parameters, PM₁₀, PM_2.5, and gaseous pollutants) to predict the next-day PM₁₀ concentrations. MLP forms a network of functionally interconnected neurons, also known as perceptrons (Vemuri, 1988). ANN scores over MLR because of its ability to predict the dependent variable of a built-up model more accurately (Gardner and Dorling, 1998). MLP has a simple structure consisting of three layers: the input layer, the hidden layer, and the output layer. One hidden layer was considered in our study, as it has been suggested to be sufficient to achieve the optimum model capacity (Bishop, 1995). The number of neurons, or nodes, in the input layer was equal to the number of input variables introduced in the model. The relevant input variables, i.e., observed meteorological parameters, PM₁₀, and gaseous pollutants, are fed into the model as signals to the input layer, which then passes them on to the hidden layer. The neurons do the computations to detect features of the input variables and introduce them to the input layer with requisite weights. The weights are assigned to input variables based on their relative importance. The hidden layer performs the critical function of nonlinear transformation of the inputs entering the network through a predefined activation function. Each neuron in the hidden layer sums up the information, including the bias. The bias provides a trainable constant value to every neuron in addition to its normal input.

The mathematical formulation of the MLP model is shown in Equation 3:

$$Y = F\left( \sum_{j=1}^{m} W_{kj} \cdot F\left( \sum_{i=1}^{n} W_{ji} X_i + B_j \right) + B_k \right) \tag{3}$$

where Y = output, F = transfer function, W_kj = weights between the hidden and output layers, W_ji = weights between the input and hidden layers, X_i = input variables, m = number of neurons in the hidden layer, n = number of neurons in the input layer, B_j = bias values of the neurons in the hidden layer, and B_k = bias values of the neurons in the output layer. Fig. 2 depicts the basic structure of the MLP framework.

Fig. 2.
The architecture of the MLP network.
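The forward pass in Equation 3 can be sketched in a few lines. This is an illustration only: the weights below are arbitrary placeholders rather than trained values, the function and parameter names are hypothetical, and the hidden and output transfer functions are passed separately so that pairings such as sigmoid with a linear output can be expressed.

```python
import math

def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    """Linear (identity) transfer function."""
    return z

def mlp_forward(x, W_ji, B_j, W_kj, B_k, hidden_fn=sigmoid, output_fn=identity):
    """Evaluate Equation 3 for a single-output network.

    x:    list of n input values X_i
    W_ji: m lists of n weights (input -> hidden)
    B_j:  m hidden-layer biases
    W_kj: m weights (hidden -> output)
    B_k:  output-layer bias
    """
    hidden = [
        hidden_fn(sum(w * x_i for w, x_i in zip(weights, x)) + b)
        for weights, b in zip(W_ji, B_j)
    ]
    return output_fn(sum(w * h for w, h in zip(W_kj, hidden)) + B_k)

# Placeholder network: 3 inputs, 2 hidden neurons, 1 output.
x = [0.2, 0.5, 0.9]
W_ji = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero weights give sigmoid(0) = 0.5
B_j = [0.0, 0.0]
W_kj = [1.0, 1.0]
B_k = 0.5
y_hat = mlp_forward(x, W_ji, B_j, W_kj, B_k)
```

Training (backpropagation) would adjust `W_ji`, `B_j`, `W_kj`, and `B_k`; only the prediction step is shown here.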

3. 3. 3 Classification and Regression Trees (CART)

CART is a non-parametric regression technique that can be employed to predict a dependent variable when the distribution of the independent variables is not known. The CART method therefore tries to ascertain the distribution pattern of the outcome (dependent) variable using the independent variables, through their linear or nonlinear relationships with the outcome variable. CART builds a decision tree through a hierarchy of binary decisions. Each binary decision splits the target variable into two alternative and mutually exclusive branches (groups) according to the values of an explanatory variable, choosing the split that yields the largest possible reduction in the post-split variation of the target variable. Splitting stops when no additional gain can be achieved by further splitting (Mckenney and Pedlar, 2003; Moisen and Frescino, 2002). Predictive CART models were built in this study for each of the ambient air quality monitoring stations, using the observed independent predictor variables (meteorological parameters, PM₁₀, and gaseous pollutants) of the respective stations to predict the dependent variable, i.e., the next-day PM₁₀ concentration.
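A single binary decision of a regression tree can be sketched as below. The sketch assumes that "variation" is measured by the sum of squared errors (SSE) around the group mean, which is the usual choice for regression trees but is not spelled out in the text. The function names and the toy temperature and PM₁₀ values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Find the threshold on one predictor x that most reduces the SSE of y.

    Returns (threshold, gain), or None if x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    parent_sse = sse([target for _, target in pairs])
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical predictor values
        left = [target for _, target in pairs[:k]]
        right = [target for _, target in pairs[k:]]
        gain = parent_sse - (sse(left) + sse(right))
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return best

# Toy data: cool days with high next-day PM10, warm days with low values.
temperature = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
next_day_pm10 = [200.0, 210.0, 190.0, 80.0, 90.0, 70.0]
threshold, gain = best_split(temperature, next_day_pm10)
```

A full tree repeats this search over every predictor and then recursively within each branch, stopping when the gain becomes negligible.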

3. 3. 4 Model Validation

The MLR, MLP, and CART models were validated by computing the net absolute error (NAE), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), index of agreement (IA), and coefficient of determination (R²) (Grzesiak and Zaborski, 2012; Jinsart et al., 2010; Willmott et al., 2009). Table 2 provides the performance indicators for model validation.

Table 2.
Performance indicators for model validation.

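Sl. No.	Performance indicators	Equations
1	Net absolute error	$NAE = \frac{\sum_{i=1}^{n} |P_i - O_i|}{\sum_{i=1}^{n} O_i}$
2	Mean absolute error	$MAE = \frac{1}{n} \sum_{i=1}^{n} |P_i - O_i|$
3	Mean square error	$MSE = \frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2$
4	Root mean square error	$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$
5	Index of agreement	$IA = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} \left( |P_i - \bar{O}| + |O_i - \bar{O}| \right)^2}$
6	Coefficient of determination	$R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$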

SPSS 25 was used to compute the MLR and MLP models, while SPSS Modeler 18 was used for CART.
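The six indicators in Table 2 can be computed with a short function. The sketch assumes that P_i denotes the predicted values, O_i the observed values, and Ō the mean of the observed values, and that the differences in NAE, MAE, and IA are absolute values. The function name and toy numbers are hypothetical.

```python
import math

def performance_indicators(observed, predicted):
    """Return NAE, MAE, MSE, RMSE, IA and R2 for paired observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    abs_err = [abs(p - o) for o, p in zip(observed, predicted)]
    sq_err = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    mse = sum(sq_err) / n
    ia_denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    r2_denominator = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "NAE": sum(abs_err) / sum(observed),
        "MAE": sum(abs_err) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "IA": 1.0 - sum(sq_err) / ia_denominator,
        "R2": 1.0 - sum(sq_err) / r2_denominator,
    }

# Toy values (not from the study).
observed = [100.0, 120.0, 80.0, 100.0]
predicted = [110.0, 110.0, 90.0, 90.0]
scores = performance_indicators(observed, predicted)
```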

4. RESULTS AND DISCUSSION

4. 1 Descriptive Statistics and PM₁₀ Concentration of Guwahati

4. 1. 1 Descriptive Statistics

The mean values and standard deviations of the meteorological parameters, PM₁₀, and gaseous pollutants at the respective air quality monitoring stations of the city are provided in Table 3. High variability was observed in the daily average PM₁₀ level during 2016-2018. Across the six air quality monitoring stations, the maximum and minimum mean PM₁₀ concentrations were 133.32 μg m^-3 and 51.41 μg m^-3, respectively. The highest daily average PM₁₀ recorded during 2016-2018 was 259.39 μg m^-3, while the lowest was 40.67 μg m^-3. The average RH of the city was on the higher side, while wind speed was on the lower side. The time-series data reveal a maximum temperature of 34°C during the summer season and a minimum of 14°C during the winter season. Guwahati receives rainfall from the southwest monsoon, with the highest rainfall occurring from June to August.

Table 3.
Descriptive statistics.

Parameters	S1_603 Mean	S1_603 SD	S2_596 Mean	S2_596 SD	S3_519 Mean	S3_519 SD	S4_541 Mean	S4_541 SD	S5_602 Mean	S5_602 SD	S6_193 Mean	S6_193 SD
RF	4.11	11.67	4.11	11.67	4.11	11.67	4.11	11.67	4.11	11.67	4.11	11.67
Temp	25.33	4.40	25.33	4.40	25.33	4.40	25.33	4.40	25.33	4.40	25.33	4.40
Dew PT	21.27	4.94	21.27	4.94	21.26	4.94	21.26	4.94	21.27	4.94	21.26	4.94
RH	78.91	9.74	78.91	9.74	78.91	9.74	78.91	9.74	78.91	9.74	78.91	9.74
WS	1.33	0.90	1.33	0.90	1.33	0.90	1.33	0.90	1.33	0.90	1.33	0.90
PM₁₀	104.57	57.28	108.62	57.10	133.22	66.99	105.97	51.41	108.62	57.10	109.82	51.41
NO₂	17.46	4.96	21.66	5.45	17.01	4.06	14.19	5.08	21.66	5.45	16.75	5.08
SO₂	7.62	2.42	7.55	2.65	7.52	2.27	5.63	2.27	7.55	2.65	7.61	2.27
CO	1.64	0.72	3.25	2.38	2.85	0.71	2.88	0.71	3.25	2.38	3.43	0.72

4. 1. 2 Correlation of PM₁₀ Concentration, Climate Variables, and Gaseous Variables

Fig. 3(A)-3(G) report the time series of the observed meteorological parameters, PM₁₀, and gaseous pollutants at air quality monitoring station 6 of Guwahati. Fig. 3(A) shows that the site is characterized by relatively high humidity throughout the year. The time series shows that the PM₁₀ concentration maintained a largely negative correlation with relative humidity. The PM₁₀ concentration follows an annual cycle, with high concentrations during winter (December to February), possibly due to a lower planetary boundary layer height; elevated concentrations also appear to continue beyond winter into March-April.

Fig. 3.
The time series 2016-18 measured in Guwahati: (A) PM₁₀ vs RH, (B) PM₁₀ vs SO₂, (C) PM₁₀ vs CO, (D) PM₁₀ vs NO₂, (E) PM₁₀ vs Temperature, (F) PM₁₀ vs Rainfall, (G) PM₁₀ vs Wind Speed.

Another peculiarity of the site is that CO, NO₂, and SO₂ are positively correlated with PM₁₀ concentrations, suggesting a common source for these compounds; the correlation with SO₂ is the strongest, as shown in Fig. 3(B)-3(D) and Table S4. Fig. 3(E) indicates a negative correlation of PM₁₀ with temperature, while Fig. 3(F) and 3(G) indicate a very mild positive correlation with rainfall and wind speed, respectively.

4. 2 Multiple Linear Regression Model for PM₁₀ Forecasting

The summary of the MLR models developed for all six ambient air quality monitoring stations located at Guwahati is given in Table 4. The Variance Inflation Factor (VIF) values for the independent variables of all six MLR models are acceptable, as they are below 10, indicating the absence of multicollinearity issues in the models. The Durbin-Watson (D-W) statistics, which range from 2.103 to 2.239, show that the models can accommodate autocorrelation. The residual (error) is critical in assessing the robustness of the model, as linear regression is sensitive to outlier effects. The histogram plots in Fig. 4(A)-4(F) indicate that the residuals are normally distributed with zero mean and constant variance. Fig. S1(A)-S1(F) show the observed and predicted values of the MLR models in scatter plots.
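For reference, the Durbin-Watson statistic mentioned above is conventionally computed from the model residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses that standard definition, which the text does not spell out, with made-up residuals. Values near 2 indicate little first-order autocorrelation.

```python
def durbin_watson(residuals):
    """Standard Durbin-Watson statistic for residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Toy residual series (not from the study).
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation
```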

Table 4.
Summary of the Multiple Linear Regression (MLR) models for PM₁₀ forecasting.

Monitoring station	Model	R²	Range of VIF	D-W statistics
S1_603	PM_{10, (t+1)} concentration = 204.565 + 0.551 (PM₁₀) - 3.509 (Ambient temperature) - 0.870 (Relative humidity)	0.628	1.277-1.895	2.216
S2_596	PM_{10, (t+1)} concentration = 222.67 + 0.492 (PM₁₀) - 3.340 (Ambient temperature) - 1.131 (Relative humidity) + 1.994 (CO)	0.624	0.249-1.925	2.226
S3_519	PM_{10, (t+1)} concentration = 214.67 + 0.618 (PM₁₀) - 3.382 (Ambient temperature) - 0.988 (Relative humidity)	0.674	1.330-1.904	2.180
S4_541	PM_{10, (t+1)} concentration = 254.38 + 0.417 (PM₁₀) - 4.934 (Ambient temperature) - 0.782 (Relative humidity) - 1.026 (SO₂)	0.622	1.164-2.183	2.103
S5_602	PM_{10, (t+1)} concentration = 161.91 + 0.564 (PM₁₀) - 2.725 (Ambient temperature) - 0.652 (Relative humidity)	0.61	1.204-1.754	2.239
S6_193	PM_{10, (t+1)} concentration = 216.23 + 0.572 (PM₁₀) - 3.667 (Ambient temperature) - 0.868 (Relative humidity) - 0.915 (SO₂)	0.682	1.067-2.082	2.137
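As a worked example of how a Table 4 equation would be applied, the sketch below evaluates the station S1_603 model. It assumes the inputs are in the units given in the Methods section (PM₁₀ in μg m^-3, ambient temperature in °C, relative humidity in %). The function name is hypothetical, and the example inputs are the S1_603 mean values from Table 3.

```python
def predict_next_day_pm10_s1(pm10_today, ambient_temperature, relative_humidity):
    """Evaluate the Table 4 MLR equation for station S1_603."""
    return (
        204.565
        + 0.551 * pm10_today
        - 3.509 * ambient_temperature
        - 0.870 * relative_humidity
    )

# Inputs: station S1_603 means from Table 3 (PM10, Temp, RH).
prediction = predict_next_day_pm10_s1(104.57, 25.33, 78.91)
```

With these mean inputs the equation returns roughly 104.6 μg m^-3, close to the station's mean PM₁₀, as expected for a least-squares fit evaluated near the centre of the data.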

Fig. 4.
Histogram plots (A) station 1, (B) station 2, (C) station 3, (D) station 4, (E) station 5 and (F) station 6.

4. 3 Multi-Layer Perceptron Model

The input variables PM₁₀, RF, T, RH, WS, NO₂, SO₂, and CO of the respective air monitoring stations were normalized using the data conversion facility of the ANN module of SPSS and fed into the six ANN models. For each ANN, 70% of the data set was used for training and 30% for testing. In the forward phase, the training data are propagated through the hidden layer to the output layer. The error, i.e., the difference between the output values and the actual target values, is propagated back toward the hidden layer until the errors are reduced in successive cycles (Ul-Saufie et al., 2013). In the process, the neural network learns by changing its weights during the forward and backward phases. In this study, we tried different combinations of transfer functions (sigmoid/hyperbolic tangent, sigmoid/linear, sigmoid/sigmoid, and hyperbolic tangent/linear) for each of the six monitoring stations to compare them and pick the optimum R² values. The network structure, transfer functions, and performance indicators (IA, R², NAE, MAE, MSE, and RMSE) of each model are given in Table 5. The optimum R² values (0.651 for station S1, 0.637 for station S2, 0.688 for station S3, 0.636 for station S4, 0.641 for station S5, and 0.693 for station S6) are marked ‘bold’ in Table 5, together with the corresponding IA, NAE, MAE, MSE, and RMSE values for each of the six monitoring stations. Fig. S2(A)-S2(F) show the observed and predicted values of the ANN models in scatter plots.
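The preprocessing described above can be sketched as follows. Two points are assumptions rather than facts from the text: the normalization is taken to be min-max scaling to the range 0 to 1 (the text only says the data were normalized in SPSS), and the 70/30 partition is drawn at random with a fixed seed. The function names are hypothetical.

```python
import random

def min_max_normalize(column):
    """Rescale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]

def split_70_30(n_rows, seed=0):
    """Return (train_indices, test_indices) for a random 70/30 partition."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(0.7 * n_rows)
    return indices[:cut], indices[cut:]

scaled = min_max_normalize([10.0, 20.0, 30.0])
train_idx, test_idx = split_70_30(1092)  # 1092 observations after data treatment
```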

Table 5.
Predictive MLP models with network structure, transfer functions and performance indicators.

Target	Stn.	Network structure Input : Neurons : Output	Transfer function: hidden/output layer	R²	NAE	MAE	MSE	RMSE	IA
PM_{10, (t+1)}	S1	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.626	0.15	16.02	497.86	22.31	0.95
			Sigmoid/Linear	0.651
			Sigmoid/Sigmoid	0.640
			Hyperbolic tangent/Linear	0.646
PM_{10, (t+1)}	S2	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.637	0.22	23.80	1200.21	34.64	0.88
			Sigmoid/Linear	0.634
			Sigmoid/Sigmoid	0.630
			Hyperbolic tangent/Linear	0.629
PM_{10, (t+1)}	S3	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.674	0.20	26.11	1408.45	37.53	0.90
			Sigmoid/Linear	0.688
			Sigmoid/Sigmoid	0.679
			Hyperbolic tangent/Linear	0.672
PM_{10, (t+1)}	S4	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.626	0.21	22.57	962.49	31.02	0.88
			Sigmoid/Linear	0.621
			Sigmoid/Sigmoid	0.627
			Hyperbolic tangent/Linear	0.636
PM_{10, (t+1)}	S5	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.635	0.22	23.40	1158.79	34.04	0.88
			Sigmoid/Identity	0.630
			Sigmoid/Sigmoid	0.623
			Hyperbolic tangent/Identity	0.641
PM_{10, (t+1)}	S6	08 : 07 : 01	Sigmoid/Hyperbolic tangent	0.693	0.22	21.87	1007.72	31.74	0.90
			Sigmoid/Identity	0.687
			Sigmoid/Sigmoid	0.686
			Hyperbolic tangent/Identity	0.686

4. 4 Predictive CART Model

Using CART analysis, several decision trees were developed based on different combinations of observed meteorological parameters, PM₁₀, and gaseous pollutants for the three years (2016-2018). As is typical in machine learning, of the total data points of the respective independent and dependent variables, 70% were used as the training set and 30% as the test set. The optimum model for each of the six air quality monitoring stations of Guwahati was the one with the least relative error, given by Equation 4 below.

$$\text{Relative error of CART} = \frac{S(K)}{S(O)} \tag{4}$$

where S(K) is the sum of the squared residuals at the terminal nodes and S(O) is the sum of squared errors of the dependent variable around its mean in the root node. The predictive CART models and performance indicators (R², IA, NAE, MAE, MSE, and RMSE) are given in Table 6. Fig. S3(A)-S3(F) show the decision trees of the CART models.
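Equation 4 can be illustrated with a short sketch. It assumes that each terminal node predicts the mean of the observations it contains, so S(K) is the sum over terminal nodes of the squared deviations from each node mean. The function names and toy values are hypothetical.

```python
def sum_of_squares(values):
    """Sum of squared deviations of values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def cart_relative_error(terminal_nodes):
    """Equation 4: S(K) / S(O).

    terminal_nodes: list of lists, each holding the observed target values
    that fall into one terminal node of the tree.
    """
    s_k = sum(sum_of_squares(node) for node in terminal_nodes)
    all_values = [v for node in terminal_nodes for v in node]
    s_o = sum_of_squares(all_values)  # variation in the root node
    return s_k / s_o

# Toy tree with two terminal nodes (made-up PM10 values).
relative_error = cart_relative_error([[200.0, 210.0, 190.0], [80.0, 90.0, 70.0]])
```

A relative error near 0 means the terminal nodes explain almost all of the variation present in the root node; a value near 1 means the tree explains almost none.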

Table 6.
Predictive CART models with input variable PM₁₀, RF, T, RH, WS, NO₂, SO₂, CO and performance indicators.

Target	Stn.	Set	R²	NAE	MAE	MSE	RMSE	IA
PM_{10, (t+1)}	S1	Training	0.660	0.22	23.10	1132.68	33.66	0.89
PM_{10, (t+1)}	S1	Testing	0.614	0.26	26.39	1292	35.94	0.88
PM_{10, (t+1)}	S2	Training	0.519	0.24	26.34	1598.04	39.98	0.82
PM_{10, (t+1)}	S2	Testing	0.568	0.24	25.95	1357.10	36.84	0.85
PM_{10, (t+1)}	S3	Training	0.637	0.21	28.47	1609.12	40.11	0.88
PM_{10, (t+1)}	S3	Testing	0.625	0.21	27.35	1700.95	41.24	0.88
PM_{10, (t+1)}	S4	Training	0.606	0.22	23.58	1011.40	31.80	0.87
PM_{10, (t+1)}	S4	Testing	0.522	0.26	27.05	1339.12	36.59	0.84
PM_{10, (t+1)}	S5	Training	0.628	0.22	20.59	824.04	28.71	0.88
PM_{10, (t+1)}	S5	Testing	0.577	0.24	21.97	922.12	30.37	0.85
PM_{10, (t+1)}	S6	Training	0.656	0.21	23.03	1066.67	32.66	0.89
PM_{10, (t+1)}	S6	Testing	0.628	0.21	23.13	1282.33	35.81	0.88

4.5 Model Comparison

All six performance indicators were used to compare the one-day-ahead PM₁₀ prediction performance of the three methods, i.e., MLR, ANN (MLP), and CART, and to isolate the best model, as shown in Table 7. NAE, MAE, MSE, and RMSE measure the error of the model, where a value closer to 0 indicates a better model. The other two indicators, IA and R², measure the accuracy of the model result, where a value closer to 1 indicates higher accuracy. The values of the performance indicators provide specific information regarding predictive performance efficiency (Singh et al., 2013). An RMSE-based comparison between models is preferable when the objective is to avoid large prediction errors, because squaring gives large errors more weight.

Table 7.
PM₁₀ prediction model performance statistics: NAE, MAE, MSE, RMSE, IA, and R² between measured and estimated values for six air quality monitoring stations.

	Station 1			Station 2
	MLR	MLP	CART	MLR	MLP	CART
NAE	0.23	0.15	0.26	0.22	0.22	0.24
MAE	24.37	16.00	26.39	24.02	23.80	25.95
MSE	1219.39	497.86	1292	1226.49	1200.21	1357
RMSE	34.92	22.31	35.94	35.02	34.64	36.84
R²	0.63	0.65	0.61	0.62	0.64	0.57
IA	0.87	0.95	0.88	0.87	0.88	0.85
	Station 3			Station 4
	MLR	MLP	CART	MLR	MLP	CART
NAE	0.20	0.20	0.21	0.22	0.21	0.26
MAE	26.26	26.11	27.35	23.05	22.57	27.05
MSE	1461.48	1408.45	1700.95	998.95	962.49	1339.12
RMSE	38.23	37.53	41.24	31.61	31.02	36.59
R²	0.67	0.69	0.63	0.62	0.64	0.52
IA	0.89	0.90	0.88	0.87	0.88	0.84
	Station 5			Station 6
	MLR	MLP	CART	MLR	MLP	CART
NAE	0.23	0.22	0.24	0.20	0.22	0.21
MAE	21.37	23.40	21.97	22.20	21.84	23.13
MSE	859.33	1158.79	922.12	1023.64	1007.12	1282.33
RMSE	29.31	34.04	30.37	31.99	31.74	35.81
R²	0.61	0.64	0.58	0.68	0.69	0.63
IA	0.87	0.88	0.85	0.90	0.90	0.88

On the other hand, MAE sheds light on the average magnitude of the errors without considering their direction. The advantage of MAE as a linear score is that all individual differences between predictions and corresponding observed values are given equal weight in the average. However, amongst all six performance indicators, R² can be regarded as the single most important measure in deciding the prediction accuracy (Yoo et al., 2018).
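As an illustration of how the six indicators relate to one another, the sketch below computes them from paired observed and predicted values. The formulas are assumptions based on common usage in this literature: NAE as total absolute error divided by total observed value, IA as Willmott's index of agreement, and R² as the squared Pearson correlation. The function name and the toy values are hypothetical.

```python
import math


def performance_indicators(observed, predicted):
    """Return NAE, MAE, MSE, RMSE, IA and R2 for paired values."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    pairs = list(zip(observed, predicted))
    abs_err = [abs(p - o) for o, p in pairs]
    sq_err = [(p - o) ** 2 for o, p in pairs]

    mae = sum(abs_err) / n            # equal weight for every error
    mse = sum(sq_err) / n             # large errors weigh more
    rmse = math.sqrt(mse)
    # Assumption: NAE = sum|P - O| / sum(O)
    nae = sum(abs_err) / sum(observed)
    # Assumption: IA is Willmott's index of agreement
    ia_den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in pairs)
    ia = 1.0 - sum(sq_err) / ia_den
    # Assumption: R2 is the squared Pearson correlation coefficient
    cov = sum((o - o_mean) * (p - p_mean) for o, p in pairs)
    var_o = sum((o - o_mean) ** 2 for o in observed)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)
    return {"NAE": nae, "MAE": mae, "MSE": mse, "RMSE": rmse, "IA": ia, "R2": r2}


# Toy values for illustration only (not station data)
obs = [100.0, 120.0, 80.0, 150.0]
pred = [110.0, 115.0, 90.0, 140.0]
scores = performance_indicators(obs, pred)
perfect = performance_indicators(obs, obs)
for name, value in scores.items():
    print(name, round(value, 3))
```

With a perfect prediction the four error measures are 0 and both IA and R² equal 1, which matches the interpretation of the indicators given above.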

In this study, the one-day-ahead PM₁₀ predictions for all six air quality monitoring stations showed relatively good fits with the MLP method (R²=0.64-0.69; IA=0.88-0.95) and the smallest errors (NAE=0.15-0.22; MAE=16-26.11; MSE=497.86-1408.45; and RMSE=22.31-37.53) in comparison with its closest performer, the MLR method (R²=0.61-0.68; IA=0.87-0.90; NAE=0.20-0.23; MAE=21.37-26.26; MSE=859.33-1461.48; and RMSE=29.31-38.23). Table 7 also shows that CART, as a predictive method for one-day-ahead PM₁₀, came close to MLR but did not match it on the model evaluation indicators, with R²=0.52-0.63; IA=0.84-0.88; NAE=0.21-0.26; MAE=21.97-27.35; MSE=922.12-1700.95; and RMSE=30.37-41.24 in the test set results.

Taken together, the ranges of the accuracy measures R² (0.64-0.69) and IA (0.88-0.95) for the ANN (MLP) models indicate the best agreement between predicted and observed PM₁₀ concentrations during the three years 2016-2018 when compared with the MLR models (R²: 0.61-0.68, IA: 0.87-0.90) and the CART models (R²: 0.52-0.63, IA: 0.84-0.88). In terms of prediction error, ANN (MLP) also reached the lowest minimum values (NAE: 0.15, MAE: 16, RMSE: 22.31, and MSE: 497.86), compared with MLR (NAE: 0.20, MAE: 21.37, RMSE: 29.31, and MSE: 859.33) and CART (NAE: 0.21, MAE: 21.97, RMSE: 30.37, and MSE: 922.12). Hence, the results obtained from the ANN (MLP) models were more suitable for Guwahati than those of the constructed MLR and CART models.
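The ranges quoted in this section can be re-derived from Table 7. The short sketch below does this for R² and RMSE only, using values transcribed from Table 7 in station order 1 to 6; the helper names are hypothetical.

```python
# R2 and RMSE per station (Station 1 to Station 6), transcribed from Table 7
table7 = {
    "MLR": {"R2": [0.63, 0.62, 0.67, 0.62, 0.61, 0.68],
            "RMSE": [34.92, 35.02, 38.23, 31.61, 29.31, 31.99]},
    "MLP": {"R2": [0.65, 0.64, 0.69, 0.64, 0.64, 0.69],
            "RMSE": [22.31, 34.64, 37.53, 31.02, 34.04, 31.74]},
    "CART": {"R2": [0.61, 0.57, 0.63, 0.52, 0.58, 0.63],
             "RMSE": [35.94, 36.84, 41.24, 36.59, 30.37, 35.81]},
}


def metric_range(model, metric):
    """Return (min, max) of a metric across the six stations."""
    values = table7[model][metric]
    return min(values), max(values)


def best_model_per_station(metric, higher_is_better):
    """Return the best-scoring model at each station for one metric."""
    pick = max if higher_is_better else min
    winners = []
    for station in range(6):
        winners.append(pick(table7, key=lambda m: table7[m][metric][station]))
    return winners


for model in table7:
    print(model, metric_range(model, "R2"), metric_range(model, "RMSE"))
print(best_model_per_station("R2", higher_is_better=True))
print(best_model_per_station("RMSE", higher_is_better=False))
```

On these transcribed values, MLP has the highest R² at every station and the lowest RMSE at five of the six stations, with MLR lower at Station 5.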

5. CONCLUSIONS AND RECOMMENDATIONS

The comprehensive and comparative review of PM₁₀ concentrations in 28 Indian cities of different categories (tier-I, tier-II, and tier-III) and the alarming levels found indicate an urgent need to improve city-level air quality. Kolkata, a tier-I city, recorded a PM₁₀ concentration as high as 445±21 μg m^-3 during wintertime. The tier-III cities like Raipur and Kanpur were found to be not far behind the tier-I cities in terms of ambient PM₁₀ concentration. Interestingly, tier-III cities like Jharia and Sonipat also recorded PM₁₀ concentrations as high as 333.7±17.86 μg m^-3 and 213.67±151.49 μg m^-3, respectively. The PM₁₀ concentrations in all 28 Indian cities grossly violated both the WHO and NAAQS standards by a wide margin. Kolkata topped the list at 22.25 times the WHO standard and 7.42 times the NAAQS, followed by Bengaluru (17.49 times the WHO standard and 5.83 times the NAAQS) and Delhi (11.1 times the WHO standard and 3.7 times the NAAQS). Therefore, it is high time to initiate actions that diminish or prevent the build-up of high ambient PM₁₀ concentrations in the cities. One option is abatement through short-term traffic reduction based on PM₁₀ concentrations predicted in advance. This requires local air quality managers to predict city-level PM₁₀ concentrations correctly at least one or two days in advance, through analysis and predictive modeling of the data routinely gathered by city authorities.

The tier-II city Guwahati recorded high variability in observed PM₁₀ levels due to rapid urbanization. The highest daily average PM₁₀ recorded during 2016-2018 was 259.39 μg m^-3, while the lowest was 40.67 μg m^-3. The mean PM₁₀ concentration for the city, 133.22 μg m^-3 as found in this study, exceeded the WHO standard by 6.66 times and the NAAQS by 2.22 times; the corresponding figures during 2013-2014 were 4.54 times and 1.51 times, respectively (Table 1, Table S1). The average daily NO₂, CO, and SO₂ concentrations of Guwahati were found to be correlated with PM₁₀ concentrations during 2016-2018, suggesting a common source for these compounds.
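The exceedance factors quoted above are simple ratios of the mean concentration to a reference value. The sketch below reproduces them under the assumption that the reference values are 20 μg m^-3 for WHO and 60 μg m^-3 for NAAQS; these two numbers are inferred from the reported ratios (for example 133.22/6.66) rather than stated in this section, and the function name is hypothetical.

```python
# Assumed reference values in ug/m3, inferred from the reported ratios
WHO_PM10 = 20.0
NAAQS_PM10 = 60.0


def exceedance_factors(pm10_mean):
    """Return (WHO factor, NAAQS factor) for a mean PM10 value in ug/m3."""
    return pm10_mean / WHO_PM10, pm10_mean / NAAQS_PM10


guwahati_2016_2018 = exceedance_factors(133.22)  # mean found in this study
guwahati_2013_2014 = exceedance_factors(90.7)    # Guwahati value in Table S1
kolkata = exceedance_factors(445.0)              # Kolkata value in Table S1

for label, (who, naaqs) in [("Guwahati 2016-2018", guwahati_2016_2018),
                            ("Guwahati 2013-2014", guwahati_2013_2014),
                            ("Kolkata", kolkata)]:
    print(f"{label}: {who:.2f} x WHO, {naaqs:.2f} x NAAQS")
```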

In different cities of the world, different predictive modeling techniques have been used to predict PM₁₀ in advance. However, MLR with stepwise inclusion of input variables was found to be the most widely used tool for temporal prediction of PM₁₀ in urban areas of India, and it has mostly been applied in the bigger cities of the country (Table S2). This study found that the next day’s PM₁₀ concentrations in the tier-II city Guwahati can be forecast better using the non-linear MLP algorithm with FFBN topologies of the ANN class than with the linear MLR and non-linear CART models. The three models were critically assessed through a comparative evaluation of performance indicators, with the end goal of choosing the best-fitted model for accurate forecasting of PM₁₀ at the city level. The results reveal that, for one-day-ahead PM₁₀ at all six air quality monitoring stations of Guwahati, prediction ability was relatively better with the MLP method (R²=0.64-0.69; IA=0.88-0.95), which also had the smallest errors (NAE=0.15-0.22; MAE=16-26.11; MSE=497.86-1408.45; and RMSE=22.31-37.53), in comparison with its closest performer, the MLR method (R²=0.61-0.68; IA=0.87-0.90; NAE=0.20-0.23; MAE=21.37-26.26; MSE=859.33-1461.48; and RMSE=29.31-38.23). It is interesting to note that CART, as a predictive method for one-day-ahead PM₁₀, came close to MLR but did not match it on the model evaluation indicators, with R²=0.52-0.63; IA=0.84-0.88; NAE=0.21-0.26; MAE=21.97-27.35; MSE=922.12-1700.95; and RMSE=30.37-41.24. Relatively low R² values are quite common for time-series-dependent nonlinear atmospheric variables with their known confounding effects.

An attempt was made to further validate the predictive performance of the MLP model against observed PM₁₀ data from the NAMP monitoring station STN6_193 of Guwahati for a period (January-March 2019) beyond that of the data used in developing the MLP model. The predicted PM₁₀ concentrations obtained using the MLP model were compared with the actual data for the same period. Fig. S4 shows that the MLP model also performed well for the post-study period, and the model performance indicators (R²=0.69, IA=0.89, NAE=0.07, MAE=13.02, MSE=287.95, and RMSE=16.97) were in line with the original model (Table S5).

Against the backdrop of CPCB’s acknowledgment that comparatively smaller tier-II cities are also facing severe air pollution, city authorities are contemplating several steps for curtailing air pollution and the resulting health hazards. We recommend that local authorities also use the non-linear MLP (ANN) algorithm with FFBN topologies for forecasting PM₁₀ concentrations in smaller Indian cities like Guwahati, to avoid PM-induced health hazards to a great extent. ‘Predict pollution and defeat concentration’ could be another approach to fighting the air pollution menace, in addition to the odd-even rule that a few Indian cities presently enforce to rein in air pollution by curtailing vehicular pollution. The advance prediction approach seems particularly applicable to Guwahati, as this study found that PM₁₀ build-up had a positive correlation with gaseous pollutants and is hence likely to share a common source, i.e., vehicular pollution. Moreover, with this model, the local SPCB authorities can caution city dwellers of impending dangerous levels of PM₁₀, so that they can reduce their outdoor activities on those days and thereby avoid exposure to unhealthy air quality.

Notes

* Indian government classification of cities based on their population as tier-I, tier-II, and tier-III.

Acknowledgments

This study was supported by the Graduate School Thesis Grant, Chulalongkorn University, Bangkok, Thailand. The authors also thank the State Pollution Control Board, Assam, and Regional Meteorological Department, Guwahati, for air pollution and meteorological information, respectively.

References


1.	Abdullah, S., Ismail, M., Ahmed, A.N., Abdullah, A.M. (2019) Forecasting Particulate Matter Concentration Using Linear and Non-Linear Approaches for Air Quality Decision Support. Atmosphere, 10, 667.
2.	Agarwala, S., Sharma, S., Suresh, R., Rahman, M.H., Vranckx, S., Maiheu, B., Blyth, L., Janssen, S., Gargava, P., Shukla, V.K., Batra, S. (2020) Air quality forecasting using artificial neural networks with real time dynamic error correction in highly polluted regions. Science of the Total Environment, 735, 139-454.
3.	Apte, J.S., Marshall, J.D., Cohen, A.J., Brauer, M. (2015) Addressing global mortality from ambient PM_2.5. Environmental Science and Technology, 49(13), 8057-8066.
4.	Barman, N., Gokhale, S. (2019) Urban black carbon-source apportionment, emissions and long-range transport over the Brahmaputra River Valley. Science of the Total Environment, 693, 133577.
5.	Bhardwaj, R., Pruthi, D. (2020) Evolutionary techniques for optimizing air quality model. Procedia Computer Science, 167, 1872-1879.
6.	Bishop, C.M. (1995) Neural Networks for Pattern Recognition, Oxford Univ. Press: Oxford, NY, USA, 1995; ISBN 978-0-19-853864-6.
7.	Cabaneros, S.M., Calautit, J.K., Hughes, B.R. (2019) A review of artificial neural network models for ambient air pollution prediction. Environmental Modelling and Software, 119, 285-304.
8.	Carnevale, C., Pisoni, E., Volta, M. (2010) A non-linear analysis to detect the origin of PM₁₀ concentrations in Northern Italy. Science of the Total Environment, 409(1), 182-191.
9.	Chelani, A.B., Gajghate, D.G., Hasan, M.Z. (2002) Prediction of ambient PM₁₀ and toxic metals using artificial neural networks. Journal of the Air and Waste Management Association, 52(7), 805-810.
10.	Chen, K., Glonek, G., Hansen, A., Williams, S., Tuke, J., Salter, A., Bi, P. (2016) The effects of air pollution on asthma hospital admissions in Adelaide, South Australia, 2003-2013: time-series and case-crossover analyses. Clinical and Experimental Allergy, 46(11), 1416-1430.
11.	CPCB (2016) Central Pollution Control Board, Delhi. July, 2016. Available online: https://www.cpcb.nic.in/openpdffile.php?id=TGF0ZXN0RmlsZS9MYXRlc3RfMTIzX1NVTU1BUllfQk9PS19GUy5wZGY= (accessed on 8 January 2020).
12.	Czernecki, B., Półrolniczak, M., Kolendowicz, L., Maros, M., Kendzierski, S., Pilguj, N. (2017) Influence of the atmospheric conditions on PM₁₀ concentrations in Poznań, Poland. Journal of Atmospheric Chemistry, 74(1), 115-139.
13.	Das, R., Khezri, B., Srivastava, B., Datta, S., Sikdar, P.K., Webster, R.D. (2015) Trace Element Composition of PM_2.5 and PM₁₀ from Kolkata - A Heavily Polluted Indian Metropolis. Atmospheric Pollution Research, 6(5), 742-747.
14.	De, S. (2019) Long-term ambient air pollution exposure and respiratory impedance in children: A cross-sectional study. Respiratory Medicine, 170, 105795.
15.	Deshmukh, D.K., Deb, M.K., Tsai, Y.I., Mkoma, S.L. (2011) Water Soluble Ions in PM_2.5 and PM1 Aerosols in Durg City, Chhattisgarh, India. Aerosol and Air Quality Research, 11, 696-708.
16.	Deshmukh, D.K., Deb, M.K., Mkoma, S.L. (2013) Size distribution and seasonal variation of size-segregated particulate matter in the ambient air of Raipur city, India. Air Quality Atmosphere and Health, 6, 259-276.
17.	Dholakia, H.H., Bhadra, D., Garg, A. (2014) Short term association between ambient air pollution and mortality and modification by temperature in five Indian cities. Atmospheric Environment, 99, 168-174.
18.	Dutta, A., Dutta, G. (2018) Indian Growth Story of Automobile Sector and Atmospheric Emission Projection. Pollution Research, 37(1), 131-143.
19.	Dutta, A., Jinsart, W. (2020) Risks to health from ambient particulate matter (PM_2.5) to the residents of Guwahati city, India: An analysis of prediction model. Human and Ecological Risk Assessment: An International Journal.
20.	Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., Wang, J. (2015) Artificial neural networks forecasting of PM_2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmospheric Environment, 107, 118-128.
21.	Ferreira, T.M., Forti, M.C., de Freitas, C.U., Nascimento, F.P., Junger, W.L., Gouveia, N. (2016) Effects of particulate matter and its chemical constituents on elderly hospital admissions due to circulatory and respiratory diseases. International Journal of Environmental Research and Public Health, 13(10), 947.
22.	Gardner, M.W., Dorling, S.R. (1998) Artificial neural networks (the multilayer perceptron)- a review of applications in the atmospheric sciences. Atmospheric Environment, 32(14-15), 2627-2636.
23.	Gawhane, R.D., Rao, P.S.P., Budhavant, K., Meshram, D.C., Safai, P.D. (2019) Anthropogenic fine aerosols dominate over the Pune region, Southwest India. Meteorology and Atmospheric Physics, 131, 1497-1508.
24.	Gocheva-Ilieva, S.G., Stoimenova, M.P. (2018) PM₁₀ Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria. World Academy of Science, Engineering and Technology. International Journal of Environmental and Ecological Engineering, 12(9), 572-577.
25.	Gogikar, P., Tyagi, B., Gorai, A.K. (2019) Seasonal prediction of particulate matter over the steel city of India using neural network models. Modeling Earth System and Environment, 5, 227-243.
26.	Goyal, P., Chan, A.T., Jaiswal, N. (2006) Statistical models for the prediction of respirable suspended particulate matter in urban cities. Atmospheric Environment, 40(11), 2068-2077.
27.	Grzesiak, W., Zaborski, D. (2012) Examples of the use of data mining methods in animal breeding. Data mining applications in engineering and medicine. Adem Karahoca, IntechOpen, Croatia. 2012; pp. 303-324. Available online: https://www.intechopen.com/books/data-mining-applications-in-engineering-andmedicine/examples-of-the-use-ofdata-mining-methods-in-animal-breeding (accessed on 21 July, 2020).
28.	Gummeneni, S., Yusup, Y.B., Chavali, M., Samadi, S.Z. (2011) Source apportionment of particulate matter in the ambient air of Hyderabad city, India. Atmospheric Research, 101(3), 752-764.
29.	Gurjar, B.R., Jain, A., Sharma, A., Agarwal, A., Gupta, P., Nagpure, A.S., Lelieveld, J. (2010) Human health risks in megacities due to air pollution. Atmospheric Environment, 44(36), 4606-4613.
30.	Guttikunda, S.K. (2017) Clearing the Air Seminar Series, ‘Filling the Knowledge Gap on Air Quality in Indian Cities’ Initiative on Climate, Energy and Environment (ICEE) at the Centre for Policy Research (CPR). Delhi, 4 December 2017.
31.	Guttikunda, S.K., Nishadh, K.A., Gota, S., Singh, P., Chanda, A., Jawahar, P., Asundi, J. (2019) Air quality, emissions, and source contributions analysis for the Greater Bengaluru region of India. Atmospheric Pollution Research, 10(3), 941-953.
32.	Hooyberghs, J., Mensink, C., Dumont, G., Fierens, F., Brasseur, O. (2005) A neural network forecast for daily average PM₁₀ concentrations in Belgium. Atmospheric Environment, 39(18), 3279-3289.
33.	Jena, S., Singh, G. (2017) Human health risk assessment of airborne trace elements in Dhanbad, India. Atmospheric Pollution Research, 8(3), 490-502.
34.	Jiang, P., Dong, Q., Li, P. (2017) A novel hybrid strategy for PM_2.5 concentration analysis and prediction. Journal of Environmental Management, 196, 443-457.
35.	Jinsart, W., Sripraparkorn, C., Siems, S.T., Hurley, P.J., Thepanondh, S. (2010) Application of the air pollution model (TAPM) to the urban air shed of Bangkok, Thailand. International Journal of Environment and Pollution (IJEP), 42(1/2/3), 68-84.
36.	Kalaiarasan, G., Balakrishnan, R.M., Sethunath, N.A., Manoharan, S. (2018) Source apportionment studies on particulate matter (PM₁₀ and PM_2.5) in ambient air of urban Mangalore, India. Journal of Environmental Management, 217, 815-824.
37.	Kavuri, N.C., Paul, K.K. (2013) Chemical Characterization of Ambient PM₁₀ Aerosol in a Steel City, Rourkela, India. Research Journal of Recent Sciences, 2(1), 32-38.
38.	Kaur, M., Mandal, A. (2020) PM_2.5 Concentration Forecasting using Neural Networks for Hotspots of Delhi, 2020. International Conference on Contemporary Computing and Applications (IC3A), Lucknow, India, 5-7 February, pp. 40-43.
39.	Kottur, S.V., Mantha, S.S. (2015) An integrated model using artificial neural network (ANN) and kriging for forecasting air pollutants using meteorological data. International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), 4(1), 146-152.
40.	Kumari, P.R., Avisetty, R.V.S.D.S.P., Akkala, P., Subash, K.V.V., Manideep, K.S., Bojja, P., Aruna, B. (2019) Prediction and Estimation of PM₁₀ and SO₂ Concentrations in the Ambient Air At Vijayawada Station using Artificial Neural Networks Computing. International Journal of Recent Technology and Engineering, 7(6C2), 790-793.
41.	Lawrence, A., Fatima, N. (2014) Urban air pollution & its assessment in Lucknow City - The second largest city of North India. Science of the Total Environment, 488-489, 447-455.
42.	Masood, A., Ahmad, K. (2020) A model for particulate matter (PM_2.5) prediction for Delhi based on machine learning approaches. Procedia Computer Science, 167, 2101-2110.
43.	Mckenney, D.W., Pedlar, J.H. (2003) Spatial models of site index based on climate and soil properties for two boreal tree species in Ontario, Canada. Forest Ecology and Management, 175, 497-507.
44.	Mishra, D., Goyal, P., Upadhyay, A. (2015) Artificial intelligence-based approach to forecast PM_2.5 during haze episodes: A case study of Delhi, India. Atmospheric Environment, 102, 239-248.
45.	Moisen, G.G., Frescino, T.S. (2002) Comparing five modelling techniques for predicting forest characteristics. Ecological Modelling, 157(2-3), 209-225.
46.	Murari, V., Kumar, M., Barman, S.C., Banerjee, T. (2015) Temporal variability of MODIS aerosol optical depth and chemical characterization of airborne particulates in Varanasi, India. Environmental Science and Pollution Research, 22, 1329-1343.
47.	Myllyvirta, L., Dahiya, S., Sivalingam, N. (2016) Out of sight: how coal burning advances India’s air pollution crisis. Greenpeace Environment Trust, Bengaluru; Available online: http://www.greenpeace.org/india/Global/india/cleanairnation/Reports/Out%20of%20Sight.pdf (accessed on 26, February 2020).
48.	Nadeem, I., Ilyas, A.M., Uduman, P.S.S. (2020) Analyzing and Forecasting Ambient Air Quality Of Chennai City In India. Geography Environment Sustainability, 13(3).
49.	Nagendra, S.M.S., Khare, M. (2006) Artificial neural network approach for modelling nitrogen dioxide dispersion from vehicular exhaust emissions. Ecological Modelling, 190(1-2), 99-115.
50.	Ostro, B., Chestnut, L., Vichit-Vadakan, N., Laixuthai, A. (1999) The impact of particulate matter on daily mortality in Bangkok, Thailand. Journal of the Air and Waste Management Association, 49(9), 100-107.
51.	Pant, P., Lal, R.M., Guttikunda, S.K., Russell, A.G., Nagpure, A.S., Ramaswami, A., Peltier, R.E. (2019) Monitoring particulate matter in India: recent trends and future outlook. Air Quality Atmosphere and Health, 12(1), 45-58.
52.	Pipal, A.S., Jan, R., Satsangi, P., Tiwari, S., Taneja, A. (2014) Study of Surface Morphology, Elemental Composition and Origin of Atmospheric Aerosols (PM_2.5 and PM₁₀) over Agra, India. Aerosol and Air Quality Research, 14, 1685-1700.
53.	Prakash, A., Kumar, U., Kumar, K., Jain, V.K. (2011) A wavelet-based neural network model to predict ambient air pollutants’ concentration. Environmental Modeling and Assessment, 16(5), 503-517.
54.	Ravindra, K., Rattan, P., Mor, S., Aggarwal, A.N. (2019) Generalized additive models: Building evidence of air pollution, climate change and human health. Environment International, 132, 104987.
55.	Roy, D., Singh, G., Seo, Y.C. (2019) Carcinogenic and non-carcinogenic risks from PM₁₀ and PM_2.5-bound metals in a critically polluted coal mining area. Atmospheric Pollution Research, 10(6), 1964-1975.
56.	Shahraiyni, H.T., Sodoudi, S. (2016) Statistical Modeling Approaches for PM₁₀ Prediction in Urban Areas; A Review of 21st-Century Studies. Atmosphere, 7, 15.
57.	Sharma, M., Maloo, S. (2005) Assessment of ambient air PM₁₀ and PM_2.5 and characterization of PM₁₀ in the city of Kanpur, India. Atmospheric Environment, 39(33), 6015-6026.
58.	Sharma, S., Nayak, H., Lal, P. (2015) Post-Diwali morbidity survey in a resettlement colony of Delhi. Indian Journal of Burns, 23(1), 76-80.
59.	Shubhankar, B., Ambade, B. (2016) Chemical characterization of carbonaceous carbon from industrial and semi urban site of eastern India. Springer Plus, 5, 837.
60.	Singh, D.P., Gadi, R., Mandal, T.K. (2011) Characterization of particulate-bound polycyclic aromatic hydrocarbons and trace metals composition of urban air in Delhi, India. Atmospheric Environment, 45, 7653-7663.
61.	Singh, K.P., Gupta, S., Kumar, A., Shukla, S.P. (2012) Linear and nonlinear modeling approaches for urban air quality prediction. Science of the Total Environment, 426, 244-255.
62.	Singh, K.P., Gupta, S., Rai, P. (2013) Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426-437.
63.	Slini, T., Kaprara, A., Karatzas, K., Moussiopoulos, N. (2006) PM₁₀ forecasting for Thessaloniki, Greece. Environmental Modelling and Software, 21(4), 559-565.
64.	Sudheer, A.K., Aslam, M.Y., Upadhyay, M., Rengarajan, R., Bhushan, R., Rathore, J.S., Singh, S.K., Kumar, S. (2016) Carbonaceous aerosol over semi-arid region of western India: Heterogeneity in sources and characteristics. Atmospheric Research, 178-179, 268-278.
65.	Tikhe Shruti, S., Khare, K.C., Londhe, S.N. (2013) Forecasting criteria air pollutants using data driven approaches; An Indian case study. Journal Of Environmental Science, Toxicology And Food Technology (IOSR-JESTFT), 3(5), 1-8.
66.	Tiwari, S., Bisht, D.S., Srivastava, A.K., Pipal, A.S., Taneja, A., Srivastava, M.K., Attri, S.D. (2014) Variability in atmospheric particulates and meteorological effects on their mass concentrations over Delhi, India. Atmospheric Research, 145-146, 45-56.
67.	Tiwari, S., Dumka, U.C., Gautam, A.S., Kaskaoutis, D.G., Srivastava, A.K., Bisht, D.S., Chakrabarty, R.K., Sumlin, B.J., Solm, F. (2017) Assessment of PM_2.5 and PM₁₀ over Guwahati in Brahmaputra River Valley: Temporal evolution, source apportionment and meteorological dependence. Atmospheric Pollution Research, 8, 13-28.
68.	Ul-Saufie, A.Z., Yahaya, A.S., Ramli, N.A., Rosaida, N., Hamid, H.A. (2013) Future daily PM₁₀ concentrations forecasting by combining regression models and feedforward backpropagation models with principal component analysis (PCA). Atmospheric Environment, 77, 621-630.
69.	Vemuri, V. (1988) Artificial neural networks: theoretical concepts; IEEE Computer Society Press Washington DC, United States, pp. 145; ISBN: 978-0-8186-0855-1.
70.	Vlachogianni, A., Karppinen, A., Kassomenos, P., Karakitsios, S., Kukkonen, J. (2011) Evaluation of a multiple regression model for the forecasting of the concentrations of NO_x and PM₁₀ in Athens and Helsinki. Science of the Total Environment, 409(8), 1559-1571.
71.	Wang, W. (2016) Progress in the impact of polluted meteorological conditions on the incidence of asthma. Journal of Thoracic Disease, 8(1), E57-E61.
72.	WHO (2018) Concentration occurrence or they should stay away from the high-risk areas. WHO, Geneva. Available online: http://www.who.int/phe/health_topics/outdoorair/? (accessed on 10 March 2020).
73.	Willmott, C.J., Matsuura, K., Robeson, S.M. (2009) Ambiguities inherent in sums-of-squares-based error statistics. Atmospheric Environment, 43(3), 749-752.
74.	Yadav, M., Soni, K., Soni, B.K., Singh, N.K., Bamniya, B.R. (2019) Source apportionment of particulate matter, gaseous pollutants, and volatile organic compounds in a future smart city of India. Urban Climate, 28, 100470.
75.	Yadav, S., Satsangi, P.G. (2013) Characterization of particulate matter and its related metal toxicity in an urban location in southwest India. Environmental Monitoring and Assessment, 185, 7365-7379.
76.	Yadav, V., Nath, S. (2019) Novel hybrid model for daily prediction of PM₁₀ using principal component analysis and artificial neural network. International Journal of Environmental Science and Technology, 16(6), 2839-2848.
77.	Yoo, K., Yoo, H., Lee, J.M., Shukla, S.K., Park, J. (2018) Classification and regression tree approach for prediction of potential hazards of urban airborne bacteria during Asian dust events. Scientific Reports, 8(11823).

Table S1.
Number of times by which PM₁₀ concentrations in Indian cities exceed the WHO and NAAQS standards.

Serial no.	Cities	PM₁₀ (in μg m^-3)	Violation (times WHO)	Violation (times NAAQS)
1	Adityapur	165	8.25	2.75
2	Agra	295	8.75	2.92
3	Ahmedabad	108.3	5.42	1.81
4	Amritsar	252.22	5.04	2.52
5	Bengaluru	349.8	17.49	5.83
6	Bathinda	204	4.08	2.04
7	Chandigarh	151	3.03	1.51
8	Delhi	182	3.64	1.82
9	Dhanbad	216	10.8	3.60
10	Fatehgarh	197	3.94	1.97
11	Guwahati	90.7	4.54	1.51
12	Hyderabad	174.4	8.72	2.91
13	Jharia	333.7	16.69	5.56
14	Jodhpur	180	3.6	1.8
15	Kanpur	277	2.28	2.77
16	Kolkata	445	22.25	7.42
17	Lucknow	123	2.46	1.23
18	Mangalore	101.8	5.09	1.07
19	Mumbai	54.4	2.72	0.91
20	Pune	113	3.38	1.14
21	Raipur	387.29	19.56	6.45
22	Rohtak	186.09	3.72	1.86
23	Rourkela	127.26	6.38	2.13
24	Shimla	93.9	4.7	1.57
25	Sirsa	203	4.06	2.03
26	Sonitpur	213.67	4.27	2.14
27	Udaypur	128.34	2.57	2.14
28	Varanasi	176.1	8.81	2.94

Table S2.
Different data-driven predictive techniques used for PM forecasting in the Indian context.

Author (year)	Location (Type)	Method	Predictor variables	Target	Remarks
Prakash et al. (2011)	Delhi (Tier I city)	Wavelet and RNN (Recurrent Neural Network) combination	CO, NO₂, NO, O₃, SO₂ & PM_2.5	CO, O₃, NO₂, NO, SO₂ & PM_2.5	Forecast performance was reasonably good.
Singh et al. (2012)	Lucknow (Tier II city)	Partial least squares regression (PLSR), multivariate polynomial regression (MPR) and ANN	T, RH, WS, SPM, NO₂, SO₂	RSPM, SO₂, & NO	MPR and ANNs performed better.
Singh et al. (2013)	Lucknow (Tier II city)	Single Decision Tree (SDT), Decision Tree Forest (DTF) and Decision Tree Boost (DTB) vs. Support Vector Machine (SVM)	Air quality & meteorological parameter	AQI and Combined AQI	DTF and DTB outperformed the SVM.
Kottur and Mantha (2015)	Mumbai (Tier I city)	ANN and Kriging combination	T, RH, WS, WD, AP, NO_x, SO_x, RSPM	NO_x, SO_x and RSPM	ANN and Kriging performed satisfactorily.
Mishra et al. (2015)	Delhi (Tier I city)	Artificial intelligence-based Neuro-Fuzzy (NF) techniques compared with MLR and ANN	CO, O₃, NO₂, SO₂, PM_2.5, AP, T, WS, WD, RH, V, DP	PM_2.5	NF model is better than ANN and MLR models.
Gogikar et al. (2019)	Rourkela (Tier II city)	WMLPNN (wavelet-based MLP), WRNN (wavelet-based RNN), multi-layer perceptron feed forward neural network (MLPNN) and recurrent neural network (RNN)	T, RH, BLH, SP, WD, WS	PM_2.5, PM₁₀	WMLPNN model performed better.
Yadav and Nath (2019)	Varanasi (Tier II city)	PCA-ANN (MLP) and MLR	PM_2.5, NO, Benzene and VWS for PCA for ANN. SR, WS & AP for MLR	PM₁₀	Hybrid PCA-ANN model gives a better prediction.
Masood and Ahmad (2020)	Delhi (Tier I city)	SVM and ANN	PM_2.5, SO₂, CO, NO, NO_x, C₇H₈, NO₂, VWS, WS, WD, T, RH, SR	PM_2.5	ANN exhibited better result.
Agarwala et al. (2020)	Delhi (Tier I city)	ANN	Meteorological variables	PM₁₀, PM_2.5, NO₂, and O₃	O₃ predictions are better than PM.
Nadeem et al. (2020)	Chennai (Tier I city)	ARMA/ARIMA modelling	PM₁₀, SO₂ & NO₂	PM₁₀, SO₂ and NO₂	Forecasting efficiency can be improved.
Kaur and Mandal (2020)	Delhi (Tier I city)	ANN of four types FFBP, RNN, Elman and NARX (non-linear autoregressive network with exogenous input)	PM_2.5, T, WS, WD, RH, SR	PM_2.5	NARX model outperforms others.
Bhardwaj and Pruthi (2020)	Delhi (Tier I city)	ANFIS (Adaptive-Neuro Fuzzy Inference System), WANFIS (wavelet ANFIS), WANFIS-GA (WANFIS genetic algorithm), WANFIS-PSO (WANFIS particle swarm optimization)	PM_2.5	PM_2.5	WANFIS-PSO performed better.
This study	Guwahati (Tier II city)	MLR, ANN (MLP) and CART	PM₁₀, RF, T, RH, WS, NO₂, SO₂, CO	PM₁₀	ANN performed better.

Abbreviations: Temperature (T), Relative humidity (RH), Wind speed (WS), Wind direction (WD), Atmospheric pressure (AP), Nitrogen dioxide (NO₂), Sulfur dioxide (SO₂), Respirable suspended particulate matter (RSPM), Nitrogen oxides (NO_x), Sulfur oxides (SO_x), Ozone (O₃), Visibility (V), Dew point (DP), Boundary layer height (BLH), Surface pressure (SP), Solar radiation (SR)

Table S3.
Monitoring stations and their geographic coordinates (latitude and longitude in decimal degrees).

Code	Location	Monitoring type	Latitude	Longitude	Area type
STN1_603	Boragaon, IASST Campus	Non PM_2.5	26.11635	91.68338	Residential
STN2_596	Khanapara, Central Diary	Non PM_2.5	26.0831	91.8171	Residential
STN3_519	Gopinath Nagar, ITI Building	Non PM_2.5	26.160962	91.752542	Residential
STN4_541	Santipur, Prajyotish College	Non PM_2.5	26.165391	91.7276	Residential
STN5_602	Guwahati University campus	Non PM_2.5	26.15793	91.66312	Residential
STN6_193	Bamunimaidan, PCBA HQ	PM_2.5	26.185165	91.788334	Residential

Table S4.
Correlation coefficients between PM₁₀, gaseous pollutants, and relative humidity (RH).

	PM₁₀	RH	NO₂	SO₂	CO
PM₁₀	1
RH	-0.355**	1
NO₂	0.231**	-0.045	1
SO₂	0.128**	-0.121**	0.477**	1
CO	0.061*	-0.024	-0.011	0.027	1

**p<0.01; *p<0.05

Table S5.
Performance indicators for ANN MLP validation model, station 6 (Jan-Mar, 2019).

Performance indicators	Values
NAE (Net Absolute Error)	0.07
MAE (Mean Absolute Error)	13.02
MSE (Mean Squared Error)	287.95
RMSE (Root Mean Square Error)	16.97
IA (Index of Agreement)	0.89
R² (Coefficient of Determination)	0.69

Fig. S1.
Observations and prediction of the MLR models in scatter plots.

Fig. S2.
Scatter plots of observed versus predicted values for the ANN models.

Fig. S3.
Decision tree structures from CART for stations 1 to 6.

Fig. S4.
PM₁₀ prediction using ANN MLP for station 6 (Jan-Mar, 2019), Guwahati city.

The Editorial Office : Asian Association for Atmospheric Environment
AJAE (Asian Journal of Atmospheric Environment)
Homepage : http://www.asianjae.org
Submission platform : http://mc03.manuscriptcentral.com/ajae

KOSAE (Korean Society for Atmospheric Environment)
124, Sajik-ro, Jongno-gu, Seoul, Korea
Tel : +82-2-387-1400, 0242 / FAX : +82-2-387-1881
E-mail : webmaster@kosae.or.kr / Homepage : http://www.kosae.or.kr

CSES•CSAE (Association of Atmospheric Environment of Chinese Society for Environmental Sciences)
No. 54, Hongnian Village, Haidian District, Beijing, China
Tel : +86-10-82211021
E-mail : cses@chinacses.org / Homepage : http://www.chinacses.org

JSAE (Japan Society for Atmospheric Environment)
358-5, Yamabuki-cho, Shinjuku-ku, Tokyo 162-0801, Japan
Tel : +81-3-6824-9392 / FAX : +81-3-5227-8631
E-mail : jsae-post@bunken.co.jp / Homepage : http://www.jsae-net.org

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
