Predicting PM_{2.5} Concentrations Using Artificial Neural Networks and Markov Chain, a Case Study Karaj City
Abstract
The forecasting of air pollution is an important and popular topic in environmental engineering. Due to the health impacts of unacceptable particulate matter (PM) levels, it has become one of the greatest concerns in metropolitan cities such as Karaj City in Iran. In this study, the concentration of PM_{2.5} was predicted by applying a multilayer perceptron (MLP) neural network, a radial basis function (RBF) neural network and a Markov chain model. Two months of hourly data including temperature, NO, NO_{2}, NO_{x}, CO, SO_{2} and PM_{10} were used as inputs to the artificial neural networks. Of the 1,488 records, 1,300 were used to train the models and the rest were used to test them. The results indicated that the artificial neural network models performed well in predicting PM_{2.5} concentrations, while the Markov chain described the probable occurrence of unhealthy hours. The MLP neural network with two hidden layers, containing 19 neurons in the first layer and 16 neurons in the second, provided the best results. The coefficient of determination (R^{2}), Index of Agreement (IA) and Efficiency (E) between the observed and predicted data using the MLP neural network were 0.92, 0.93 and 0.981, respectively. In the MLP neural network, the MBE was 0.0545, which indicates the adequacy of the model. In the RBF neural network, increasing the number of neurons to 1,488 caused the RMSE to decline from 7.88 to nearly zero and R^{2} to reach 0.93. In the Markov chain model the absolute error was 0.014, which indicates acceptable accuracy and precision. We conclude that the probability of occurrence, state duration and transition of PM_{2.5} pollution are predictable using a Markov chain method.
Keywords:
Air pollution, PM_{2.5} concentration prediction, Artificial neural network, Markov chain
1. Introduction
Prediction of particulate matter (PM) is one of the important issues in the control and management of air pollutants. Particulate matter is the term used for a mixture of solid particles and liquid droplets found in the air (Dong et al., 2012). The health effects of exposure to fine particulate matter include an increased risk of death from lung cancer, pulmonary illness (e.g., asthma), chronic bronchitis, heart attack and cardiovascular disease (Deng et al., 2013; Goss et al., 2004; Slaughter et al., 2003).
The US Environmental Protection Agency (EPA) standards divide hourly PM_{2.5} air quality into three categories: 0-12 μg/m^{3} is good quality, 12.1 to 55.4 μg/m^{3} is sensitive quality, and 55.5 μg/m^{3} and above is unhealthy quality. By 2020, the benefits of reductions in fine particles and ozone are estimated to be $113 billion annually (Dong et al., 2012). The availability of accurate and sufficient data for forecasting future emissions helps the planning and control of air pollution in air quality management (AQM); therefore, forecasting air pollution for AQM in urban areas is essential. Several techniques have been developed for the prediction of particulate matter (PM) concentrations. Approaches for PM prediction can be classified into five categories: (1) empirical models, (2) fuzzy logic-based systems, (3) simulation models, (4) data-driven statistical models, and (5) model-driven statistical learning methods (Dong et al., 2012).
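As an illustrative sketch (not part of the study's own tooling), the three air quality bands above can be encoded as a small Python helper; the threshold values are those quoted from the EPA standards in the text:

```python
def pm25_category(concentration):
    """Classify an hourly PM2.5 concentration (ug/m3) into the three
    bands used in this study: good, sensitive, unhealthy."""
    if concentration <= 12.0:
        return "good"
    elif concentration <= 55.4:
        return "sensitive"
    else:
        return "unhealthy"
```

This mapping is also the state space used later by the Markov chain model (g, s, u).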
Air pollution modeling software always has several limitations: such models may produce errors and inaccurate results because many influencing factors are not considered (Harsham et al., 2008; Hanna et al., 2007; Caputo et al., 2003). Assessing time-series changes with mathematical methods such as the Markov chain model and artificial neural networks (ANNs), using available data, is an appropriate and reliable alternative that usually produces fewer errors.
Zickus et al. (2002), Owega et al. (2006), Kurt et al. (2008), Kukkonen et al. (2003), Niska et al. (2005), Slini et al. (2006), Voukantsis et al. (2011), Feng and Moustris (2013) applied ANN models to predict air quality parameters. Li et al. (2015) applied ANN methods to simulate PM_{2.5} and PM_{10}. Their results indicated that ANNs performed better than other methods and recommended this method as a reliable and accurate model.
The Markov chain model is a useful mathematical method in reliability research (Wang and Liu, 2012). Several studies worldwide have used Markov models to predict air pollution (Romanof, 1982; Nicas, 2000; Shamshad et al., 2005). Chung and AitSahlia (2003) and Sun et al. (2013) applied a Markov chain to determine the probability of various PM_{2.5} pollution scenarios; their results proved the workability of this method in modelling PM_{2.5}.
The main objective of this study was to predict the PM_{2.5} concentration and air quality in Karaj City, Iran using past air pollution data. We applied neural networks, namely the multilayer perceptron (MLP) and radial basis function (RBF), and a Markov chain model. The MLP, RBF and Markov chain are independent models. The MLP and RBF belong to the family of artificial neural networks and aim to predict future physical quantities of PM_{2.5} in the city. The input parameters used in this study were temperature and hourly air concentrations of NO, NO_{2}, NO_{x}, CO, SO_{2} and PM_{10}. Meanwhile, the Markov chain model was used to predict the probability of occurrence of PM_{2.5} in different periods and to indicate the air quality of the city in three forms: good, sensitive and unhealthy conditions. The models were developed and tested on two months of hourly data from the Karaj metro area, and their feasibility is discussed in this paper.
1. 1 The Study Area
Karaj City is the capital of Alborz Province, Iran. Its population is about 1.97 million, making it the fourth-largest city in Iran after Tehran, Mashhad and Esfahan. It is situated 20 kilometers west of Tehran, at the foothills of the Alborz Mountains. Its coordinates are 50 degrees, 55 minutes and 15 seconds east longitude and 35 degrees, 45 minutes and 50 seconds north latitude, and its area is about 858 km^{2}. The annual rainfall of the area is about 261 mm, and the mean annual temperature is between 5 and 13 degrees centigrade (Ilanloo, 2011). A monitoring station located in the main station of the Karaj to Tehran metro line was used for this work (Fig. 1).
2. Materials and Methods
The availability of accurate and sufficient data to train an ANN is very significant, as the ability of ANNs to respond to new problems depends to some extent on the primary data. In this study, the air quality parameters were hourly temperature, SO_{2}, PM_{10}, PM_{2.5}, CO, NO, NO_{2} and NO_{x}. A total of 1,488 hourly records (62 days) were available, of which 1,300 were used to train the ANN and Markov chain; the remaining records were used to compare the simulated data with the observed data monitored by the Karaj Department of Environment. Table 1 presents the statistical summary of the hourly air quality information from March 21, 2015 to May 20, 2015.
To allow better predictions, the input and output data of the ANNs were normalized. Equation 1 was used to normalize the data in this study; this function scales data to the range 0 to 1 (Zurada, 1992).
$$${N}_{i}=\left|\frac{{X}_{i}-{X}_{min}}{{X}_{max}-{X}_{min}}\right|$$$ | (1) |
Where N_{i} and X_{i} are the scaled and observed values of a parameter, and X_{min} and X_{max} represent the lowest and highest values in the series of that parameter.
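Eq. 1 can be sketched in a few lines of Python (the study itself used MATLAB; this is only an illustration):

```python
def normalize(series):
    """Min-max scaling of Eq. 1: N_i = (X_i - X_min) / (X_max - X_min),
    mapping a list of observations into the range [0, 1]."""
    x_min, x_max = min(series), max(series)
    return [(x - x_min) / (x_max - x_min) for x in series]
```

For example, `normalize([2, 4, 6])` yields `[0.0, 0.5, 1.0]`.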
2. 1 Artificial Neural Networks (ANNs)
An artificial neural network is an information-processing approach inspired by biological nervous systems; it processes information in a manner analogous to the human brain. The overall operation of an ANN can be seen in Equations 2 and 3 (Hambli, 2011; Haykin, 1999).
$$${y}_{i}^{m}=f\left({v}_{i}^{m}\right)$$$ | (2) |
$$${v}_{i}^{m}=\sum _{j=1}^{L}{w}_{ji}^{m-1}{y}_{j}^{m-1}+{b}_{i}^{m}$$$ | (3) |
Where $$ {v}_{i}^{m}$$ and $$ {y}_{i}^{m}$$ are the input and output of the i-th neuron in the m-th hidden layer, f is the activation function, L is the number of connections to the previous layer, and $$ {w}_{ji}^{m-1}$$ and $$ {b}_{i}^{m}$$ represent the weights and bias.
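A minimal Python sketch of the single-neuron computation in Eqs. 2-3 (illustrative only; `f` defaults here to the hyperbolic tangent, one common choice of activation):

```python
import math

def neuron_output(weights, inputs, bias, f=math.tanh):
    """Eqs. 2-3: v = sum_j w_j * y_j + b, then y = f(v)."""
    v = sum(w * y for w, y in zip(weights, inputs)) + bias
    return f(v)
```

With a linear activation, `neuron_output([1.0, 2.0], [3.0, 4.0], 5.0, f=lambda v: v)` returns the weighted sum 16.0.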
Several different types of ANNs are available. We used Radial Basis function (RBF) and Multilayer Perceptron (MLP) neural networks in this study. We developed all the programs with MATLAB software (R2012a) produced by the Math Work Company.
2. 2 MLP Neural Network Structure
MLP neural networks allow the designer to choose the number of hidden layers, the number of neurons in each layer and the transfer functions used in the layers. These functions can be the tan-sigmoid function (Eq. 4) or the log-sigmoid function (Eq. 5), which are among the most commonly applied in ANNs.
$$$f\left({v}_{i}^{m}\right)=\frac{2}{1+\mathit{exp}\left(-2{v}_{i}^{m}\right)}-1$$$ | (4) |
$$$f\left({v}_{i}^{m}\right)=\frac{1}{1+\mathit{exp}\left(-\theta {v}_{i}^{m}\right)}$$$ | (5) |
Where θ is the slope of the transfer function (θ=0.9).
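The two transfer functions of Eqs. 4 and 5 can be written directly in Python (an illustrative sketch, with θ = 0.9 as stated above):

```python
import math

def tansig(v):
    """Eq. 4: tan-sigmoid transfer function, output in (-1, 1).
    Mathematically equivalent to tanh(v)."""
    return 2.0 / (1.0 + math.exp(-2.0 * v)) - 1.0

def logsig(v, theta=0.9):
    """Eq. 5: log-sigmoid transfer function with slope theta,
    output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-theta * v))
```

Note that tansig(v) equals tanh(v), which is why MATLAB's `tansig` and the hyperbolic tangent are interchangeable in practice.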
Several algorithms for training MLP networks exist. In the simplest implementation, the weights and biases are updated in the direction in which the performance function decreases most rapidly (the opposite direction of the gradient). Equation 6 illustrates one iteration of this algorithm (Rumelhart and McClelland, 1986).
$$${x}_{k+1}={x}_{k}-{\alpha}_{k}{g}_{k}$$$ | (6) |
Where x_{k} is the weight-and-bias vector, g_{k} is the gradient of the performance function and α_{k} is the learning rate. Fig. 2 shows the MLP neural network used in this study.
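One iteration of the update in Eq. 6 amounts to the following Python sketch (illustrative; the study's implementation was in MATLAB):

```python
def gd_step(x, g, alpha):
    """Eq. 6: x_{k+1} = x_k - alpha_k * g_k, applied elementwise over
    the weight-and-bias vector x given the gradient g."""
    return [xi - alpha * gi for xi, gi in zip(x, g)]
```

For example, with x = [1.0, 2.0], gradient [0.5, -0.5] and learning rate 0.1, the update moves each component a small step against its gradient.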
Table 2 indicates the parameters of the MLP network design, where the parameter 'show' indicates the number of iterations after which the training status is displayed; α is the learning rate; 'goal' is the target error; β is the momentum coefficient; and 'epochs' is the maximum number of training iterations. Training stops when it reaches the number of iterations set in epochs, or when the value of the performance function falls below the goal parameter. The learning rate is multiplied by the gradient value and used to update the weights and biases. If this parameter is too large, the training process will not be stable enough; if it is too small, the algorithm will take a long time to converge. The momentum ratio (β) takes a value between 0 and 1. When the momentum ratio is zero, weight changes come only from the gradient of the performance function; when it is one, weight changes are based on the previous weight changes and the gradient is ignored.
In the operation of MLP neural networks, the weights are first selected randomly and applied, together with randomized biases, to the network input. The predicted outputs are then compared with the observed outputs, and the mean square error (MSE) between them is calculated. If the error is less than the desired error set for the network, training stops; otherwise, the weights and biases are changed to reduce the error.
2. 3 RBF Neural Network
Radial functions are simply a class of functions; in principle they could be employed in any sort of model, linear or nonlinear. Fig. 3 presents an RBF network: each of the n components of the input vector x feeds forward to m basis functions, whose outputs are linearly combined with weights $$ {\left\{{w}_{j}\right\}}_{j=1}^{m}$$ into the network output f(x) (Orr, 1996).
Compared to MLP neural networks, RBF neural networks need less design time but require more neurons. When there are many training vectors, these networks perform best (Cohen and Intrator, 2002). The training procedure is as follows: the number of hidden-layer neurons is increased until the performance function reaches the target value or the number of neurons reaches its maximum (the number of data).
The RBF neural networks have a simple architecture. Their structure includes an input layer, a single hidden layer, and an output layer, in which each output node produces a linear combination of the outputs of the hidden-layer nodes. Training an RBF comprises two steps. First, the basis functions are established using an algorithm to cluster the data in the training set. Kohonen self-organizing maps (SOMs) or a k-means clustering algorithm are most typically used. Kohonen SOMs (Kohonen, 1984) are a form of 'self-organizing' neural network that learns to differentiate patterns within input data; a SOM will therefore cluster input data according to perceived patterns without having to be given a corresponding output response. K-means clustering organizes all objects into a predefined number of groups by minimizing the total squared Euclidean distance from each object to its nearest cluster centre. Other techniques, such as orthogonal least squares and max-min algorithms, have also been applied (Song, 1996). Second, the weights linking the hidden and the output layer are calculated directly using simple matrix inversion and multiplication. The direct calculation of weights makes an RBF far quicker to train than an equivalent MLP (Dawson and Wibly, 2001).
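The two-step RBF training described above (fix the basis functions, then solve a linear system for the output weights) can be sketched in pure Python. This toy version places a Gaussian centre at every training point and solves Φw = y exactly by Gaussian elimination, rather than clustering with SOMs or k-means; it is an illustration of the idea, not the study's implementation:

```python
import math

def gaussian(r, width=1.0):
    """Radial basis function phi(r) = exp(-(r/width)^2)."""
    return math.exp(-(r / width) ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting
    (stands in for the matrix-inversion step described in the text)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_rbf(xs, ys, width=1.0):
    """Exact interpolation: one Gaussian centre per training point,
    output weights from solving Phi w = y."""
    Phi = [[gaussian(abs(x - c), width) for c in xs] for x in xs]
    return solve(Phi, ys)

def predict_rbf(weights, centres, x, width=1.0):
    """Network output f(x) = sum_j w_j * phi(|x - c_j|)."""
    return sum(w * gaussian(abs(x - c), width) for w, c in zip(weights, centres))
```

Because the centres coincide with the training points, the fitted network reproduces the training targets exactly, mirroring how the RMSE in Fig. 7 falls toward zero as neurons are added up to the number of data.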
2. 4 Model Efficiency
To determine the amount of error in predicting PM_{2.5} and to evaluate the performance of the models, we applied the Root Mean Squared Error (RMSE) and the Mean Bias Error (MBE), as given in Equations 7 and 8. We also applied the Nash-Sutcliffe Efficiency Coefficient (E), the coefficient of determination (R^{2}) and the Index of Agreement (IA) between the observed and predicted data to illustrate the validity of the models (Feng et al., 2015; Voukantsis et al., 2011; Krause et al., 2005).
$$$RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left(P-M\right)}^{2}}$$$ | (7) |
$$$MBE=\frac{1}{n}\sum _{i=1}^{n}\left|\frac{P-M}{M}\right|$$$ | (8) |
$$$E=1-\frac{{\sum}_{i=1}^{n}{\left(P-M\right)}^{2}}{{\sum}_{i=1}^{n}{\left(M-\overline{M}\right)}^{2}}$$$ | (9) |
$$${R}^{2}={\left(\frac{\text{\Sigma}\left(M-\overline{M}\right)\left(P-\overline{P}\right)}{\sqrt{\text{\Sigma}{\left(M-\overline{M}\right)}^{2}\text{\Sigma}{\left(P-\overline{P}\right)}^{2}}}\right)}^{2}$$$ | (10) |
$$$IA=1-\frac{\text{\Sigma}{\left(P-M\right)}^{2}}{\text{\Sigma}{\left(\left|P-\overline{M}\right|+\left|M-\overline{M}\right|\right)}^{2}}$$$ | (11) |
Where P and M are the predicted and observed values of PM_{2.5} at time t, respectively, $$ \overline{P}$$ and $$ \overline{M}$$ are the averages of the predicted and observed values, respectively, and n is the number of data.
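For concreteness, the efficiency statistics can be sketched in Python (the study used MATLAB; `mbe` here follows Eq. 8 exactly as defined in the text, i.e. a mean absolute relative error, and `ia` follows Willmott's index of agreement):

```python
import math

def rmse(P, M):
    """Eq. 7: root mean squared error."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(P, M)) / len(P))

def mbe(P, M):
    """Eq. 8 as defined in the text: mean of |(P - M) / M|."""
    return sum(abs((p - m) / m) for p, m in zip(P, M)) / len(P)

def nse(P, M):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    mbar = sum(M) / len(M)
    return 1 - sum((p - m) ** 2 for p, m in zip(P, M)) / \
        sum((m - mbar) ** 2 for m in M)

def r2(P, M):
    """Eq. 10: squared Pearson correlation coefficient."""
    pbar, mbar = sum(P) / len(P), sum(M) / len(M)
    num = sum((m - mbar) * (p - pbar) for p, m in zip(P, M))
    den = math.sqrt(sum((m - mbar) ** 2 for m in M) *
                    sum((p - pbar) ** 2 for p in P))
    return (num / den) ** 2

def ia(P, M):
    """Willmott's index of agreement."""
    mbar = sum(M) / len(M)
    num = sum((p - m) ** 2 for p, m in zip(P, M))
    den = sum((abs(p - mbar) + abs(m - mbar)) ** 2 for p, m in zip(P, M))
    return 1 - num / den
```

A perfect prediction (P identical to M) gives RMSE = 0 and E = R^{2} = IA = 1, which is the sanity check usually run before applying these statistics to test data.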
2. 5 Markov Chain
Several mathematical methods have been used to model the concentration of air pollutants; among them is the Markov chain model, which was used in this study. A Markov chain is a mathematical method for modeling probabilistic processes. Two features characterize a Markov chain: (a) its state space and (b) its order (level). If we define the Karaj air as a system, its state space (S) in a given hour will be one of the three positions in Eq. 12 (Chung and AitSahlia, 2003).
$$$\text{S=}\left\{\text{g,s,u}\right\}$$$ | (12) |
Where g denotes hours with good air quality, s denotes sensitive hours and u denotes hours with unhealthy air quality. The order (level) of a Markov chain specifies how many previous states the current state of the system depends on. Several tests are available to determine the most suitable order; we used the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) (Eqs. 13 and 14). These tests were performed for different orders, and the most suitable order was selected based on the lowest AIC or BIC. The AIC and BIC tests are based on the likelihood values of the Markov chain, which for the zero-order (L_{0}), first-order (L_{1}), second-order (L_{2}) and third-order (L_{3}) chains are given by Equations 15 to 18 (Taylor and Karlin, 1998).
$$$\text{AIC}\left(\text{m}\right)\text{=-2}{\text{L}}^{\text{m}}\text{+2}{\text{S}}^{\text{m}}\left(\text{S-1}\right)$$$ | (13) |
$$$\text{BIC}\left(\text{m}\right)\text{=-2}{\text{L}}^{\text{m}}\text{+}{\text{S}}^{\text{m}}\left(\text{S-1}\right)\text{ln}\left(\text{n}\right)$$$ | (14) |
$$${L}_{0}=\sum _{j=0}^{S-1}{n}_{j}\text{ln}\left({\widehat{P}}_{j}\right)$$$ | (15) |
$$${L}_{1}=\sum _{i=0}^{S-1}\sum _{j=0}^{S-1}{n}_{ij}\text{ln}\left({\widehat{P}}_{ij}\right)$$$ | (16) |
$$${L}_{2}=\sum _{h=0}^{S-1}\sum _{i=0}^{S-1}\sum _{j=0}^{S-1}{n}_{hij}\text{ln}\left({\widehat{P}}_{hij}\right)$$$ | (17) |
$$${L}_{3}=\sum _{g=0}^{S-1}\sum _{h=0}^{S-1}\sum _{i=0}^{S-1}\sum _{j=0}^{S-1}{n}_{ghij}\text{ln}\left({\widehat{P}}_{ghij}\right)$$$ | (18) |
Where S is the number of states, m is the order of the Markov chain, and n and $$ \widehat{P}$$ are the transition counts and estimated transition probabilities, respectively. Here n_{ij} is the observed transition count for a time series; for example, the transition count n_{00} specifies the number of consecutive pairs of 0's in the time series (Wilks, 2006). More information is available in Wilks (2006).
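A small Python sketch of how the first-order likelihood L_{1} (Eq. 16) and the AIC (Eq. 13) could be computed from an hourly state sequence (illustrative only; the function and variable names are ours):

```python
import math
from collections import Counter

def transition_counts(seq, states):
    """First-order transition counts n_ij from a state sequence."""
    c = Counter(zip(seq[:-1], seq[1:]))
    return {(i, j): c[(i, j)] for i in states for j in states}

def log_likelihood_1(seq, states):
    """Eq. 16: L1 = sum_ij n_ij * ln(p_ij), with p_ij = n_ij / n_i
    estimated from the observed counts."""
    n = transition_counts(seq, states)
    L = 0.0
    for i in states:
        row = sum(n[(i, j)] for j in states)
        for j in states:
            if n[(i, j)] > 0:
                L += n[(i, j)] * math.log(n[(i, j)] / row)
    return L

def aic(L, m, S):
    """Eq. 13: AIC(m) = -2 L_m + 2 S^m (S - 1)."""
    return -2 * L + 2 * (S ** m) * (S - 1)
```

Candidate orders m are compared by computing the corresponding L_m and choosing the order with the lowest AIC (or BIC).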
In this study, according to the results of the AIC and BIC tests, a first-order Markov chain was defined. Equation 19 indicates its mathematical expression (Logofet and Lensnaya, 2000).
$$$\text{Pr}\left\{{\text{X}}_{\text{t}}\text{|}{\text{X}}_{\text{t-1}}\text{,}{\text{X}}_{\text{t-2}}\text{,\u2026,}{\text{X}}_{\text{1}}\right\}\text{=}\text{Pr}\left\{{\text{X}}_{\text{t}}\text{|}{\text{X}}_{\text{t-1}}\right\}$$$ | (19) |
According to Equation 19, the state of a variable at time t, X_{t}, depends only on its state at time t-1, X_{t-1}, and not on the path through which the system reached its current state. The behavior of a Markov chain can be summarized in the form of a matrix of transition probabilities, in which each element represents the probability of transition from one state in the past to another state later. The transition probability matrix is a k×k matrix, where k is the number of members of the state space. Eq. 20 expresses a general transition matrix and Eq. 21 the first-order, three-state Markov chain transition probability matrix used in this study (Shamshad et al., 2005).
$$$P={\left({P}_{i,j}\right)}_{\left(n\times n\right)}=\left[\begin{array}{ccc}{P}_{\mathrm{1,1}}& \cdots & {P}_{1,n}\\ \vdots & \ddots & \vdots \\ {P}_{n,1}& \cdots & {P}_{n,n}\end{array}\right]$$$ | (20) |
$$$P={\left(P\right)}_{\left(3\times 3\right)}=\left[\begin{array}{ccc}{P}_{gg}& {P}_{gs}& {P}_{gu}\\ {P}_{sg}& {P}_{ss}& {P}_{su}\\ {P}_{ug}& {P}_{us}& {P}_{uu}\end{array}\right]$$$ | (21) |
Where the subscripts g, s and u, as mentioned previously, represent hours with good, sensitive and unhealthy air quality, respectively; the first subscript refers to time t-1 and the second to time t (e.g., P_{uu} is the probability of two consecutive unhealthy hours). Each element of this matrix is determined based on Eq. 22.
$$${P}_{ij}=\frac{{n}_{ij}}{\mathrm{\Sigma}{n}_{ij}}$$$ | (22) |
Where n denotes the transition counts in hours; for example, P_{gu} is the probability of one hour of unhealthy air quality occurring after one hour of good air quality.
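Estimating the matrix of Eq. 22 from an hourly sequence of states can be sketched as follows (illustrative Python, not the study's code):

```python
from collections import Counter

def transition_matrix(seq, states):
    """Eq. 22: P_ij = n_ij / sum_j n_ij, estimated from an hourly
    sequence of states such as ['g', 'g', 's', ...]."""
    counts = Counter(zip(seq[:-1], seq[1:]))
    P = {}
    for i in states:
        row = sum(counts[(i, j)] for j in states)
        for j in states:
            P[(i, j)] = counts[(i, j)] / row if row else 0.0
    return P
```

Each row of the resulting matrix sums to 1 (for states that occur in the sequence), as required of transition probabilities.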
By determining the transition probability matrix of the Markov chain, several analyses can be carried out, the most important of which concerns the persistence of unhealthy PM_{2.5} pollution. Since a run of n hours of good air quality must eventually be followed by an hour of sensitive or unhealthy quality, Eq. 23 gives the probability of n hours of continuous good air quality; similarly, Eqs. 24 and 25 give the probability of n hours of continuous sensitive and unhealthy air quality, respectively.
$$${P}_{g}\left(n\right)={{P}_{gg}}^{n-1}\times {P}_{gs}+{{P}_{gg}}^{n-1}\times {P}_{gu}$$$ | (23) |
$$${P}_{s}\left(n\right)={{P}_{ss}}^{n-1}\times {P}_{sg}+{{P}_{ss}}^{n-1}\times {P}_{su}$$$ | (24) |
$$${P}_{u}\left(n\right)={{P}_{uu}}^{n-1}\times {P}_{ug}+{{P}_{uu}}^{n-1}\times {P}_{us}$$$ | (25) |
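Eqs. 23-25 share a single form, P(n) = P_stay^{n-1} × (P_exit1 + P_exit2); a hedged Python sketch:

```python
def persistence_prob(p_stay, p_exit1, p_exit2, n):
    """Eqs. 23-25: probability that a state persists for exactly n hours
    and then transitions to one of the other two states."""
    return p_stay ** (n - 1) * (p_exit1 + p_exit2)
```

When the row probabilities sum to 1 (p_exit1 + p_exit2 = 1 - p_stay), summing persistence_prob over all n >= 1 gives 1, i.e. the state eventually changes with certainty, which matches the reasoning above.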
3. Results and discussion
Table 3 indicates the correlations between the parameters in our study, based on 1,300 data and Pearson's correlation coefficient. There should not be a high correlation between the input parameters and the target in a simulation process (Kuncheva, 2004); otherwise there would be no need for complex models such as neural networks, and the problem could be solved easily by regression methods. As indicated in Table 3, PM_{2.5} has statistically significant correlations with PM_{10}, NO_{2}, NO_{x}, SO_{2} and CO at the 1% significance level, but the correlations are very weak: PM_{10} (0.137), NO (0.072), NO_{2} (0.128), NO_{x} (0.101), SO_{2} (0.116) and CO (0.289). Because only weak linear correlations exist between PM_{2.5} and each of the input parameters, we used these parameters as inputs to the neural network.
3. 1 Determining the Optimal Parameters in Predicting the Amount of PM_{2.5}
One of the major issues affecting the performance of a neural network is the selection of the input parameters used to train it. Various algorithms have been used for this purpose in previous studies (Niska et al., 2006; Eleuteri et al., 2005; Kohavi and John, 1997), but these methods have limitations and errors; therefore, in this study the choice of parameters was based on their performance in network training. Various models were applied and their performances compared (Table 4). In all of these models, we used two hidden layers with 15 neurons in each layer and fixed the transfer functions (first layer: tan-sigmoid; second layer: log-sigmoid), so that the impact of changes in the input parameters on the network was tangible. For example, our ANN6 model used NO, PM_{10}, SO_{2} and temperature as inputs, with two hidden layers of 15 neurons each and the Tansig and Logsig transfer functions.
3. 2 MLP Neural Networks
We trained the network using 1,300 data to predict hourly concentrations of PM_{2.5} in Karaj City. To determine the most suitable network, we varied its characteristics, such as the inputs, the number of neurons in the hidden layers, the type of transfer functions, the learning rate and the momentum factor. To choose the most accurate and reliable model, the errors, R^{2}, IA and E were computed. As a result, the network whose input parameters were CO, NO_{x}, PM_{10} and temperature (ANN7) performed better than the other scenarios (Table 5).
After selecting the optimal input parameters and the appropriate number of hidden layers (ANN7), the effect of another main factor in ANN performance, data normalization, was evaluated. Networks using the input data CO, NO_{x}, PM_{10} and temperature were trained once with normalized data and once without. The number of neurons in the hidden layers of the MLP networks was changed by trial and error, and was changed automatically in the RBF networks from 0 to 1,488 (the number of data). Table 6 indicates the characteristics of ANN7 as applied in our study.
Fig. 4 indicates the performance of the various networks. Normalization of the data improved network performance, and increasing the number of neurons in the hidden layers of the MLP and RBF networks reduced the forecast error and increased the coefficient of determination. A network with two hidden layers, containing 19 neurons in the first layer and 16 neurons in the second and using normalized data (FN19/16), had a coefficient of determination, efficiency (E), RMSE and MBE of 0.92, 0.981, 1.25 and 0.0545, respectively. This network had the best performance among the MLP networks we developed (Fig. 4).
Fig. 5 compares the observed data with the data predicted by the MLP network for the 188 test records of the PM_{2.5} parameter. The coefficient of determination was 0.92 and the RMSE was 1.25, indicating the good workability of the model.
3. 3 The Effect of Learning Rate and Momentum Factor
Fig. 6 indicates the effect of the learning rate and momentum factor on the performance of the MLP networks. First, we fixed the learning rate (at each of the values α=0.05, 0.2, 0.4, 0.65, 0.8, 0.95). We then increased the momentum coefficient (β) from 0.05 to 0.95 in steps of 0.05, training and testing each step 20 times. The network errors in predicting PM_{2.5} were averaged and taken as the error value for the model with that momentum factor and learning rate. These tests were made on the MLP model with two hidden layers (FN19/16). The results indicated that increasing the learning rate weakened the network performance; on the other hand, lower learning rates made the learning process time-consuming. Increasing the momentum from 0.05 to 0.95 first increased the errors, then improved performance and reduced them. We therefore concluded that determining the weight changes from either the gradient of the performance function or the previous weight changes alone improves network performance, compared with involving both factors in determining the new weights. Selecting a suitable momentum factor in the range 0 to 1 improved network performance in predicting PM_{2.5} in our research.
Correct design of the parameters of an MLP neural network, such as the input parameters, the number of layers, the number of hidden-layer neurons, the transfer functions, the learning rate and momentum factors, and data normalization, also increases accuracy. We considered all these parameters and determined the effect of the learning rate and momentum factor on the RMSE of the MLP neural network (Fig. 6), which led to good performance in predicting PM_{2.5} in Karaj City. Voukantsis et al. (2011) used principal component analysis to select input parameters for an MLP neural network and predicted PM_{10} and PM_{2.5}, obtaining IA=0.8. We used a different method for selecting input parameters, choosing those whose correlation coefficients with PM_{2.5} were low; the Index of Agreement (IA) in our study for ANN7 was 0.84. The advantage of their work was the selection of a few meteorological parameters as ANN inputs; our input parameters for the MLP and RBF were NO, NO_{2}, NO_{x}, PM_{10}, SO_{2} and temperature, as meteorological data were not available to the authors. Bahari et al. (2014) predicted PM_{2.5} concentrations at one station in Tehran using an MLP neural network. Their input parameters were temperature, wind speed, wind direction, relative humidity, cloud cover and inversion strength; they did not describe their method of selecting input parameters. The R^{2} values of that study were between 0.61 and 0.79, whereas the R^{2} in our study for the MLP and RBF was 0.92 and 0.93, respectively.
We compared the results of our MLP neural network with those of Voukantsis et al. (2011) and Feng et al. (2015) and found that our model performed well in predicting PM_{2.5} concentrations in Karaj City. Feng et al. (2015) obtained RMSE values between 28 and 36 for one-day and two-day PM_{2.5} predictions using an MLP neural network, whereas the RMSE of our FN19/16 MLP model was 1.25. Their RMSE results could be due to the data used in their studies and may call for longer data collection; in fact, changes in the concentrations of suspended particles over the year lead to a reduction in network accuracy in predicting this parameter.
3. 4 RBF Neural Network
In RBF networks, which are formed from one hidden layer with a radial basis transfer function, the number of neurons starts from zero and increases. At each stage, the error is calculated and reported. This process continues until the error decreases to zero or the number of neurons equals the number of input data. Fig. 7 indicates the change in the prediction error for PM_{2.5} as the number of hidden-layer neurons increases. With increasing neurons, the root mean square error decreased from 7.88 to 2e-06 and the coefficient of determination reached 0.93. These results indicate the proper functioning of this network in predicting the concentration of PM_{2.5} without requiring an elaborate design. The coefficient of determination of 0.93 between the observed and predicted data indicates the reliability of the RBF network in predicting PM_{2.5} (Fig. 8).
The artificial neural networks used in our study were trained as continuous statistical models using past data. Such a network provides a numerical description of a mathematical structure able to predict the physical condition of air pollution 24 or 48 hours in advance. By increasing the length of the available data, the prediction horizon can be extended.
3. 5 Markov Chain
Equation 26 presents the transition probability matrix and Fig. 9 indicates its graphical representation for the 9 transition modes. The matrix shows that, given good or sensitive air quality, the same condition is likely to repeat (with probabilities of 71% and 89%, respectively). However, given unhealthy air quality in any hour, the probabilities of the unhealthy state repeating or of transferring to the sensitive state are about the same (48% and 51%, respectively), and there is little chance of the air becoming good in the following hours. In general, the most likely outcomes are recurrence in the case of good air, a switch to sensitive quality in the case of unhealthy air, and continuation in the case of sensitive air.
$$${P}_{\left(3\times 3\right)}=\left[\begin{array}{ccc}{P}_{gg}& {P}_{gs}& {P}_{gu}\\ {P}_{sg}& {P}_{ss}& {P}_{su}\\ {P}_{ug}& {P}_{us}& {P}_{uu}\end{array}\right]=\left[\begin{array}{ccc}0.7091& 0.28& 0.0109\\ 0.0796& 0.886& 0.0346\\ 0.0141& 0.507& 0.4788\end{array}\right]$$$ | (26) |
Table 7 presents the number of occurrences and the transition probabilities for each of the two data sets, the calibration set (1,300 data) and the test set (188 data), together with the error between them. The transition probability matrix calculated for the 188 test data is similar to the matrix calculated for the calibration data, with only a slight error. For example, the probability of a transition from good to sensitive air quality (gs) was calculated as 0.28; compared with the corresponding probability of 0.2758 in the test data, this gives an absolute error of 0.014, indicating acceptable accuracy and precision. As a result, the probability of occurrence of PM_{2.5} pollution in different periods is predictable using this method.
Fig. 9(a-c) indicates the probability of persistence for durations from 2 hours to 1 day for each air quality condition. The vertical axis is the likelihood of a state being maintained and the horizontal axis presents the duration. For example, in the first curve (Fig. 9(a)), P(n) is the probability that good air quality is maintained for n hours and then changes. Fig. 9(c) illustrates a fairly high probability (25%) of the unhealthy PM_{2.5} state persisting for two hours; however, the likelihood of this situation continuing into the fourth and fifth hours is low, so there is no substantial risk of prolonged heavy pollution. At the same time, given the low likelihood of good conditions persisting and the comparatively high likelihood of sensitive quality persisting for several hours or a whole day, favorable conditions in terms of particulate pollution, especially for sensitive groups, cannot be expected.
The MLP, RBF and Markov chain are independent models. The MLP and RBF belong to the family of artificial neural networks and can predict future PM_{2.5} concentrations in a city. The RBF results may be more accurate than those of the MLP model. The Markov chain model can describe the probability of occurrence of PM_{2.5} pollution in different periods and characterizes a city's air quality in three states: good, sensitive and unhealthy.
4. Conclusion
Considering the results and discussion of using the MLP neural network, the RBF neural network and the Markov chain to predict the PM_{2.5} pollutant in Karaj City, Iran, we summarize the results as follows:
- 1. The MLP neural network needs a suitable design for optimal performance. We developed an MLP neural network containing two hidden layers with 19 neurons in the first layer and 16 neurons in the second layer. The MBE was 0.0545 which indicates the adequacy of the MLP neural network. The R^{2} and Index of agreement (IA) between the observed data and the predicted data were 0.92 and 0.93, respectively.
- 2. Varying the momentum and learning coefficients showed that increasing the learning rate increases the MLP network error, so choosing a lower learning rate improves network performance. On the other hand, a low learning rate slows the training process. Increasing the momentum rate from 0 to 1 first increases the error and then reduces it. This shows that selecting the new weights based on both the slope of the performance function and the previous weights results in better network performance.
- 3. Selecting appropriate learning rate and momentum factors improved the performance of the artificial neural network.
- 4. The RBF neural network, with a single hidden layer using radial basis transfer functions, was simple to apply and performed well in predicting the hourly PM_{2.5} pollutant. By increasing the number of neurons from zero to 1,488 (equal to the number of data points), the network error dropped from 7.88 to 2E-06 and the coefficient of determination between observed and predicted data reached 0.92.
- 5. The RBF neural network predicted hourly PM_{2.5} more accurately than the MLP neural network.
- 6. The Markov chain model results indicated that the air quality in the coming months of 2015 will continue in a sensitive state, which is dangerous for people with heart disease and respiratory problems.
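The long-run tendency claimed in point 6 can be cross-checked against the stationary distribution of the transition matrix in Equation 26. The sketch below approximates it by power iteration, renormalizing each step to absorb rounding in the published probabilities; this is an illustrative check under the first-order Markov assumption, not a computation from the paper.

```python
def stationary(P, iters=1000):
    """Approximate the stationary distribution of a (row-stochastic)
    transition matrix by power iteration on a probability row vector,
    renormalizing each step."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        pi = [x / total for x in nxt]
    return pi

# Transition matrix from Equation 26 (states: good, sensitive, unhealthy).
P = [[0.7091, 0.2800, 0.0109],
     [0.0796, 0.8860, 0.0346],
     [0.0141, 0.5070, 0.4788]]
pi = stationary(P)
print([round(x, 3) for x in pi])
```

The resulting distribution places most of its mass on the sensitive state, in line with the conclusion that sensitive conditions dominate in the long run.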
REFERENCES
- Bahari, R.A., Ali Abbaspour, R., Pahlavani, P., (2014), Prediction of PM_{2.5} concentrations using temperature inversion effects based on an artificial neural network, The ISPRS International Conference of Geospatial Information Research, 15-17 November, Tehran, Iran. [https://doi.org/10.5194/isprsarchives-xl-2-w3-73-2014]
- Caputo, M., Gimenez, M., Schlamp, M., (2003), Intercomparison of atmospheric dispersion models, Atmospheric Environment, 37, p2435-2449. [https://doi.org/10.1016/s1352-2310(03)00201-2]
- Chung, K.L., AitSahlia, F., (2003), Elementary Probability Theory: With Stochastic Processes and an Introduction to Mathematical Finance, Springer Undergraduate Texts in Mathematics, ISSN 0172-6056.
- Cohen, S., Intrator, N., (2002), Automatic model selection in a hybrid perceptron/radial network, Information Fusion, Special Issue on Multiple Experts, 3(4), p259-266.
- Deng, X., Zhang, F., Rui, W., long, F., Wang, L., Feng, Z., Chen, D., Ding, W., (2013), PM_{2.5}-induced oxidative stress triggers autophagy in human lung epithelial A549 cells, Toxicology in Vitro, 27(6), p1762-1770. [https://doi.org/10.1016/j.tiv.2013.05.004]
- Dong, G.H., Zhang, P., Sun, B., Zhang, L., Chen, X., Ma, N., (2012), Long term exposure to ambient air pollution and respiratory disease mortality in Shenyang, China: a 12-year population-based retrospective cohort study, Respiration, 84(5), p360-368. [https://doi.org/10.1159/000332930]
- Eleuteri, A., Tagliaferri, R., Milano, L., (2005), A novel information geometric approach to variable selection in MLP networks, Neural Networks, 18(10), p1309-1318. [https://doi.org/10.1016/j.neunet.2005.01.008]
- Feng, X., Li, Q., Zhu, Y., Hou, J., Jin, L., Wang, J., (2015), Artificial neural network forecasting of PM_{2.5} pollution using air mass trajectory based geographic model and wavelet transformation, Atmospheric Environment, 107, p118-128. [https://doi.org/10.1016/j.atmosenv.2015.02.030]
- Goss, C.H., Newsom, S.A., Schildcrout, J.S., Sheppard, L., Kaufman, J.D., (2004), Effect of ambient air pollution on pulmonary exacerbations and lung function in cystic fibrosis, American Journal of Respiratory and Critical Care Medicine, 169(7), p816-821. [https://doi.org/10.1164/rccm.200306-779oc]
- Hambli, R., (2011), Multiscale prediction of crack density and crack length accumulation in trabecular bone based on neural networks and finite element simulation, International Journal for Numerical Methods in Biomedical Engineering, 27(4), p461-475. [https://doi.org/10.1002/cnm.1413]
- Hanna, S.R., Paine, R., Heinold, D., Kintigh, E., Baker, D., (2007), Uncertainties in air toxics calculated by the dispersion models AERMOD and ISCST 3 in the Houston ship channel area, Journal of Applied Meteorology and Climatology, 46, p1372-1382. [https://doi.org/10.1175/jam2540.1]
- Harsham, D.K., Bennett, M., (2008), A sensitivity study of validation of three regulatory dispersion models, American Journal of Environmental Sciences, 4(1), p63-76.
- Haykin, S., (1999), Neural networks: a comprehensive foundation, (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall.
- Jones, R.M., Nicas, M., (2014), Benchmarking of a Markov multizone model of contaminant transport, Annals of Occupational Hygiene, 58(8), p1018-1031.
- Kohavi, R., John, G.H., (1997), Wrappers for feature subset selection, Artificial Intelligence, 97, p273-324. [https://doi.org/10.1016/s0004-3702(97)00043-x]
- Kohonen, T., (1984), Self-organization and associative memory, New York, Springer-Verlag.
- Krause, P., Boyle, D.P., Bäse, F., (2005), Comparison of different efficiency criteria for hydrological model assessment, Advances in Geosciences, 5, p89-97. [https://doi.org/10.5194/adgeo-5-89-2005]
- Kukkonen, J., Partanen, L., Karppinen, A., Ruuskanen, J., Junninen, H., Kolehmainen, M., Niska, H., Dorling, S., Chatterton, T., Foxall, R., Cawley, G., (2003), Extensive evaluation of neural network models for the forecasting of NO_{2} and PM_{10} concentrations, compared with a deterministic modelling system and measurements in central Helsinki, Atmospheric Environment, 37, p4539-4550.
- Li, P., Xin, J.Y., Wang, Y.S., Wang, S.G., Li, G.X., Pan, X.C., Liu, Z.R., Wang, L.L., (2015), Reinstate regional transport of PM_{2.5} as a major cause of severe haze in Beijing, Proceedings of the National Academy of Sciences of the United States of America, 112, pE2739-E2740.
- Kuncheva, L., (2004), Combining Pattern Classifiers: Methods and Algorithms, Wiley, New York, USA.
- Kurt, A., Gulbagci, B., Karaca, F., Alagha, O., (2008), An online air pollution forecasting system using neural networks, Environment International, 34, p592-598. [https://doi.org/10.1016/j.envint.2007.12.020]
- Logofet, D.O., Lesnaya, E.V., (2000), The mathematics of Markov models: what Markov chains can really predict in forest successions, Ecological Modelling, 126(2-3), p285-298.
- Nicas, M., (2014), Markov modeling of contaminant concentrations in indoor air, American Industrial Hygiene Association Journal, 61(4), p484-491. [https://doi.org/10.1080/15298660008984559]
- Niska, H., Dorling, S., Chatterton, T., Foxall, R., Cawley, G., (2003), Extensive evaluation of neural network models for the prediction of NO_{2} and PM_{10} concentrations, compared with a deterministic modeling system and measurements in central Helsinki, Atmospheric Environment, 37, p4539-4550.
- Niska, H., Heikkinen, M., Kolehmainen, M., (2006), Genetic algorithms and sensitivity analysis applied to select inputs of a multi-layer perceptron for the prediction of air pollutant time-series, in: Intelligent Data Engineering and Automated Learning - IDEAL 2006, Lecture Notes in Computer Science, vol. 4224, Springer, p224-231. [https://doi.org/10.1007/11875581_27]
- Niska, H., Rantamäki, M., Hiltunen, T., Karppinen, A., Kukkonen, J., Ruuskanen, J., (2005), Evaluation of an integrated modelling system containing a multi-layer perceptron model and the numerical weather prediction model HIRLAM for the forecasting of urban airborne pollutant concentrations, Atmospheric Environment, 39(35), p6524-6536. [https://doi.org/10.1016/j.atmosenv.2005.07.035]
- Orr, M.J.L., (1996), Introduction to radial basis function networks, University of Edinburgh, EH89LW.
- Owega, S., Khan, B.U.Z., Evans, G.J., Jervis, R.E., Fila, M., (2006), Identification of long-range aerosol transport patterns to Toronto via classification of back trajectories by cluster analysis and neural network techniques, Chemometrics and Intelligent Laboratory Systems, 83(1), p26-33. [https://doi.org/10.1016/j.chemolab.2005.12.009]
- Romanof, N., (1982), A Markov chain model for the mean daily SO_{2} concentrations, Atmospheric Environment, 16(8), p1895-1897. [https://doi.org/10.1016/0004-6981(82)90377-8]
- Rumelhart, D.E., McClelland, J.L., (1986), Parallel distribution processing: Exploration in the microstructure of cognition, Cambridge, MA, MIT Press.
- Shamshad, A., Bawadi, M.A., Wan Hussin, W.M.A., Majid, T.A., Sanusi, S.A.M., (2005), First and second order Markov chain models for synthetic generation of wind speed time series, Energy, 30, p693-708. [https://doi.org/10.1016/j.energy.2004.05.026]
- Slaughter, J.C., Lumley, T., Sheppard, L., Koenig, J.Q., Shapiro, G.G., (2003), Effects of ambient air pollution on symptom severity and medication use in children with asthma, Annals of Allergy, Asthma and Immunology, 91(4), p346-353. [https://doi.org/10.1016/s1081-1206(10)61681-x]
- Slini, T., Kaprara, A., Karatzas, K., Moussiopoulos, N., (2006), PM_{10} forecasting for Thessaloniki, Greece, Environmental Modelling & Software, 21, p559-565. [https://doi.org/10.1016/j.envsoft.2004.06.011]
- Song, X.M., (1996), Radial basis function networks for empirical modeling of chemical process, MSc thesis, University of Helsinki.
- Sun, W., Zhang, H., Palazoglu, A., Singh, A., Zhang, W., Liu, S., (2013), Prediction of 24-hour-average PM_{2.5} concentrations using a hidden Markov model with different emission distributions in Northern California, Science of the Total Environment, 443, p93-103. [https://doi.org/10.1016/j.scitotenv.2012.10.070]
- Taylor, H., Karlin, S., (1998), An Introduction to Stochastic Modeling, Academic Press, San Diego, California.
- Voukantsis, D., Karatzas, K., Kukkonen, J., Räsänen, T., Karppinen, A., Kolehmainen, M., (2011), Intercomparison of air quality data using principal component analysis, and forecasting of PM_{10} and PM_{2.5} concentrations using artificial neural networks, in Thessaloniki and Helsinki, Science of the Total Environment, 409, p1266-1276. [https://doi.org/10.1016/j.scitotenv.2010.12.039]
- Wang, X., Liu, W., (2012), Research on Air Traffic Control Automatic System Software Reliability Based on Markov Chain, Physics Procedia, 24, p1601-1606. [https://doi.org/10.1016/j.phpro.2012.02.236]
- Wilks, D.S., (2006), Statistical methods in the atmospheric sciences, 2nd ed., Academic Press, xvii, p627.
- Zickus, M., Greig, A.J., Niranjan, M., (2002), Comparison of four machine learning methods for predicting PM_{10} concentration in Helsinki, Finland, Water, Air and Soil Pollution, 2(5), p717-729.
- Zurada, J.M., (1992), Introduction to Artificial Neural Systems, PWS, Singapore, p195-196.