Missing Value Imputation for PM10 Concentration in Sabah using Nearest Neighbour Method (NNM) and Expectation-Maximization (EM) Algorithm
Copyright © 2020 by Asian Journal of Atmospheric Environment
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Missing data in large datasets affect any further analysis conducted on them. The Nearest Neighbour Method (NNM) and the Expectation-Maximization (EM) algorithm are two of the most widely used methods for filling in missing data. This research therefore compares both methods by imputing missing air quality data from five monitoring stations (CA0030, CA0039, CA0042, CA0049, CA0050) in Sabah, Malaysia. PM10 (particulate matter with aerodynamic diameter below 10 microns) datasets covering 2003-2007 (Part A) and 2008-2012 (Part B) are used. To make performance evaluation possible, missing data are introduced into the datasets at five levels (5%, 10%, 15%, 25% and 40%) and then imputed using both NNM and the EM algorithm. The performance of both imputation methods is evaluated using performance indicators (RMSE, MAE, IOA, COD) and regression analysis. On both measures, NNM performs better than EM for stations CA0039, CA0042 and CA0049. This may be because the air quality data are missing at random (MAR). However, this is not the case for CA0050 and Part B of CA0030, possibly because of fluctuations that NNM cannot detect. Accuracy evaluation using the Mean Absolute Percentage Error (MAPE) shows that NNM is the more accurate imputation method in most cases.
Keywords:
Particulate matter, Missing data, Nearest neighbour method, Expectation maximization algorithm, Performance indicators
1. INTRODUCTION
Air quality monitoring in Malaysia is conducted continuously by the Department of Environment (DOE) at stations around the country (Dominick et al., 2012). These stations collect PM10 concentration data at one-hour intervals. However, owing to maintenance, calibration of monitoring instruments and power outages, the data collected by monitoring stations may contain missing values. Missing data mechanisms can be categorized into three types: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) (Nakai and Ke, 2011). Missingness is categorized as MNAR when it depends on the missing value itself. MNAR is non-ignorable, and data missing under MNAR cannot be recovered (Graham, 2009). Missingness under MAR, on the other hand, depends only on the observed data. MAR is ignorable, and the missing data can be recovered because the missingness does not depend on the missing values themselves. MCAR is a special case of MAR in which missingness is independent of both the missing and the observed data (Dong and Peng, 2013). A dataset with MCAR missingness can be treated as a complete dataset because the missingness does not introduce bias (Dong and Peng, 2013). Little’s MCAR test can be used to determine whether missingness is MCAR (Li, 2013). If the missingness is not MCAR, however, this test cannot distinguish between MAR and MNAR (Dong and Peng, 2013). For air quality data in Malaysia, missingness can be considered MAR because it is mainly caused by maintenance, calibration of monitoring instruments and power outages; it does not depend on whether a value is lower or higher than a certain threshold. Missingness can affect further analyses that require a complete dataset, such as Fourier analysis and principal component analysis.
Particulate matter (PM) is a mixture of substances in the form of small particles suspended in the air. PM is one of the critical components of air pollution (Li et al., 2017b). Owing to its small size, PM can enter the respiratory system, making it a major public health concern (Chang et al., 2018). For this reason, PM has attracted considerable scientific attention (Shahraiyni and Sodoudi, 2016). PM mainly comes from motor vehicles and from dust from construction sites and landfills. It also comes from biomass burning and is carried by haze, a recurring problem in Southeast Asia since the 1980s (Shaadan et al., 2015). PM10 (particulate matter with aerodynamic diameter less than 10 microns) is of major concern because it is more hazardous to human health than other pollutants such as carbon monoxide and nitrogen dioxide (Kim et al., 2015; Ny and Lee, 2010). This is because it can enter the respiratory system while evading the natural defences of the human body (Chang et al., 2018). PM10 can increase the risk of asthma and aggravate bronchitis, respiratory syncytial virus (RSV) bronchiolitis and other lung diseases (Carugno et al., 2018; Lelieveld et al., 2015). This is especially true for children aged 5 to 15 years (Cadelis et al., 2014). Besides respiratory problems, PM10 in the air can also lead to cardiovascular disease and cancer (Li et al., 2017a).
Many agencies around the world, such as the European Union (EU) and the World Health Organization (WHO), have implemented guidelines and set limits on air pollutant concentrations (Abd. Rani et al., 2018). In Malaysia, the guidelines are implemented by the DOE. Under the New Malaysia Ambient Air Quality Standard, the PM10 standard was set at 50 μg/m3 (1-year averaging time) in 2015 and is to be gradually lowered to 40 μg/m3 by 2020 (Department of Environment, n. d.). Implementing this standard is important to ensure that air quality is maintained at a safe level. Therefore, there is a need to continuously monitor ambient air quality around Malaysia.
This research evaluates the performance of data imputation on air quality data from five monitoring stations around Sabah. To make performance evaluation possible, missingness is introduced artificially so that imputed data can be compared with the observed data they replace. Two imputation methods are studied, namely the Nearest Neighbour Method (NNM) and the Expectation-Maximization (EM) algorithm. Many previous studies have employed these two methods to obtain complete datasets. However, few of them have examined how well the two methods actually perform at data imputation. Comparing NNM and the EM algorithm helps in choosing the better method, so that further analysis requiring a complete dataset can be made more accurate.
2. DATA AND METHODS
2. 1 Study Area and Data
The five monitoring stations (CA0030, CA0039, CA0042, CA0049, CA0050) in Sabah are listed in Table 1, and the cities in which they are located are shown in Fig. 1. Except for CA0049, the monitoring stations are located at low altitudes and close to the sea. Furthermore, Labuan (CA0050) is situated on a small island west of Sabah. As shown in Fig. 2, PM10 concentration in Sabah varies with season and location (Kanniah et al., 2016). The west coast of Sabah generally has higher PM10 concentrations than other parts of Sabah all year round. In addition, PM10 concentration in Sabah is generally lower during the inter-monsoon period in October.
These monitoring stations, operated by the DOE, continuously measure PM10 concentration using a tapered element oscillating microbalance (TEOM) with a temporal resolution of 1 h. Because wind direction is an angular quantity, wind speed and direction must be converted into x-component (east-west) and y-component (north-south) wind speeds using equations (1) and (2). This avoids the difficulties that angular quantities cause in analysis (Muhammad Izzuddin et al., 2019; Kovač-Andrić et al., 2009).
(1) |
(2) |
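As a minimal sketch of such a conversion, the Python snippet below decomposes wind speed and direction into east-west and north-south components. It assumes the direction is given in degrees measured clockwise from north and uses a plain sine/cosine decomposition. The function name `wind_components` and the sign convention are illustrative assumptions and may differ from equations (1) and (2).

```python
import math


def wind_components(speed, direction_deg):
    """Split a wind vector into x (east-west) and y (north-south) parts.

    Assumption: direction_deg is measured clockwise from north, so
    0 degrees lies along +y and 90 degrees lies along +x.
    """
    theta = math.radians(direction_deg)
    wx = speed * math.sin(theta)  # east-west component
    wy = speed * math.cos(theta)  # north-south component
    return wx, wy


if __name__ == "__main__":
    print(wind_components(2.0, 90.0))
```

Working with the two components avoids the wrap-around at 0/360 degrees, which is the difficulty with angular quantities that the text refers to.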
For the purpose of this research, the 10-year hourly data from 2003 to 2012 are divided into two parts. The first part (Part A) covers 2003 to 2007, while the second part (Part B) covers 2008 to 2012. Owing to climate change, trends in PM10 concentration may differ between the two parts.
2. 2 Introduce Missingness to Data
To ensure that imputed data can be validated, a fraction of the observed data must be replaced by missing values. Missingness is introduced into the data at percentages that reflect different degrees of complexity, as in previous research by Noor et al. (2014) and as shown in Table 2. A sequence of zeros and ones (0: keep the observed value, 1: replace the observed value with a missing value) is randomly generated using MATLAB 2018b and used as a mask for introducing missingness into the observed data. The actual percentage of missing data after this step may deviate by up to 2% because of missingness already present in the data.
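The masking step can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in the study: the function name `introduce_missingness`, the fixed seed, and the choice to mask exactly the requested fraction of positions (rather than drawing each position independently) are assumptions.

```python
import math
import random


def introduce_missingness(values, fraction, seed=0):
    """Replace a random fraction of values with NaN.

    Returns the masked series and the 0/1 mask
    (0: keep the observed value, 1: replace it with a missing value).
    """
    rng = random.Random(seed)
    n = len(values)
    k = round(fraction * n)                  # number of values to remove
    chosen = set(rng.sample(range(n), k))    # positions to mask
    mask = [1 if i in chosen else 0 for i in range(n)]
    masked = [math.nan if m else v for v, m in zip(values, mask)]
    return masked, mask


if __name__ == "__main__":
    series = [float(v) for v in range(20)]
    masked, mask = introduce_missingness(series, 0.25)
    print(mask)
    print(masked)
```

Keeping the mask makes it possible to compare each imputed value with the observed value it replaced.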
2. 3 Data Imputation
Many data imputation methods have been proposed for temporal datasets (Bai et al., 2019). Owing to their simplicity, NNM and EM are two of the most popular. NNM is commonly used to replace missing air quality data (Li and Liu, 2014; Dominick et al., 2012). For a run of missing data bounded by the observed points (x1, y1) below and (x2, y2) above, each missing value is replaced with a value calculated using equations (3) and (4) (Abd. Rani et al., 2018; Zakaria and Noor, 2018; Siti Zawiyah et al., 2010; Junninen et al., 2004). NNM is performed using code developed in MATLAB 2018b.
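(3) $y = y_1, \quad \text{if } x \le x_1 + \dfrac{x_2 - x_1}{2}$
(4) $y = y_2, \quad \text{if } x > x_1 + \dfrac{x_2 - x_1}{2}$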
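A minimal Python sketch of nearest-neighbour imputation on an hourly series is given below. It assumes the usual rule from the air quality imputation literature: a missing point takes the lower-bound value y1 up to and including the midpoint of the gap, and the upper-bound value y2 after it. The function name and the handling of gaps at the very start or end of the series (filled from the only available neighbour) are assumptions, and this is not the MATLAB code used in the study.

```python
import math


def nearest_neighbour_impute(y):
    """Fill NaN entries of y with the nearest observed value in time."""
    n = len(y)
    # Index of the closest observed point at or before each position.
    prev_idx, last = [None] * n, None
    for i in range(n):
        if not math.isnan(y[i]):
            last = i
        prev_idx[i] = last
    # Index of the closest observed point at or after each position.
    next_idx, nxt = [None] * n, None
    for i in range(n - 1, -1, -1):
        if not math.isnan(y[i]):
            nxt = i
        next_idx[i] = nxt

    out = list(y)
    for i in range(n):
        if not math.isnan(y[i]):
            continue
        x1, x2 = prev_idx[i], next_idx[i]
        if x1 is None and x2 is None:
            continue                      # nothing observed at all
        if x1 is None:
            out[i] = y[x2]                # gap at the start of the series
        elif x2 is None:
            out[i] = y[x1]                # gap at the end of the series
        elif i <= x1 + (x2 - x1) / 2:
            out[i] = y[x1]                # first half of the gap
        else:
            out[i] = y[x2]                # second half of the gap
    return out


if __name__ == "__main__":
    nan = math.nan
    print(nearest_neighbour_impute([10.0, nan, nan, nan, 20.0]))
```

Because every missing value is copied from a neighbouring observation, this approach cannot reproduce peaks or dips that occur entirely inside a gap.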
The EM algorithm employs a set of iterative equations to estimate the mean vector and covariance matrix of a multivariate distribution from the exponential family (Junger and de Leon, 2015). It maximizes the log-likelihood to find the parameters when there are missing values (Nakai and Ke, 2011). Its simplicity and smooth operation set the EM algorithm apart from other multiple imputation methods. In addition, it runs faster than the alternatives, which makes it one of the most popular imputation methods (Abd. Rani et al., 2018).
Given a dataset consisting of observed data Dobs and missing data Dmis, the EM algorithm starts by initializing the parameter θ with a random value. The E-step (expectation step) then computes the expected log-likelihood over the possible values of Dmis, given Dobs and the current estimate of θ. The M-step (maximization step) uses this expectation to find a better estimate of θ. Given the likelihood function L and the expected value of the log-likelihood function Q(θ|θ(t)), the E-step and M-step are iterated until the estimate converges (Abd. Rani et al., 2018). The E-step and M-step are executed using equations (5) and (6), respectively.
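(5) $Q(\theta \mid \theta^{(t)}) = E_{D_{mis} \mid D_{obs},\, \theta^{(t)}} \left[ \log L(\theta; D_{obs}, D_{mis}) \right]$
(6) $\theta^{(t+1)} = \underset{\theta}{\arg\max} \; Q(\theta \mid \theta^{(t)})$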
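To make the E-step and M-step concrete, the sketch below runs EM for a bivariate normal model in which a covariate x is fully observed and y (for example PM10) has missing values. The choice of a single covariate, the function name, the starting values (moments of the available data instead of random values) and the convergence tolerance are all assumptions made for illustration; this is not the implementation used in the study.

```python
import math


def em_bivariate_normal(x, y, n_iter=200, tol=1e-8):
    """EM for a bivariate normal with x complete and y partly NaN.

    Returns y with missing entries replaced by their conditional
    expectation under the fitted model.
    """
    n = len(x)
    miss = [math.isnan(v) for v in y]
    obs_y = [v for v in y if not math.isnan(v)]

    # x is complete, so its mean and variance are fixed.
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    # Starting values for the parameters involving y.
    my = sum(obs_y) / len(obs_y)
    syy = sum((b - my) ** 2 for b in obs_y) / len(obs_y)
    sxy = 0.0

    for _ in range(n_iter):
        # E-step: expected y and y^2 for missing entries given x.
        beta = sxy / sxx
        cond_var = syy - beta * sxy
        ey = [my + beta * (a - mx) if m else b
              for a, b, m in zip(x, y, miss)]
        ey2 = [e * e + (cond_var if m else 0.0)
               for e, m in zip(ey, miss)]
        # M-step: re-estimate parameters from expected statistics.
        new_my = sum(ey) / n
        new_syy = sum(ey2) / n - new_my ** 2
        new_sxy = sum(a * e for a, e in zip(x, ey)) / n - mx * new_my
        change = (abs(new_my - my) + abs(new_syy - syy)
                  + abs(new_sxy - sxy))
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            break

    beta = sxy / sxx
    return [my + beta * (a - mx) if m else b
            for a, b, m in zip(x, y, miss)]


if __name__ == "__main__":
    nan = math.nan
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.0, 4.0, nan, 8.0, 10.0, nan]
    print(em_bivariate_normal(xs, ys))
```

Unlike nearest-neighbour filling, the imputed values here depend on the estimated mean and covariance structure of the whole dataset, not only on the two observations bounding a gap.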
2. 4 Performance Evaluation
The performance of data imputation is evaluated using four performance indicators: root mean square error (RMSE), mean absolute error (MAE), index of agreement (IOA), and coefficient of determination (COD). They are calculated using equations (7) to (10) (Abd. Rani et al., 2018; Nuryazmin et al., 2015; Ul-Saufie et al., 2013; Junninen et al., 2004):
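(7) $\mathrm{RMSE} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$
(8) $\mathrm{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} \left| P_i - O_i \right|$
(9) $\mathrm{IOA} = 1 - \dfrac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} \left( \left| P_i - \bar{O} \right| + \left| O_i - \bar{O} \right| \right)^2}$
(10) $\mathrm{COD} = \left[ \dfrac{1}{n} \dfrac{\sum_{i=1}^{n} (P_i - \bar{P})(O_i - \bar{O})}{s_p \, s_o} \right]^2$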
where $n$ is the total number of data points, $P_i$ is the predicted value of the ith data point, $O_i$ is the observed value of the ith data point, $\bar{P}$ is the mean predicted value, $\bar{O}$ is the mean observed value, $s_p$ is the standard deviation of the predicted values, and $s_o$ is the standard deviation of the observed values.
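The four indicators can be computed as in the following Python sketch. It assumes the commonly used forms: RMSE and MAE averaged over n, IOA as one minus the ratio of the squared error to the potential error around the observed mean, and COD as the squared correlation computed with population (divide by n) standard deviations. The function name and the returned dictionary are illustrative.

```python
import math


def performance_indicators(observed, predicted):
    """Return RMSE, MAE, IOA and COD for paired observed/predicted values."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    sq_err = sum((p - o) ** 2 for o, p in pairs)
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(p - o) for o, p in pairs) / n

    # Potential error: spread of both series around the observed mean.
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                    for o, p in pairs)
    ioa = 1.0 - sq_err / potential

    # Population standard deviations (assumption: divide by n).
    s_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    s_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((p - mean_p) * (o - mean_o) for o, p in pairs) / n
    cod = (cov / (s_p * s_o)) ** 2

    return {"RMSE": rmse, "MAE": mae, "IOA": ioa, "COD": cod}


if __name__ == "__main__":
    print(performance_indicators([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Lower RMSE and MAE indicate smaller errors, while IOA and COD closer to 1 indicate better agreement between imputed and observed values.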
2. 5 Mean Absolute Percentage Error (MAPE)
Mean absolute percentage error (MAPE) is a measure of the accuracy of a prediction model (Khair et al., 2017). Here, MAPE indicates the error in the predicted value of missing data relative to the real value. MAPE is calculated using equation (11) (Khair et al., 2017).
(11) $\mathrm{MAPE} = \dfrac{100\%}{n} \sum_{i=1}^{n} \left| \dfrac{O_i - P_i}{O_i} \right|$
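A short Python sketch of the MAPE calculation follows. It assumes the standard definition, expressed in percent, and assumes that no observed value is zero; the function name is illustrative.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: every observed value is non-zero.
    """
    n = len(observed)
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / n


if __name__ == "__main__":
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because each error is divided by the observed value, the same absolute error counts for more when the observed concentration is low.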
3. RESULT AND DISCUSSION
3. 1 Performance Indicators
PM10 concentration datasets for five monitoring stations in Sabah are analysed. RMSE, MAE, IOA, and COD are calculated for every missingness percentage and station for both Part A and Part B. Tables 3 and 4 present the performance indicators for NNM and EM at the five missingness levels and five stations for Part A and Part B, respectively. The more desirable value of each pair is highlighted in bold. There is no definite relationship between imputation performance and missingness level. This is because both NNM and EM impute missing data from the available data: as long as the available data are sufficient, missing data can still be imputed effectively.
Most of the results show that NNM is the better imputation method. This may be due to the nature of the missingness in relation to the ability of the EM algorithm to impute data. The EM algorithm works best for data that are MCAR (Nakai and Ke, 2011; Graham, 2009). However, missingness in air quality data collected at monitoring stations is not MCAR, as its cause is known. This may contribute to the lower performance of the EM algorithm compared with NNM.
However, this is not the case for CA0050, where most of the performance indicators show that the EM algorithm is the better imputation method. This may be because Labuan is surrounded by sea. One study has shown that air humidity is affected by bodies of water owing to their high heat capacity and strong evaporation (Zhu and Zeng, 2018). Furthermore, the cold, wet air around a water body changes the local air circulation and enhances air flow away from the water (Zhu and Zeng, 2018). The local air circulation therefore strongly affects humidity in Labuan. Another study suggests that different humidity levels affect PM10 concentration differently (Lou et al., 2017): PM10 concentration increases with humidity up to 60%, beyond which gravitational deposition occurs and PM10 concentration begins to drop. PM10 concentration as monitored by CA0050 may thus fluctuate with continually changing humidity, traffic congestion and industrial activity. NNM does not account for this fluctuation, which may explain why the EM algorithm appears to be the better imputation method for data collected by CA0050.
As for PM10 concentration recorded by CA0030, several performance indicators show that the EM algorithm is the better imputation method, especially for Part B of the data. This may be due to fluctuation of PM10 concentration in Kota Kinabalu, especially between 2008 and 2012. One study shows that PM10 concentration from 16 to 18 January 2012 spiked at 7.00 a.m. and fluctuated at other times (Chang et al., 2018). When this portion of the data is missing, NNM may not be able to restore it as well as the EM algorithm.
3. 2 Regression Analysis on Imputed Data
The performance of data imputation is further evaluated by calculating the correlation coefficient R between predicted and observed data. The ideal case occurs when the predicted data equal the observed data (R=1). Tables 5 and 6 present the correlation coefficients for Part A and Part B, respectively, for all five missingness percentages and five stations.
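The regression of predicted on observed values can be sketched in Python as below. The sketch assumes an ordinary least-squares line and the Pearson correlation coefficient; the function name is illustrative, and both inputs are assumed to have non-zero variance.

```python
import math


def regress_predicted_on_observed(observed, predicted):
    """Least-squares line P = slope * O + intercept and Pearson R."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sop = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    slope = sop / soo
    intercept = mean_p - slope * mean_o
    r = sop / math.sqrt(soo * spp)
    return slope, intercept, r


if __name__ == "__main__":
    print(regress_predicted_on_observed([1.0, 2.0, 3.0, 4.0],
                                        [3.0, 5.0, 7.0, 9.0]))
```

A perfect imputation gives a slope of 1, an intercept of 0 and R=1. Note that R measures scatter around the fitted line, so a line close to the ideal one can still come with a low R if the points are widely dispersed.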
As with the performance indicators, the correlation coefficient shows that NNM is the better imputation method for the monitoring stations in Tawau, Sandakan and Keningau. For CA0030, NNM is the better imputation method for Part A, but not for Part B. The dataset recorded by CA0050 strongly suggests that the EM algorithm is the better imputation method.
Fig. 3 shows scatter plots of imputed against observed data for CA0042 and CA0050. These two stations are selected because their geographical conditions differ from those of the other three stations: CA0042 is located at high altitude, while CA0050 is located on a small island. The correlation coefficient for CA0042 shows a relatively large difference between the two methods compared with other stations. As shown in Fig. 3, in all scatter plots for CA0042 the line representing NNM is closer to the dashed line than the line representing the EM algorithm. This shows that NNM has a greater tendency than the EM algorithm to predict missing data close to the observed data. This might be caused by the missingness mechanism, in which data are Missing at Random: the EM algorithm may not be able to impute MAR data as well as MCAR data (Nakai and Ke, 2011; Graham, 2009).
Meanwhile, in contrast to other stations, CA0050 shows that the EM algorithm gives the better correlation coefficient. Despite that, Fig. 3 reveals that the tendency of NNM to predict missing data close to the observed data is either greater than that of the EM algorithm (Part A) or approximately the same (Part B). This is because the lines representing NNM and EM are lines of best fit. However, the scatter plot shows that the data imputed by NNM for CA0050 are more dispersed around the line of best fit than those for CA0042, which might contribute to the lower R value of NNM compared with the EM algorithm. Although the best-fit line for NNM is closer to the dashed line, the dispersion of the scatter plot shows that the EM algorithm is the better imputation method for CA0050.
3. 3 Mean Absolute Percentage Error (MAPE)
Performance of data imputation is further evaluated using MAPE. Imputation is most accurate when MAPE approaches zero. Table 7 presents the accuracy of data imputation using NNM and EM for all stations and levels of missingness. According to Table 7, NNM is generally the more accurate imputation method compared with EM (except for CA0050 in Part B), as reflected by its lower MAPE values in most cases. This may be due to its ability to predict missing data closer to the actual data than EM.
4. CONCLUSION
In general, NNM has been shown to be the better imputation method for data from all the monitoring stations in Sabah except CA0050. NNM works most efficiently for CA0049 in Part A (RMSE<14.302, MAE<10.640, IOA>0.819 and COD>0.586) and CA0042 in Part B (RMSE<10.722, MAE<7.526, IOA>0.835 and COD>0.632). This may be because the data are MAR. However, strong fluctuation, which may be present in data from CA0050 and Part B of CA0030, may cause NNM to impute data less well than the EM algorithm. This may be further supported by the regression analysis for CA0050 (R>0.711 for Part B). Evaluation of accuracy using MAPE reveals that NNM is the more accurate imputation method in most cases (except for Part B of CA0050). This shows that NNM can be used to impute missing data in datasets observed by stations in Sabah. Accurate data imputation is important for future research because it makes further analysis of air quality data more reliable.
Acknowledgments
The authors would like to thank Universiti Malaysia Sabah for supporting this research through grants (SBK0324-2018, SGI0054-2018 and GUG0378-2018) and the Department of Environment Malaysia for providing meteorological and pollutant data for research purposes.
References
- Abd. Rani, N.L., Azid, A., Khalit, S.I., Juahir, H. (2018) Prediction Model of Missing Data: A Case Study of PM10 across Malaysia Region. Journal of Fundamental and Applied Science, 10(1S), 182-203.
- Bai, K., Li, K., Guo, J., Yang, Y., Chang, N.B. (2019) Filling the gaps of in-situ hourly PM2.5 concentration data with the aid of empirical orthogonal function constrained by diurnal cycles. Atmospheric Measurement Techniques, 1-29. [https://doi.org/10.5194/amt-2019-317]
- Cadelis, G., Tourres, R., Molinie, J. (2014) Short-Term Effects of the Particulate Pollutants Contained in Saharan Dust on the Visits of Children to the Emergency Department due to Asthmatic Conditions in Guadeloupe (French Archipelago of the Caribbean). PLOS ONE, 9(3), 1-11. [https://doi.org/10.1371/journal.pone.0091136]
- Carugno, M., Dentali, F., Mathieu, G., Fontanella, A., Mariani, J., Bordini, L., Milani, G.P., Consonni, D., Bonzini, M., Bollati, V., Pesatori, A.C. (2018) PM10 exposure is associated with increased hospitalizations for respiratory syncytial virus bronchiolitis among infants in Lombardy, Italy. Environmental Research, 166, 452-457. [https://doi.org/10.1016/j.envres.2018.06.016]
- Chang, H.W.J., Chee, F.P., Kong, S.K.S., Sentian, J. (2018) Variability of the PM10 concentration in the urban atmosphere of Sabah and its responses to diurnal and weekly changes of CO, NO2, SO2 and Ozone. Asian Journal of Atmospheric Environment, 12(2), 109-126. [https://doi.org/10.5572/ajae.2018.12.2.109]
- Department of Environment. (n. d.) New Malaysia Ambient Air Quality Standard. Available at http://www.doe.gov.my/portalv1/wp-content/uploads/2013/01/Air-Quality-Standard-BI.pdf
- Dominick, D., Juahir, H., Latif, M.T., Zain, S.M., Aris, A.Z. (2012) Spatial assessment of air quality patterns in Malaysia using multivariate analysis. Atmospheric Environment, 60, 172-181. [https://doi.org/10.1016/j.atmosenv.2012.06.021]
- Dong, Y., Peng, C.Y.J. (2013) Principled missing data methods for researchers. SpringerPlus, 2(222), 1-17. [https://doi.org/10.1186/2193-1801-2-222]
- Graham, J.W. (2009) Missing Data Analysis: Making It Work in the Real World. Annual Review of Psychology, 60, 549-576. [https://doi.org/10.1146/annurev.psych.58.110405.085530]
- Junger, W.L., de Leon, A.P. (2015) Imputation of missing data in time series for air pollutants. Atmospheric Environment, 102, 96-103. [https://doi.org/10.1016/j.atmosenv.2014.11.049]
- Junninen, J., Niska, H., Tuppurainen, K., Ruuskanen, J., Kolehmainen, M. (2004) Methods for imputation of missing values in air quality data sets. Atmospheric Environment, 38, 2895-2907. [https://doi.org/10.1016/j.atmosenv.2004.02.026]
- Kanniah, K.D., Kaskaoutis, D.G., Lim, H.S., Latif, M.T., Kamarul Zaman, N.A.F., Liew, J. (2016) Overview of atmospheric aerosol studies in Malaysia: Known and unknown. Atmospheric Research, 182, 302-318. [https://doi.org/10.1016/j.atmosres.2016.08.002]
- Khair, U., Fahmi, H., Al Hakim, S., Rahim, R. (2017) Forecasting Error Calculation with Mean Absolute Deviation and Mean Absolute Percentage Error. Journal of Physics: Conference Series, 930(1), 1-6. [https://doi.org/10.1088/1742-6596/930/1/012002]
- Kim, K.H., Kabir, E., Kabir, S. (2015) A review on the human health impact of airborne particulate matter. Environment International, 74, 136-143. [https://doi.org/10.1016/j.envint.2014.10.005]
- Kovač-Andrić, E., Brana, J., Gvozdić, V. (2009) Impact of meteorological factors on ozone concentrations modelled by time series analysis and multivariate statistical methods. Ecological Informatics, 4(2), 117-122. [https://doi.org/10.1016/j.ecoinf.2009.01.002]
- Lelieveld, J., Evans, J.S., Fnais, M., Giannadaki, D., Pozzer, A. (2015) The contribution of outdoor air pollution sources to premature mortality on a global scale. Nature, 525(7569), 367-371. [https://doi.org/10.1038/nature15371]
- Li, C. (2013) Little’s test of missing completely at random. The Stata Journal, 13(4), 795-809. [https://doi.org/10.1177/1536867X1301300407]
- Li, L., Liu, D.J. (2014) Study on an Air Quality Evaluation Model for Beijing City Under Haze-Fog Pollution Based on New Ambient Air Quality Standards. International Journal of Environmental Research and Public Health, 11, 8909-8923. [https://doi.org/10.3390/ijerph110908909]
- Li, L., Wu, A.H., Cheng, I., Chen, J.C., Wu, J. (2017a) Spatiotemporal estimation of historical PM2.5 concentrations using PM10, meteorological variables, and spatial effect. Atmospheric Environment, 166, 182-191. [https://doi.org/10.1016/j.atmosenv.2017.07.023]
- Li, X., Chen, X., Yuan, X., Zeng, G., Leon, T., Liang, J., Chen, G., Yuan, X. (2017b) Characteristics of Particulate Pollution (PM2.5 and PM10) and Their Spacescale-Dependent Relationships with Meteorological Elements in China. Sustainability, 9(12), 2330-2443. [https://doi.org/10.3390/su9122330]
- Lou, C., Liu, H., Li, Y., Peng, Y., Wang, J., Dai, L. (2017) Relationships of relative humidity with PM2.5 and PM10 in the Yangtze River Delta, China. Environmental Monitoring and Assessment, 189(11), 1-16. [https://doi.org/10.1007/s10661-017-6281-z]
- Muhammad Izzuddin, R., Chee, F.P., Dayou, J., Chang, H.W.J., Soon, K.K.S., Sentian, J. (2019) Temporal Assessment on Variation of PM10 Concentration in Kota Kinabalu using Principal Component Analysis and Fourier Analysis. Current World Environment, 14(3), 400-410. [https://doi.org/10.12944/CWE.14.3.08]
- Nakai, M., Ke, W. (2011) Review of the Methods for Handling Missing Data in Longitudinal Data Analysis. International Journal of Mathematical Analysis, 5(1), 1-13.
- Noor, H.M., Nasrudin, N., Foo, J. (2014) Determinants of Customer Satisfaction of Service Quality: City bus service in Kota Kinabalu, Malaysia. Procedia - Social and Behavioral Sciences, 153, 595-605. [https://doi.org/10.1016/j.sbspro.2014.10.092]
- Nuryazmin, A.Z., Abdul Aziz, J., Nora, M. (2015) A Comparison of Various Imputation Methods for Missing Values in Air Quality Data. Sains Malaysiana, 44(3), 449-456. [https://doi.org/10.17576/jsm-2015-4403-17]
- Ny, M.T., Lee, B.K. (2010) Size Distribution and Source Identification of Airborne Particulate Matter and Metallic Elements in a Typical Industrial City. Asian Journal of Atmospheric Environment, 4(1), 9-19. [https://doi.org/10.5572/ajae.2010.4.1.009]
- Shaadan, N., Jemain, A.A., Latif, M.T., Mohd. Deni, S. (2015) Anomaly detection and assessment of PM10 functional data at several locations in the Klang Valley, Malaysia. Atmospheric Pollution Research, 6, 365-375. [https://doi.org/10.5094/APR.2015.040]
- Shahraiyni, H.T., Sodoudi, S. (2016) Statistical Modeling Approaches for PM10 Prediction in Urban Areas; A Review of 21st-Century Studies. Atmosphere, 7, 1-24. [https://doi.org/10.3390/atmos7020015]
- Siti Zawiyah, A., Mohd Talib, L., Aida Shafawati, I., Liew, J., Abdul Aziz, J. (2010) Trend and status of air quality at three different monitoring stations in the Klang Valley, Malaysia. Air Quality, Atmosphere and Health, 3, 53-64. [https://doi.org/10.1007/s11869-009-0051-1]
- Ul-Saufie, A.Z., Yahaya, A.S., Ramli, N.A., Rosaida, N., Abdul Hamid, H. (2013) Future daily PM10 concentrations prediction by combining regression models and feedforward backpropagation models with principle component analysis (PCA). Atmospheric Environment, 73, 621-630. [https://doi.org/10.1016/j.atmosenv.2013.05.017]
- Zakaria, N.A., Noor, N.M. (2018) Imputation Methods for Filling Missing Data in Urban Air Pollution Data for Malaysia. Urbanism, 9(2), 159-166.
- Zhu, C., Zeng, Y. (2018) Effects of urban lake wetlands on the spatial and temporal distribution of air PM10 and PM2.5 in the spring in Wuhan. Urban Forestry and Urban Greening, 31, 142-156. [https://doi.org/10.1016/j.ufug.2018.02.008]