AIR POLLUTION INDEX PREDICTION USING MULTIPLE NEURAL NETWORKS

ABSTRACT: Air quality monitoring and forecasting tools are necessary for taking precautionary measures against air pollution, such as reducing the effect of a predicted air pollution peak on the surrounding population and ecosystem. In this study, a single Feed-forward Artificial Neural Network (FANN) is shown to be able to predict the Air Pollution Index (API) with a Mean Squared Error (MSE) and coefficient of determination, R², of 0.1856 and 0.7950 respectively. However, due to the non-robust nature of a single FANN, a selective combination of Multiple Neural Networks (MNN) is introduced using backward elimination and forward selection methods. The results show that both selective combination methods can improve the robustness and performance of the API prediction, with an MSE and R² of 0.1614 and 0.8210 respectively. This shows that it is possible to reduce the number of networks combined in the MNN for API prediction without any loss in the performance of the final API prediction model.

KEYWORDS: air pollution index; artificial neural networks; multiple neural networks; forward selection; backward elimination

INTRODUCTION
Air quality is monitored continuously and manually to detect any changes in the ambient air quality status that may cause harm to human health or the environment. The Malaysian Department of Environment (DOE) monitors the ambient air quality via a network of 51 monitoring stations across Malaysia [1]. These monitoring stations are strategically located in residential, traffic, and industrial areas to detect any significant changes in the air quality which could be harmful to human health and the environment. The ambient air quality measurement in Malaysia is described in terms of the Air Pollutant Index (API), which is a simple way to describe and report the air quality instead of using the actual concentrations of air pollutants. The API also reflects the effects on human health, ranging from good to hazardous, and can be categorized according to its action criteria as specified in the National Haze Action Plan Malaysia.
Efficient methods for the assessment of air quality are needed in order to establish mechanisms for managing pollutant concentrations and preventing illness in health-sensitive people. The criterion for good air quality varies with the kind of ecosystem and is established at different levels. Several methodologies for the assessment and monitoring of air pollutants have been implemented by organizations such as the Department of Environment (DOE) of Malaysia, which has developed indexes for air quality. In response to this concern, several studies on air quality prediction using artificial neural networks have been carried out [2,3]. Unlike other modelling techniques, artificial neural networks (ANN) make no prior assumptions concerning the data distribution and require no mechanistic knowledge. An ANN is capable of modelling highly nonlinear relationships and can be trained to generalize accurately when presented with a new data set. Air quality prediction models based on neural networks have also been applied on a short-term and long-term basis. Viotti et al. [4] applied such a model to predict the vehicular air pollutant levels in the city of Perugia, Italy, while Sabri and Tarek [5] applied it in the region of Annaba, Algeria. The latter, however, combined a radial basis function (RBF) network and a multilayer perceptron (MLP) in their model to predict the air pollutant concentrations.
In addition to the emission sources, meteorological factors (wind speed and direction, temperature, precipitation and boundary layer heights) can also govern the variability of atmospheric PM10 [6,7]. In fact, urban and industrialized areas tend to record their highest PM10 concentrations under stable meteorological conditions coupled with thermal inversions or during long-range transport events [8,9], while the lowest readings tend to occur during windy and rainy periods [10]. Many researchers have studied the prediction of particulate matter concentration in the environment. Perez et al. [11] and Yan and Jian [12] focused their studies on the prediction of the PM2.5 (particulate matter with a diameter smaller than 2.5 micrometers) concentration using an ANN model.
Some researchers have developed air quality prediction models based on neural networks with a multilayer perceptron structure. Gardner and Dorling [13] and Perez and Trier [14] adopted this model to predict the NO and NO2 concentrations based on meteorological data in Central London and at traffic junctions in Santiago City in Chile respectively. They also concluded that the MLP performs better than their previously developed regression models. Feed-forward artificial neural networks (FANN) have also been applied by Sousa et al. [15] to predict hourly ozone concentration based on meteorological data, while Ul-Saufie et al. [16] combined FANN with PCA to predict PM10 concentration in Negeri Sembilan, Malaysia. Cigizoglu and Kisi [17] have also applied FANN. Chelani et al. [18] predicted SO2 values at three sites in Delhi, India, using neural networks and compared the results with those of multivariate regression models. Wind speed, wind direction index, relative humidity and temperature were used as inputs for their recurrent neural network.
Even though ANNs have been successful in many applications and place considerably fewer restrictions on the environmental input data, large training data sets are usually required to improve the accuracy and minimize the uncertainty in the output data, which up to now has been a significant disadvantage of these models. Gardner and Dorling [19] reviewed the limitations and problems associated with the training of ANNs and emphasized that a fundamental understanding of the basic theory is the key to developing ANNs. It is well known that a neural network can approximate any smooth nonlinear function between model inputs and outputs by selecting a suitable set of connecting weights and transfer functions [19]. Therefore, in this paper, a selective combination of multiple neural networks (MNN) is introduced to improve the single feed-forward artificial neural network (FANN) prediction for the API model, as shown in Fig. 1 [20]. This paper is organized as follows: Section 2 presents the case study concerning the API sampling area and location in Malaysia. The concept of the single FANN and the MNN combination using the forward selection (FS) and backward elimination (BE) methods is presented in Section 3. The results and discussion of the proposed MNN with selective combination are presented in Section 4. Finally, the last section concludes this paper.

CASE STUDY: PERAK AIR POLLUTION INDEX MONITORING STATIONS, MALAYSIA
Most air quality data are obtained from air quality monitoring stations directly or through remote sensing instruments. Here, the air quality data were collected by the Department of Environment (DOE), Malaysia, from 4 monitoring stations around Perak State, namely CA0020, CA0041, CA0045 and CA0046, as illustrated in Fig. 2. These Continuous Air Quality Monitoring (CAQM) stations are strategically located in residential, traffic, and industrial areas to detect any significant changes in the air quality that may be harmful to human health and the environment [1]. The air quality data were recorded for 4 years, from 2006 to 2009, for eight variables. For the API modelling, the variables involved are the concentrations of the air pollutants and the meteorological variables, which are divided into groups of input and output variables for the FANN model. For this study, 6 input variables are selected for model development, as shown in Table 1. A total of 1388 samples were used for the modelling and analysis in this study, and the raw data for year 2006 are shown in Fig. 3.

Fig. 2: Perak air monitoring stations [1].

FEED-FORWARD ARTIFICIAL NEURAL NETWORK MODEL DEVELOPMENT
In this case study, 1388 hourly-average data samples were taken from Malaysia's Department of Environment database from year 2006 to year 2009. All the data were normalized to zero mean and unit standard deviation to cope with the different magnitudes in the input and output data. Then, the data were divided randomly using the MATLAB divideint command into three sets, namely 70% (972 samples) as training data, 15% (208 samples) as testing data, and 15% (208 samples) as unseen validation data. The individual networks were then trained using the Levenberg-Marquardt optimization algorithm with regularization and "early stopping". The networks are single hidden layer feed-forward neural networks (FANN). Hidden layer neurons use the logarithmic sigmoid activation function whereas output layer neurons use the linear activation function. In this study, 20 networks with fixed identical structure were developed, each of the general form

$$y(t) = f\left[u_1(t), u_2(t), \ldots, u_m(t)\right] \qquad (1)$$

where u(t) denotes the process inputs at time t, y(t) is the predicted process output at time t, which is the API, and m is the number of process inputs, which for this case study is 6 as shown in Table 1.
Forward selection (FS) and backward elimination (BE) approaches, each combined with the simple averaging method, were then developed. The FS and BE methods were developed in our previous paper for a different prediction application [21]. Generally, in FS, the individual networks are added one at a time to the aggregated network, and the network added at each step is the one that produces the greatest decrease in the model prediction MSE. The process starts with an empty aggregated model, and the first network chosen is the single network that has the least MSE on the training and testing data, i.e. the best individual network. The second network added is the one that, when combined with the first, produces the largest reduction in MSE on the original training and testing data. This procedure is repeated until the MSE on the training and testing data cannot be further reduced by adding more networks.
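The forward selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: it assumes each trained network's predictions on the combined training and testing data are already available as plain lists, and the names `forward_selection`, `average_outputs` and the toy data are hypothetical.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def average_outputs(outputs):
    """Simple average of several networks' predictions, sample by sample."""
    return [fmean(column) for column in zip(*outputs)]


def forward_selection(predictions, y_true):
    """Greedy forward selection of networks for a simple-average ensemble.

    predictions: one list of predictions per network (assumed input format).
    Returns (indices of the selected networks, MSE of the aggregated network).
    """
    selected = []
    remaining = list(range(len(predictions)))
    best_mse = float("inf")
    while remaining:
        # Try adding each remaining network and record the aggregated MSE.
        trials = []
        for i in remaining:
            combined = average_outputs([predictions[j] for j in selected + [i]])
            trials.append((mse(y_true, combined), i))
        trial_mse, best_i = min(trials)
        if trial_mse >= best_mse:
            break  # no remaining network reduces the MSE any further
        selected.append(best_i)
        remaining.remove(best_i)
        best_mse = trial_mse
    return selected, best_mse


# Toy data, made up for illustration only (not from the study).
y = [1.0, 2.0, 3.0, 4.0]
nets = [
    [1.2, 2.2, 3.2, 4.2],  # slightly biased high
    [0.8, 1.8, 2.8, 3.8],  # slightly biased low
    [2.0, 3.0, 4.0, 5.0],  # poor network
]
chosen, final_mse = forward_selection(nets, y)
```

On this toy data the two mildly biased networks cancel each other when averaged, so they are selected and the poor network is left out.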
On the other hand, in BE, the aggregated network begins by combining all the individual networks in the pool and removes one network at a time until the MSE on the training and testing data cannot be further reduced. The network deleted at each step is selected such that its deletion results in the largest reduction in the aggregated network MSE on the training and testing data. The detailed procedures for the FS and BE methods can be found in [21]. The simple average method is used to combine the selected networks in both approaches, as shown in Eqn. (2) where, if all n networks are combined, the aggregated network output is:

$$\bar{y}(t) = \frac{1}{n}\sum_{i=1}^{n} y_i(t) \qquad (2)$$

where $y_i(t)$ is the predicted output of the i-th individual network.
The performance of the models is assessed using the MSE and the coefficient of determination, R². The advantages of the MSE include its easy calculation and the fact that it penalizes large errors in each observation. The average squared error over the sample observations is therefore able to indicate the quality of the model's predictions. On the other hand, R² measures how well the model reproduces the actual data, and hence the fitness of the model in capturing the actual process. The closer R² is to 1 and the closer the MSE is to zero, the better the model.
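Backward elimination and the two performance measures can be sketched in the same way. The code below is an illustrative sketch rather than the procedure of [21]: it assumes per-network predictions are given as lists, uses the usual definition R² = 1 - SS_res / SS_tot (the paper does not state its formula), and all function names and data are hypothetical.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_outputs(outputs):
    """Simple average of several networks' predictions, sample by sample."""
    return [fmean(column) for column in zip(*outputs)]


def backward_elimination(predictions, y_true):
    """Start from all networks and greedily drop one network at a time."""
    selected = list(range(len(predictions)))
    best_mse = mse(y_true, average_outputs([predictions[j] for j in selected]))
    while len(selected) > 1:
        # Try removing each network and record the aggregated MSE.
        trials = []
        for i in selected:
            kept = [j for j in selected if j != i]
            combined = average_outputs([predictions[j] for j in kept])
            trials.append((mse(y_true, combined), i))
        trial_mse, drop_i = min(trials)
        if trial_mse >= best_mse:
            break  # no single removal reduces the MSE any further
        selected.remove(drop_i)
        best_mse = trial_mse
    return selected, best_mse


# Toy data, made up for illustration only (not from the study).
y = [1.0, 2.0, 3.0, 4.0]
nets = [
    [1.2, 2.2, 3.2, 4.2],  # slightly biased high
    [0.8, 1.8, 2.8, 3.8],  # slightly biased low
    [2.0, 3.0, 4.0, 5.0],  # poor network
]
kept, final_mse = backward_elimination(nets, y)
ensemble = average_outputs([nets[j] for j in kept])
r2 = r_squared(y, ensemble)
```

Both greedy procedures stop at the first step that fails to lower the MSE, so neither is guaranteed to find the globally best subset of networks.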

RESULTS AND DISCUSSION
The inputs of these network models are the hourly averages of carbon monoxide, wind speed, air temperature, relative humidity, PM10 and O3, and their output is the API value, as shown in Eqn. (1). The single FANN, with a single hidden layer, was trained with the Levenberg-Marquardt algorithm, using a sigmoid activation function in the hidden layer and a linear activation function in the output layer. The structure of the single FANN is represented by the number of nodes in each layer. The number of nodes in the input layer is 6, which represents the input variables, while the output layer has only one node, representing the single model output variable. However, the quality of the fitted model depends on the number of nodes in the hidden layer.
Therefore, the number of nodes in the hidden layer was determined by calculating the MSE on the combined training and testing data. The number of nodes in the hidden layer was varied between 1 and 20 in order to find the "best" number of nodes for the model. Figure 4 shows the performance of the model prediction with different numbers of nodes in the hidden layer. The lowest MSE value on the combined training and testing data was 0.1652, recorded by the model with 9 nodes in the hidden layer. Thus, the network with 9 hidden nodes was selected as the final model structure or network architecture, i.e. the topology of the network is 6-9-1. Figure 5 shows the neural network model prediction performance on the training and testing data. In Figure 5, the solid lines represent the scaled true values of API and the dotted lines represent the model predictions. It can be seen that the predicted values are very close to the actual values for both sets of data. The MSE values for the training and testing data are 0.1988 and 0.1613, and the corresponding R² values are 0.7962 and 0.8257 respectively. Figure 6 shows the model prediction performance of the single FANN on the unseen validation data. In Fig. 6, the scaled true values of API are represented by the solid line while the model predictions are represented by the dotted line. The single FANN model follows the patterns of the process closely on the unseen validation data, where the MSE and R² values are 0.1856 and 0.7950 respectively. Thus, the API process can be modelled and generalized quite well using a single FANN.
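To make the selected 6-9-1 topology concrete, the following sketch computes one forward pass through a network with a logarithmic sigmoid hidden layer and a linear output node. The weights are random placeholders rather than trained values, and `fann_forward` and the sample input are hypothetical names and data used only for illustration.

```python
import math
import random


def logsig(x):
    """Logarithmic sigmoid activation, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def fann_forward(u, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a single-hidden-layer network with a linear output."""
    hidden = [
        logsig(sum(w * x for w, x in zip(weights, u)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder weights (random, NOT trained) for a 6-9-1 network.
rng = random.Random(0)
n_inputs, n_hidden = 6, 9
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b_out = rng.uniform(-1, 1)

# One normalized input sample (made-up values for the 6 input variables).
u = [0.1, -0.3, 0.5, 0.0, 1.2, -0.7]
y_hat = fann_forward(u, w_hidden, b_hidden, w_out, b_out)

# Number of adjustable parameters: hidden weights + hidden biases
# + output weights + output bias.
n_params = n_hidden * n_inputs + n_hidden + n_hidden + 1
```

A 6-9-1 network of this form has 6 × 9 + 9 + 9 + 1 = 73 adjustable parameters, which is small relative to the 972 training samples.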
However, even though a single FANN is shown to be able to predict the API quite accurately, single FANN models sometimes lack robustness, as shown in Fig. 7a and 7b. Single FANNs can perform poorly when applied to unseen data, because the network training may converge to an undesired local minimum or over-fit the noise in the data. In Fig. 7a, one of the best single FANNs on the training and testing data was network number 14, but its performance on the unseen data was not among the best. Figure 7b shows that the best network on the unseen validation data is network number 7, but its performance on the training and testing data is not among the best at all. There is no guarantee that the best model on the training and testing data will be the best on the unseen data. Therefore the combination of multiple neural networks is proposed in this study with the aim of enhancing the neural network robustness on unseen data. Figure 8 shows the multiple neural network performance using selective combinations with the BE and FS methods. The performance of the aggregated networks on the training and testing data is consistent with the performance on the unseen validation data for both selective combination methods, as summarized in Table 2. In this particular case, the FS and BE approaches led to the same individual networks being combined. Even though the number of networks combined was quite small for both selective methods, the most important point is that both combination approaches perform better than the single FANN.
For comparison, Azid et al. [22,23] carried out API modelling for the Southern region of Malaysia with 2 different sets of data containing 202,050 and 232,505 observations respectively. In [22] the inputs were reduced to 10 from 12 possible inputs, with an R² and RMSE of 0.724 and 7.562 respectively for unseen validation data. In [23], the inputs were reduced to 5 from a possible 8, with an R² and RMSE of 0.618 and 10.017 respectively for unseen data. Therefore, the MNN performed better than the API models for Malaysia in [22] and [23], as shown in Table 2, with an R² and RMSE of 0.8200 and 0.160 respectively for unseen validation data. This performance was obtained with fewer sample data (1388 observations) than in [22] and [23].

CONCLUSION
This study proposes single FANN and multiple neural networks to model the API based on environmental monitoring data, in order to obtain reliable and fast API predictions and so mitigate the problems related to air pollution. The single FANN models the API quite well, with a relatively small MSE and a high R² value on the unseen data. However, in order to overcome the non-robust nature of a single FANN, multiple neural networks are proposed with two selective combination methods. Both selective combination methods further improve the model prediction as compared to the single FANN and to combining all networks. This shows that it is possible to reduce the number of networks combined for the API prediction without losses in performance.

Fig. 4: MSE on the training and testing data with different numbers of hidden nodes.

Fig. 5: Actual and predicted values for training and testing data.

Fig. 6: Actual and predicted values on the unseen validation data from single FANN.

Fig. 7: MSE for single FANN. (a) MSE for training and testing data, (b) MSE for validation data.

Fig. 8: MSE for aggregated multiple neural networks on the unseen validation data for BE and FS approaches.

Table 1: Air quality variables for API modelling

The reduction of MSE on the training and testing data for the BE and FS combinations is consistent with the reduction of MSE on the unseen validation data. This shows the robustness of the proposed modelling techniques as compared to the single FANN, where the best network performance on the training and testing data does not guarantee the best performance on the unseen validation data. The number of networks in the final combination is reduced to 3 for both methods, which gives the minimum MSE on the training and testing data and also corresponds to the MSE on the validation data. The final result analysis is shown in Table 2.

Table 2: Statistical analysis of MNN performance on the unseen validation data