Comparing the Performance of Artificial Neural Networks, Decision Tree, Principal Component Regression and Multiple Linear Regression in Modeling Urban Air Quality Index

Ehsanzadeh, Alireza; Nejadkoorki, Farhad; Talebi, Ali

doi:10.22059/jes.2016.60060

Journal of Environmental Studies (محیط شناسی)
Article 1, Volume 42, Issue 3, Azar 1395 (November-December 2016), Pages 455-473
Article type: Research article
Digital Object Identifier (DOI): 10.22059/jes.2016.60060
Authors
Alireza Ehsanzadeh*¹; Farhad Nejadkoorki²; Ali Talebi³
¹M.Sc. student of Environmental Engineering, Yazd University, and member of the Young Researchers and Elite Club
²Associate Professor, Department of Environmental Engineering, Yazd University
³Associate Professor, Department of Watershed Management Engineering, Yazd University
Abstract
The air quality index is a key tool for understanding air quality, how air pollution affects health, and the protective measures against air pollution. The main aim of this study is to model and estimate the air quality index using artificial neural networks, decision trees, multiple linear regression and principal component regression. To calculate the air quality index, meteorological and air pollution data recorded at Tajrish and Gholhak stations in Tehran during the period 1385 to 1390 (Iranian calendar) were used. Statistical indices of error, correlation and accuracy were used to evaluate the performance of the estimator models. The results showed that the neural network model performed better than the other models at both stations: RMSE = 0.006, MAE = 0.004 and IA = 0.99 at Gholhak station, and RMSE = 0.004, MAE = 0.002 and IA = 1 at Tajrish station. The decision tree model performed well, ranking second after the neural network model, and the multiple linear regression model, ranking third, performed better than the regression model based on principal component analysis. Although principal component analysis reduced both the correlation among the input data and the number of input parameters to the model, it did not improve the performance of the regression model.
Keywords
Air quality index; modeling; artificial neural network; decision tree; principal component regression
Article title [English]
Comparing the Performance of Artificial Neural Networks, Decision Tree, Principal Component Regression and Multiple Linear Regression in Modeling Urban Air Quality Index
Authors [English]
Alireza Ehsanzadeh¹; Farhad Nejadkoorki²; Ali Talebi³



Abstract [English]
1. Introduction: The increasing rate of urbanization and industrialization in cities of developed and developing countries, such as Tehran, has led to increased air pollution. Today, predicting and estimating air quality parameters in urban regions are important topics in environmental studies because of their effect on human health. Air quality measures are widely used in air quality control plans. These measures classify air quality based on the amount of pollution and the various contaminants. The first measure of air quality was the Pollutant Standards Index (PSI), developed by the U.S. Environmental Protection Agency (US-EPA). This index converts the concentrations of the main air pollutants, namely carbon monoxide (CO), sulfur dioxide (SO2), particulate matter smaller than ten microns (PM10), ozone (O3), and nitrogen dioxide (NO2), into a standard air pollution index. In 1997, the US-EPA expanded the PSI and presented it as a new index named the Air Quality Index (AQI). One of the first steps in air pollution control is measuring the concentrations of air pollutants including PM10, CO, O3, SO2, and NO2. The AQI relates pollutant concentrations to the level of public health and to the control measures associated with air pollution. It classifies air quality into six main groups: good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, and hazardous. It also specifies the control measures for each class that prevent adverse effects of pollutants on different groups of people. Poor air quality caused by high pollutant concentrations in the large city of Tehran has caused various diseases and many problems for the public health and welfare of citizens, and it also damages the environment and living organisms.
Hence, assessing and modeling urban air quality, which is nonlinear in nature, and determining the factors affecting it are considered among the most essential environmental programs in large cities. Therefore, the present paper aims to compare the efficiency of artificial neural networks, decision tree, multiple linear regression and principal component regression in modeling and estimating the urban air quality index.
2. Materials and methods: In the present study, hourly data on air pollutant concentrations and meteorological parameters from Tajrish and Gholhak stations in Tehran were used to model and estimate the AQI. The meteorological and air pollution data recorded at Gholhak and Tajrish stations cover the period 2005 to 2011 and were used to develop the models. To assess the performance of the models and compare the results obtained in the training and testing phases, statistical indices such as the Index of Agreement (IA), Fractional Bias (FB), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE), Correlation Coefficient (R) and coefficient of determination (R2) were used. The initial objective was to use the guidelines of the US-EPA and the Iranian Center for Environmental Health and Work (CEHW) to calculate the air quality index from the hourly concentrations of each pollutant. In the next step, air pollution and AQI values were obtained using time series of meteorological data. Then, simulator and estimator models of air pollution were developed using artificial neural networks (ANN), decision tree, multiple linear regression (MLR) and principal component regression (PCR) methods in MATLAB software. In the first step, the concentration of each pollutant is the input to the AQI calculation algorithm, and the output is the air quality index for each pollutant; the overall air quality index is then used, along with the meteorological data, to develop the models.
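The AQI calculation step described above can be illustrated with a minimal Python sketch. It assumes the commonly published US-EPA piecewise-linear interpolation between concentration breakpoints and takes the overall index as the largest per-pollutant sub-index. The PM10 breakpoint table (24-hour mean, µg/m³), the function names and the example values are assumptions made for illustration. They are not taken from the paper, which worked in MATLAB and may have used different breakpoints under the CEHW guideline.

```python
# Assumed PM10 breakpoints: (C_low, C_high, I_low, I_high), 24-hour mean in µg/m³.
PM10_BREAKPOINTS = [
    (0, 54, 0, 50),
    (55, 154, 51, 100),
    (155, 254, 101, 150),
    (255, 354, 151, 200),
    (355, 424, 201, 300),
    (425, 604, 301, 500),
]

# Upper AQI bound of each of the six classes named in the abstract.
CATEGORIES = [
    (50, "good"),
    (100, "moderate"),
    (150, "unhealthy for sensitive groups"),
    (200, "unhealthy"),
    (300, "very unhealthy"),
    (500, "hazardous"),
]


def sub_index(concentration, breakpoints):
    """Interpolate linearly inside the breakpoint band containing the value."""
    c = int(concentration)  # assumption: concentrations are truncated to integers
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError(f"concentration {concentration} is outside the table")


def category(aqi):
    """Map an AQI value to its class label."""
    for upper, label in CATEGORIES:
        if aqi <= upper:
            return label
    raise ValueError(f"AQI {aqi} is above the scale")


def overall_aqi(sub_indices):
    """Return (overall AQI, responsible pollutant) as the largest sub-index."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return sub_indices[pollutant], pollutant


if __name__ == "__main__":
    pm10 = sub_index(100, PM10_BREAKPOINTS)
    # The NO2 sub-index of 120 is a made-up value used only to show the max rule.
    aqi, cause = overall_aqi({"PM10": pm10, "NO2": 120})
    print(pm10, aqi, cause, category(aqi))
```

Taking the maximum sub-index also identifies the pollutant responsible for the reported class, which is how a statement such as "the main cause of poor air quality at this station is nitrogen dioxide" can be derived.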
To develop the models, the data were randomly divided into training and testing sets: 80 percent of the data were used in the training phase and 20 percent in the testing phase. The final objective was to simulate and estimate the air quality index for the studied stations in Tehran. Finally, the modeling methods used in this study were compared with each other to identify the model that produces the best estimation and modeling results.
3. Results: The calculated air quality index shows that the dominant air quality class at Gholhak station is “unhealthy for sensitive groups”, with 11165 hours, and that the main cause of poor air quality at this station is nitrogen dioxide. At Tajrish station, the “moderate” class is dominant, with 17538 hours, and PM10 is mainly responsible for this air quality. The modeling results showed that the applied methods perform differently in estimating the AQI. According to the findings, the CART algorithm performs well in estimating the air quality index, as the correlation between simulated and observed values is very close to 1. By trial and error, it was found that a perceptron artificial neural network with one hidden layer and the Levenberg-Marquardt training algorithm, with 20 neurons in the hidden layer for Gholhak station and 25 neurons for Tajrish station, yields the best performance in estimating and modeling the air quality index. This network also gave the highest correlation between the target variable and the estimated values. An initial investigation showed significant correlation among the input data at Gholhak and Tajrish stations. To resolve this problem, the principal component analysis (PCA) method was used. The KMO test was used to determine the feasibility of PCA.
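As a hedged illustration of the KMO check mentioned above, the sketch below computes the overall Kaiser-Meyer-Olkin measure from a correlation matrix with NumPy. It uses the standard definition: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The function name `kmo` and the example matrix are assumptions for illustration and do not reproduce the paper's MATLAB computation or data.

```python
import numpy as np


def kmo(corr):
    """Overall KMO measure of sampling adequacy for a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlations come from the scaled inverse correlation matrix.
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diagonal] ** 2)
    p2 = np.sum(partial[off_diagonal] ** 2)
    return float(r2 / (r2 + p2))


if __name__ == "__main__":
    # Hypothetical correlation matrix of three equally correlated inputs.
    example = np.array([
        [1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0],
    ])
    print(round(kmo(example), 3))
```

Values closer to 1 indicate that the partial correlations are small relative to the raw correlations, which is the situation in which PCA is usually considered appropriate.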
Since the KMO value was 0.581 at Gholhak station and 0.606 at Tajrish station, the feasibility of the PCA method was confirmed. To perform this method, after standardizing the input variables, the correlation matrix was computed, and 13 eigenvalues and eigenvectors for Gholhak station and 12 for Tajrish station were obtained. Components 1 to 5 at Gholhak station and components 1 to 4 at Tajrish station had eigenvalues greater than 1. These components were selected as the principal components and used as inputs to the regression model. Equations 1 and 2 show the regression models for estimating the AQI at Gholhak and Tajrish stations, respectively:
$$\mathrm{AQI} = -63.74 + (9.89 \times PC_1) + (0.2 \times PC_2) + (0.19 \times PC_3) - (0.094 \times PC_4) - (1.09 \times PC_5) \tag{1}$$
$$\mathrm{AQI} = 28.23 + (0.933 \times PC_1) + (0.2415 \times PC_2) + (0.0336 \times PC_3) - (0.0088 \times PC_4) \tag{2}$$
4. Discussion and conclusion: The error statistics at the two stations showed that the decision tree model performs better at Gholhak station than at Tajrish station. The correlation coefficient (R) and coefficient of determination (R2) of both models were very close to 1, which suggests the high ability of the regression decision tree model to estimate urban air quality. Comparison of the error statistics at the studied stations showed that the ANN model performs better at Tajrish station than at Gholhak station. The error statistics also showed that the PCR model performs better at Tajrish station than at Gholhak station. Across all methods used for modeling and estimating the air quality index at the studied stations, the ANN model with the Levenberg-Marquardt training algorithm had the best performance at both stations, and the PCR model had the worst. In this study, air quality was monitored at two stations.
The findings of this research suggest that the models employed here are suitable for appraising air quality at the studied stations. Researchers can use them as a tool for gaining knowledge about air quality, for taking measures to control, decrease, and prevent pollution, and for informing the public more accurately about the air quality level in polluted urban areas.
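The principal component regression workflow summarized in this abstract can be sketched in Python with NumPy. The steps are a random 80/20 split, standardization, eigen-decomposition of the correlation matrix, keeping components with eigenvalues greater than 1, and evaluation with RMSE, MAE and the Index of Agreement. This is only an illustration under stated assumptions: the data are synthetic, all function names are hypothetical, IA follows Willmott's usual definition, and the paper's own models were built in MATLAB on the station records.

```python
import numpy as np


def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    return float(np.mean(np.abs(obs - pred)))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (1 means perfect agreement)."""
    centre = obs.mean()
    denom = np.sum((np.abs(pred - centre) + np.abs(obs - centre)) ** 2)
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)


def fit_pcr(X, y):
    """Regress y on the principal components whose eigenvalue exceeds 1."""
    mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sigma                           # standardized inputs
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    W = eigvecs[:, eigvals > 1.0]                  # eigenvalue-greater-than-1 rule
    design = np.column_stack([np.ones(len(y)), Z @ W])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mu, sigma, W, coef


def predict_pcr(model, X):
    mu, sigma, W, coef = model
    return coef[0] + ((X - mu) / sigma) @ W @ coef[1:]


def demo(seed=0):
    """Run the workflow on synthetic, strongly correlated inputs."""
    rng = np.random.default_rng(seed)
    n = 500
    latent = rng.normal(size=(n, 2))
    # Six inputs in two correlated groups of three (stand-ins for real predictors).
    X = np.column_stack(
        [latent[:, k // 3] + 0.1 * rng.normal(size=n) for k in range(6)]
    )
    y = 50 + 10 * latent[:, 0] - 5 * latent[:, 1] + rng.normal(size=n)
    order = rng.permutation(n)                     # random 80/20 split
    cut = int(0.8 * n)
    train, test = order[:cut], order[cut:]
    model = fit_pcr(X[train], y[train])
    pred = predict_pcr(model, X[test])
    return {
        "n_components": model[2].shape[1],
        "RMSE": rmse(y[test], pred),
        "MAE": mae(y[test], pred),
        "IA": index_of_agreement(y[test], pred),
    }


if __name__ == "__main__":
    print(demo())
```

Because the regression only sees the retained components, any information carried by the discarded components is lost. This is one plausible reason a PCR model can underperform plain multiple linear regression, as the abstract reports.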
Keywords [English]
Modeling, Air Quality Index, Artificial Neural Network, Decision Tree, Principal Component Regression
