TY - JOUR
T1 - Prediction of PM2.5 concentrations at the locations of monitoring sites measuring PM10 and NOx using generalized additive models and machine learning methods: A case study in London
AU - Analitis, Antonis
AU - Barratt, Ben
AU - Green, David
AU - Beddows, Andrew
AU - Samoli, Evangelia
AU - Schwartz, Joel
AU - Katsouyanni, Klea
PY - 2020/11/1
Y1 - 2020/11/1
N2 - The adverse health effects of air pollutants, especially those of PM2.5, are well documented. However, a lack of adequate monitoring and weaknesses in modelling approaches do not allow a good assessment of health effects in many areas of the World. Advances in computational methods and the availability of new data sets, e.g. satellite remote observations, have enlarged the possibilities of modelling for application in large scale health effects studies. However, PM2.5 monitoring is very recent in most of the World and more limited compared to other pollutants, and understanding how to use PM10 monitors to estimate PM2.5 exposure is therefore important. Since interest in these methods is relatively recent, there is a need for testing their performance against ambient measurements, but long term PM2.5 datasets are less readily available than PM10 in many regions. In the present study we report the methodology and results of using regression modelling and a machine learning method (Random Forest-RF), as well as a combination of the two, to enhance a PM2.5 measurement data base in London using PM10 and NOx measurements as well as other predictors and compare the relative performance of each method. We found that the combination of predictions by the regression model and the RF performs best and we obtain a cross-validation R² of 99.29% and 98.22% for the 5-year periods 2004–2008 and 2009–2013, respectively, and a Mean Square Error near 1. Our enhanced data base for PM2.5 is available for use by other researchers.
AB - The adverse health effects of air pollutants, especially those of PM2.5, are well documented. However, a lack of adequate monitoring and weaknesses in modelling approaches do not allow a good assessment of health effects in many areas of the World. Advances in computational methods and the availability of new data sets, e.g. satellite remote observations, have enlarged the possibilities of modelling for application in large scale health effects studies. However, PM2.5 monitoring is very recent in most of the World and more limited compared to other pollutants, and understanding how to use PM10 monitors to estimate PM2.5 exposure is therefore important. Since interest in these methods is relatively recent, there is a need for testing their performance against ambient measurements, but long term PM2.5 datasets are less readily available than PM10 in many regions. In the present study we report the methodology and results of using regression modelling and a machine learning method (Random Forest-RF), as well as a combination of the two, to enhance a PM2.5 measurement data base in London using PM10 and NOx measurements as well as other predictors and compare the relative performance of each method. We found that the combination of predictions by the regression model and the RF performs best and we obtain a cross-validation R² of 99.29% and 98.22% for the 5-year periods 2004–2008 and 2009–2013, respectively, and a Mean Square Error near 1. Our enhanced data base for PM2.5 is available for use by other researchers.
KW - Ensemble methods
KW - Environmental exposure
KW - London case study
KW - PM prediction
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=85089500551&partnerID=8YFLogxK
U2 - 10.1016/j.atmosenv.2020.117757
DO - 10.1016/j.atmosenv.2020.117757
M3 - Article
SN - 1352-2310
VL - 240
JO - ATMOSPHERIC ENVIRONMENT
JF - ATMOSPHERIC ENVIRONMENT
M1 - 117757
ER -