Novel Machine Learning Method Could Improve Understanding of Air Pollution Impact on Children

Published February 2017
Atmospheric Environment

Big spatial data and machine learning methods are winning the battle over the tiniest of moving targets: the elemental components of particulate matter that pollute the air we breathe.

A novel approach to assessing pollutant exposures, called a land use random forest (LURF) model, proved more effective than the current standard, the land use regression (LUR) model, at estimating personal pollution exposure. The study demonstrates that LURF could help researchers more accurately link pollution exposure levels to chronic health outcomes in children.

“These models could someday help guide patient care by supplying clinicians with a bigger picture of the environment in which a patient spends their time,” says Cole Brokamp, a research fellow and first author. The paper was his PhD dissertation. Patrick Ryan, PhD, MS, was senior author. Both are with the Division of Biostatistics and Epidemiology.

Using ambient air sampling data from 24 sampling stations in urban Cincinnati, the team compared how well the LURF and LUR models estimated 11 elemental components of pollution, ranging from aluminum to zinc. They factored in more than 50 predictors associated with transportation, physical features, community socioeconomic characteristics, greenspace, land cover, and emission point sources.

The LURF model proved more effective at capturing complex interactions and nonlinear relationships between land use predictors and pollutant concentrations.
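
To illustrate why a tree-based model can pick up a nonlinear relationship that a straight-line regression misses, here is a minimal sketch. It is not the study's model: it fits a single one-split regression tree (a "stump," the building block that a random forest averages many of) and compares it with ordinary least squares on one predictor. The data, the variable names, and the sharp drop in concentration beyond 0.5 km are all hypothetical, and the errors shown are in-sample fit errors rather than predictions at new locations.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so the left side is never empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def mean_abs_error(predict, xs, ys):
    """Average absolute gap between observed and fitted values."""
    return mean(abs(y - predict(x)) for x, y in zip(xs, ys))

# Hypothetical data: distance to a major road (km) and a pollutant concentration
# that stays high near the road and drops sharply beyond 0.5 km.
distance_km = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
concentration = [9.0, 9.2, 8.8, 9.1, 8.9, 2.1, 1.9, 2.0, 2.2, 1.8]

line_mae = mean_abs_error(fit_line(distance_km, concentration), distance_km, concentration)
stump_mae = mean_abs_error(fit_stump(distance_km, concentration), distance_km, concentration)
print(f"linear fit error: {line_mae:.2f}, one-split tree error: {stump_mae:.2f}")
```

On this toy data the one-split tree follows the step change while the straight line has to cut through it, so the tree's error is much smaller. A full random forest combines many deeper trees over many predictors, which is how it can also represent interactions between land use variables.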

“Advancing exposure science methodology is very exciting,” Brokamp says, “but the most rewarding aspect is probably seeing the exposure models implemented in health studies.”

A novel land use random forest (LURF) model was applied to predict the concentrations of several elemental components of air pollution, and its accuracy was compared to that of the standard land use regression (LUR) approach. The accompanying chart shows the mean absolute prediction error (MAPE) for predicting pollutant concentrations at new locations, by model type and elemental pollutant. The new LURF approach outperformed the standard LUR approach for nearly all pollutants. Neither model was effective at predicting nickel concentrations.
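
As a rough illustration of what "prediction error at new locations" means, the sketch below computes a mean absolute prediction error by leave-one-out cross-validation: each site is held out in turn, a model is fit to the remaining sites, and the held-out site is predicted. The use of leave-one-out validation, the function names, and the data are assumptions made for this example, since the validation procedure is not described here. Note that MAPE in this sense is an absolute error in the pollutant's own units, not a percentage.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_mean(xs, ys):
    """Baseline that ignores the predictor and always predicts the training mean."""
    avg = mean(ys)
    return lambda x: avg

def loo_mape(xs, ys, fit):
    """Mean absolute prediction error with each site held out once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)  # the model never sees site i
        errors.append(abs(ys[i] - predict(xs[i])))
    return mean(errors)

# Hypothetical data: a traffic index at six sampling sites and the measured
# concentration of one elemental pollutant at each site.
traffic_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
measured = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

line_error = loo_mape(traffic_index, measured, fit_line)
baseline_error = loo_mape(traffic_index, measured, fit_mean)
print(f"regression: {line_error:.2f}, mean-only baseline: {baseline_error:.2f}")
```

Because `loo_mape` takes the fitting function as an argument, any two model types can be compared on the same held-out sites, which mirrors the per-pollutant comparison the chart reports.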

A land use random forest (LURF) exposure assessment model was used to predict the concentration of lead in the air at study participants' homes in the Cincinnati area. The findings will allow further study on the effects of airborne lead on childhood health.

Citation