Short-term effects of ambient air pollution
on public health in Hong Kong - a follow-up study
A consultancy report submitted to the
Environmental Protection Department, Hong Kong
(Operation Manual)
February 1998
Department of Community Medicine
The University of Hong Kong
OPERATION MANUAL
Data management and data analysis procedures
a) input data format and medium
The data were then read into S-Plus objects in a sub-directory (one for each batch of data). The source programmes are attached (Source Programmes 1.1, 2.1, 3.1 and 3.2), and the format of the data is documented in them.
b) examination and cleaning data
i) meteorological data
- tabulation of monthly average of meteorological measures by year
ii) air pollutant data
- tabulation of monthly average of air pollution data by monitoring stations by year
iii) hospital admission data
- tabulation of total number of in-patient discharges by hospital for each financial year
c) missing data definition and replacement
Checks of the hospital admission data for completeness showed that the only missing data arise from the unavailability of the discharge diagnosis at the time of data analysis. Missing data of this kind can be minimized by waiting several months, so as to capture patients with a long length of stay in hospital as well as hospitals that return the diagnosis and the ICD code late. (It is advisable to obtain the data three months after the end of the required study period.)
d) descriptive statistics and graphs
i) number of hospital admissions by disease groups and by age groups
ii) summary statistics of meteorological data with pollution data and hospital admission data (Source Programmes 2.4)
iii) Spearman's rank correlation coefficient between each pair of pollution, meteorological and health outcome variables
iv) time series plots of each data series
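The rank correlation in item (iii) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the study's S-Plus code; the function names and the six data values are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values given their average rank.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical daily values, for illustration only (not study data).
no2 = [41.0, 55.0, 38.0, 62.0, 47.0, 70.0]
admissions = [12, 15, 11, 19, 13, 18]
rho = spearman(no2, admissions)
```

In practice the same calculation would be repeated for every pair of series, giving a correlation matrix that can be inspected alongside the time series plots.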
e) data analysis
i) summary statistics (including standard deviations)
- S-Plus command used: summary(), sqrt(var())
ii) Spearman's rank correlation
- S-Plus command used: cor.test(..., method="spearman")
iii) multiple regression used for obtaining the multiple R squared value
- S-Plus command used: lsfit()
iv) Poisson regression adjusted for over-dispersion used for modelling the various health outcomes with air pollution concentrations and other covariates.
- S-Plus command used: glm(..., family=quasi(link="log",var="mu"),...)
v) Akaike Information Criterion (AIC) was computed and used to identify the model with the best cumulative lag. The AIC is the sum of the deviance and twice the number of degrees of freedom used in fitting the model (that is, the number of fitted parameters). It can be thought of as the deviance with a penalty added to take account of the number of parameters in the model. In choosing between models, the model with the smaller AIC is preferred.
- S-Plus command and self-written S-Plus function used:
AIC <- deviance(glm.object) + 2*(length(glm.object$fitted) - glm.object$df.resid)
where glm.object can be obtained from procedure (iv) above.
Then, min(AIC).
vi) Principal component analysis used for generating a composite score of 4 different pollutants
- S-Plus command used: princomp()
vii) Interaction effects among co-pollutants, pollutants and the 4 seasons were examined
viii) Autocorrelation function used for estimating the serial correlations among residuals of health outcome after modelling.
- S-Plus command used: acf(resid(glm.object))
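Steps (v) and (viii) above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the study's S-Plus code; the function names, the candidate lag labels and all numbers are hypothetical. The first helper follows the AIC expression given in step (v), deviance plus twice the number of fitted parameters, and the second computes the usual sample autocorrelations of a residual series.

```python
def aic(deviance, n_obs, df_resid):
    """Deviance plus twice the number of fitted parameters (n_obs - df_resid)."""
    return deviance + 2 * (n_obs - df_resid)


def best_by_aic(candidates):
    """candidates maps a label to (deviance, n_obs, df_resid).

    Returns the label with the smallest AIC and the dictionary of all AIC values.
    """
    scores = {label: aic(*values) for label, values in candidates.items()}
    return min(scores, key=scores.get), scores


def acf(residuals, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a residual series."""
    n = len(residuals)
    centre = sum(residuals) / n
    centred = [r - centre for r in residuals]
    c0 = sum(d * d for d in centred)  # lag-0 sum of squares
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical deviances for three cumulative-lag models fitted to the same
# 730 days of data (illustrative numbers only, not results from the study).
candidates = {
    "lag 0": (812.4, 730, 718),
    "lag 0-1": (805.1, 730, 717),
    "lag 0-2": (804.6, 730, 716),
}
best, scores = best_by_aic(candidates)
```

With these made-up numbers the additional lag in "lag 0-2" lowers the deviance by less than the penalty of 2 it incurs, so "lag 0-1" has the smallest AIC. Autocorrelations of the residuals that are clearly different from zero at short lags would suggest that serial correlation remains after modelling.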
Advice for future study
-
- include all mortality in Hong Kong
-
- include other pollutants e.g. carbon monoxide, total suspended particulates and PM2.5
-
- include Caritas Medical Centre which is also a referral based hospital from its A&E department
-
- generalized additive model
- harvesting effects
- dose response relationship
-
- hospital admission rate by TPU
- clustering of TPU with monitoring stations
- modelling hospital admission rates with air pollutant concentrations and with covariates from each TPU
Suggestion and guidelines for future study
To be in line with the new trend in developing new hypotheses and methodological insights in the APHEA II project, the following should be the focus in any future study relevant to the Hong Kong situation:
2. exploration of new methodological approaches to develop a better understanding of how premature the deaths caused by air pollution are (harvesting or mortality displacement), and of how harvesting affects the estimated size of the effect parameters; and
3. investigation of regional differences and explanation of heterogeneous effect estimates via modelling techniques by taking advantage of the extended data-base.
In this study, using only two years of data, we can only study the linear effect of air pollutants without taking account of a possible threshold. We need to check the residual plot for each statistical model more closely, identify the sources of unexplained variation, autocorrelation and harvesting effects, and adjust for any that are found. These are important issues in obtaining an explanatory model for the effects of air pollutant concentrations on hospital admissions and deaths. Although we found that air pollutants were related to acute hospital admissions and deaths, we should be cautious at this stage, as the methodology is still under development in other parts of the world and studies on acute health effects in Hong Kong are still in their infancy.