4.0 MATERIALS AND METHODS
4.1 Study Design
The
time series approach was adopted in this study. This
design specifically caters for matched daily series
of exposure and outcome data (Schwartz et al., 1996).
It aims to quantify adverse short term effects of the
current levels of air pollutants on health. The "health
outcome" time series data were daily counts of hospital
admissions from certain causes (total respiratory and
cardiovascular diseases and individual diseases, namely,
bronchial asthma and acute myocardial infarction). The
"exposure" time series data were daily measurements
of the following "criteria pollutants": nitrogen dioxide,
respirable suspended particulates (with aerodynamic
diameter less than 10 µm), sulphur dioxide and
ozone. Statistical modelling was then performed, taking
into consideration the characteristic (approximately
Poisson) distribution, overdispersion and positive autocorrelation
of the outcome data. Daily meteorological variables
(temperature and humidity) and others (seasonal changes,
holidays, day of the week, time trends) were included
as confounding variables in the statistical model. The
health effects of individual pollutants were then examined.
The
time series approach allows the best use to be made
of routinely collected air quality and hospital data
currently available in Hong Kong to address questions
concerning the short-term health effects of ambient
pollution levels. It addresses the question of whether
variations in measured levels of ambient air pollution
are statistically associated with variations in health
outcomes (in the context of this study, acute hospital
admissions for selected diseases).
The
general approach of the study is modelled after the
protocol of the APHEA project (a European approach using
epidemiological time series data), developed within
the framework of the EC Environment 1991 - 94 Programme
(Katsouyanni et al., 1996). The rationale of using this
standard approach is that it represents the best compromise
between rigour and feasibility. It caters for the use
of aggregated time series data on air quality and health
outcomes that were not originally collected for the
purpose of epidemiological investigation. It also allows
the specific methods to be adapted to suit local data
and conditions, within a standardized framework that
has established guidelines and quality control criteria.
Finally, it allows the results to be compared to other
epidemiological time series studies within a meta-analysis
framework (Briggs et al., 1996).
The
databases and statistical methods employed are described
in the following sections.
4.2 Databases
4.2.1 Overview of Hospital Admissions Data
The
first part of this project involves the collection
of health "outcome" data based on daily hospital
admissions for selected respiratory and cardiovascular
illnesses in 12 hospitals under the Hospital Authority.
By design, the study included those hospitals
with Accident and Emergency (A & E) Departments
(ten in number, including eight major hospitals
in the Territory), where patients with acute health
conditions under study would be directed 24 hours
of the day, and where computerized medical records
using the International Classification of Diseases
(ICD) coding of diagnosis were available. One
(Ruttonjee Hospital) which served as a referral
base for emergency patients from the A & E
Department of a neighbouring hospital (Tang Shiu
Kin Hospital), and another (Our Lady of Maryknoll
Hospital) which provided a 24 hour Outpatient
Department were also included.
Private
hospitals, hospitals without Accident and Emergency
Departments and specialist hospitals (e.g., psychiatric
hospitals) were excluded from the study. The primary
reason for excluding private hospitals was that
computerized medical data were not available.
Also, none of them had Accident and Emergency
Departments. As patients admitted for acute respiratory
and cardiovascular symptoms would normally present
at an A & E Department, we have excluded those
hospitals without this facility. The contribution
of private hospitals to the total number of hospital
beds in Hong Kong was actually quite small (less
than 10 %), and it could be safely assumed that
their contribution to the total number of admissions
for acute respiratory and circulatory diseases
was relatively small. Specialist hospitals in
Hong Kong do not routinely admit patients with
the acute conditions covered by this study.
The
inclusion of Yan Chai Hospital, despite its lack
of an A & E Department until late 1994, was
due to its strategic location in the District
of Tsuen Wan, which was relatively highly polluted
and had no other suitable hospital available for
this study. One district hospital under the Hospital
Authority (Caritas Medical Centre) was excluded
because of the lack of computerized medical records
for the year 1994. The possible effects of its
exclusion will be discussed in Section 8.
For
the 12 hospitals included in the study, computerized
data were not available until 1993/94. These data
were stored within two separate medical record
systems - the Integrated Patient Administration
System (IPAS) and the Medical Record Abstracting
System (MRAS). For most hospitals, the IPAS system
was being gradually phased out by the Hospital
Authority to be replaced by the MRAS system. For
this study, data from both systems were extracted
(hospital by hospital) and modified into a common
computer format for statistical analysis. The
types of patient information contained within
these databases which were pertinent to the study
were as follows: Dates of admission and discharge,
personal information (age, gender, marital status,
ethnic group, district of residence, patient code),
hospital code, admission source, diagnosis code
and discharge status.
The
following diseases, based on the Ninth Revision
of the ICD (WHO, 1977) were selected for this
study:
a. Diseases of the respiratory system (ICD460-519):
The following disease groups were covered:
- Acute respiratory infections (ICD460-466)
- Other diseases of upper respiratory tract (ICD471-478)
- Pneumonia and influenza (ICD480-487)
- Chronic obstructive pulmonary disease (ICD490-496)
- Pneumoconioses and other lung diseases due to external agents (ICD500-508)
Asthma (ICD493) was also analyzed separately.
b. Diseases of the circulatory system:
- Hypertensive disease (ICD401-405)
- Ischaemic heart disease (ICD410-414)
- Diseases of pulmonary circulation (ICD415-417)
- Other forms of heart disease (ICD420-429)
- Cerebrovascular disease (ICD430-438)
- Diseases of arteries (ICD440-444)
Acute myocardial infarction (ICD410) was also analyzed separately.
Table
1 shows the time of migration from the Integrated
Patient Administration System (IPAS) to the Medical
Record Abstracting System (MRAS). The migration
to the latter system resulted in an almost complete
coding of hospital discharges, whereas up to 25%
of discharges (in some hospitals) were uncoded
within the former system. Most hospitals migrated
to the MRAS by early 1995.
Table 1: Hospitals by time of migration from the Integrated Patient Administration System (IPAS) to the Medical Record Abstracting System (MRAS).
| Name of hospital | Time of migration |
| United Christian Hospital (UCH) | February 1993 |
| Queen Elizabeth Hospital (QEH) | February 1993 |
| Pamela Youde Nethersole Eastern Hospital (PYN) | December 1993 |
| Tuen Mun Hospital (TMH) | October 1994 |
| Prince of Wales Hospital (PWH) | December 1994 |
| Ruttonjee Hospital (RH) | December 1994 |
| Kwong Wah Hospital (KWH) | January 1995 |
| Queen Mary Hospital (QMH) | April 1995 |
| Princess Margaret Hospital (PMH) # | April 1995 |
| Yan Chai Hospital (YCH) | November 1995 |
| Our Lady of Maryknoll Hospital (OLM) * | - |
| Pok Oi Hospital (POH) * | - |
# Official migration date provided by Hospital Authority only. Most medical records were still coded in IPAS throughout 1995.
* Still using IPAS
The
total numbers of admissions due to respiratory
and cardiovascular diseases by hospital in 1994,
1995 and the first half of 1996 are shown in Table
2. A 10-30% increase in admissions can be observed
for most of these hospitals in 1995 compared to
1994, with the following exceptions. In Kwong
Wah Hospital and Our Lady of Maryknoll Hospital,
a slight decrease is apparent in 1995 and an increase
in 1996 (when extrapolated for the whole year).
By contrast, a dramatic fourfold increase in
admissions (from 1,743 to 7,011) was recorded in Yan Chai Hospital (YCH)
in 1995, which stabilized in 1996. Pamela Youde
Nethersole Eastern Hospital (PYN) recorded a 60%
increase in 1995 and 40% in 1996. A 48% increase
in 1995 and 29% in 1996 was found in Ruttonjee
Hospital (RH). Tuen Mun Hospital (TMH) had a smaller
increase of 35% and 29% in these years.** Overall,
there was an increase in the mean number of daily
admissions (respiratory and cardiovascular diseases,
all 12 hospitals) of 18.2% and 16.8% respectively.
These systematic differences, as noted above,
were adjusted for in the statistical modelling
procedures by the introduction of a linear time
trend variable (t), a quadratic time trend variable
(t²) and a 'year effect' indicator.
** The increase was likely due to the commissioning of additional beds in PYN, RH and TMH during the study period and the opening of the A & E Department in YCH in late 1994.
Table 2: Number of admissions due to respiratory and cardiovascular diseases by hospital in 1994, 1995, and the first half-year of 1996
| Hospital | 1994 | 1995 | 1996 (first half) |
| KWH | 12,282 | 9,789 | 5,697 |
| OLM | 831 | 729 | 491 |
| PMH | 9,529 | 10,775 | 5,712 |
| POH | 2,920 | 3,031 | 1,591 |
| PWH | 9,608 | 10,908 | 6,187 |
| PYN | 3,796 | 6,077 | 4,244 |
| QEH | 10,631 | 13,052 | 7,654 |
| QMH | 9,959 | 10,355 | 5,533 |
| RH | 2,815 | 4,170 | 2,691 |
| TMH | 6,058 | 8,194 | 5,282 |
| UCH | 8,709 | 9,185 | 5,790 |
| YCH | 1,743 | 7,011 | 3,460 |
| Total | 78,881 | 93,276 | 54,332 |
| Mean daily admissions | 216.11 | 255.55 | 298.53 |
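The mean daily admissions in Table 2 and the overall year-on-year increases quoted in the text can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis. The day counts are assumptions: 365 days for 1994 and 1995, and 182 days for January to June 1996 (a leap year).

```python
# Totals taken from Table 2. The day counts are assumptions:
# full years of 365 days, and 182 days for January-June 1996 (leap year).
totals = {"1994": 78_881, "1995": 93_276, "1996_first_half": 54_332}
days = {"1994": 365, "1995": 365, "1996_first_half": 182}

# Mean number of admissions per day in each period.
mean_daily = {period: totals[period] / days[period] for period in totals}


def pct_increase(new, old):
    """Percentage increase of `new` relative to `old`."""
    return 100.0 * (new - old) / old


inc_1995 = pct_increase(mean_daily["1995"], mean_daily["1994"])
inc_1996 = pct_increase(mean_daily["1996_first_half"], mean_daily["1995"])

for period, value in mean_daily.items():
    print(f"{period}: {value:.2f} admissions per day")
print(f"1995 vs 1994: {inc_1995:.1f}%")
print(f"1996 (first half) vs 1995: {inc_1996:.1f}%")
```

Comparing daily means rather than raw totals keeps the half-year of 1996 on the same footing as the two full years.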
4.2.2 Overview of Air Quality Data
The
exposure time series data which were analyzed
include daily measures of the following air pollutants:
sulphur dioxide, nitrogen dioxide, ozone and respirable
suspended particulates (RSP, measured by Tapered
Element Oscillating Microbalance - TEOM). Various
daily measures (mean and maximum levels) of the
above pollutants were monitored at air quality
monitoring stations of the Environmental Protection
Department (EPD) and those available for the study
period were provided in a computerized format
(Table 3). The following monitoring sites: Central
and West, Kwai Chung, Kwun Tong, Sham Shui Po,
Shatin, Tai Po, Yuen Long and Tsuen Wan are located
on low roof tops (four to six storeys) in various
urban, industrial and new development areas. Data
collected at these stations represent 'population
background exposure' levels of ambient air pollution.
Data from Mongkok station were not comparable
as they were collected at street level and were
therefore excluded.
Table 3: Summary Description of Air Quality Parameters
| Parameter | Measurement units and method | Sub-parameters |
| Sulphur Dioxide (SO2) | µg·m⁻³; pulsed fluorescence | SO2 - 24 hr mean; SO2 - max 1 hr |
| Nitrogen Dioxide (NO2) | µg·m⁻³; gas-phase chemiluminescence | NO2 - 24 hr mean; NO2 - max 1 hr |
| Respirable suspended particulates (RSP) (diameter < 10 µm) | µg·m⁻³; tapered element oscillating microbalance (TEOM) | RSP - 24 hr mean; RSP - max 1 hr |
| Ozone (O3) | µg·m⁻³; ultraviolet absorption | O3 - 8 hr (9am-5pm) mean; O3 - max 1 hr |
A
rigorous quality control programme has been implemented
by the EPD (EPD, 1994). Measuring instruments
are routinely calibrated and spurious data caused
by extrinsic factors are screened out to produce
a valid, if not complete, data set. For the hourly
data to be accepted, two thirds of the 5-minute
readings for that hour must be available and valid.
The same "two thirds" criterion applies to the
daily values which summarize the hourly readings.
This study, however, adopted a more rigorous "75%
criterion" in order to conform to the APHEA protocol.
Also, for each pollutant, monitoring stations
with more than 25% of valid daily measurements
missing for the entire study period were excluded.
As particulates have been shown to exert significant
effects on health in many studies (Schwartz &
Dockery, 1992; Dockery & Pope, 1994; Schwartz
et al., 1995; Schwartz, 1996; Samet et al., 1995),
an exception was made for RSP (measured by TEOM).
In this case, three stations which were missing
more than 25% of the data series (but less than
33%) were included (Table 4). For a station with
less than 25% of missing daily values, the missing
values were estimated based on the available measurements
in the other monitoring sites for the same day.
The daily missing value was replaced by the mean
daily level of the remaining stations multiplied
by a correction factor, which was the ratio of
the seasonal (three-month) mean for the missing
station to the corresponding seasonal mean for
the remaining stations. The detailed APHEA methods
for preparing the air quality data for time series
analysis, including the imputation of missing
data, are presented in Appendix I.
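The completeness criterion and the imputation rule described above can be sketched as follows. This is a minimal illustration rather than the procedure in Appendix I. The function names, station labels and values are hypothetical, and the "seasonal mean for the remaining stations" is assumed here to be the simple average of those stations' three-month means.

```python
from statistics import mean


def station_usable(n_valid_days, n_days, min_fraction=0.75):
    """True if at least `min_fraction` of the daily values are valid."""
    return n_valid_days / n_days >= min_fraction


def impute_missing(day_values, seasonal_means, target):
    """Estimate the missing daily value at the `target` station.

    day_values:     station -> value on that day (None if missing)
    seasonal_means: station -> three-month seasonal mean
    Returns None when no other station has a value for that day.
    """
    others = [s for s, v in day_values.items() if s != target and v is not None]
    if not others:
        return None
    day_mean_others = mean(day_values[s] for s in others)
    # Assumption: average the seasonal means of the stations that reported.
    season_mean_others = mean(seasonal_means[s] for s in others)
    correction = seasonal_means[target] / season_mean_others
    return day_mean_others * correction


# Hypothetical stations and concentrations, for illustration only.
day_values = {"A": None, "B": 60.0, "C": 40.0}
seasonal_means = {"A": 55.0, "B": 50.0, "C": 50.0}
estimate = impute_missing(day_values, seasonal_means, "A")
print(estimate)
```

The correction factor scales the network-wide level for the day to the typical level of the station with the gap, so a station that usually reads higher than its neighbours is imputed higher.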
Table
4 shows the degree of completeness of the air
quality data based on the above criteria. Data
from Yuen Long could not be included for any of
the air pollutants due to the extent of missing
data (more than 80% in 1994). Data for NO2 were
complete for all remaining seven stations, SO2
for six stations, RSP for five and O3 for two.
Data for the first half-year of 1996, especially
for RSP, were much more complete than in 1994-95.
Table 4: Percentage of valid daily measures of air pollutants by station available for the study period (1994-95 and first half of 1996)
| | Central Western | Kwai Chung | Kwun Tong | Sham Shui Po | Shatin | Tai Po | Tsuen Wan |
| NO2 |
| 1994 | 90.41 | 92.33 | 95.34 | 86.58 | 65.75 | 100.00 | 93.70 |
| 1995 | 92.60 | 92.33 | 95.89 | 89.04 | 97.26 | 96.71 | 95.34 |
| 1994-95 | 91.51 | 92.33 | 95.62 | 87.81 | 81.51 | 98.36 | 94.52 |
| 1996 | 98.52 | 99.01 | 99.38 | 97.41 | 99.63 | 98.89 | 98.03 |
| O3 |
| 1994 | 89.32 | 93.42 | - | - | - | - | - |
| 1995 | 96.71 | 96.44 | - | - | - | - | - |
| 1994-95 | 93.01 | 94.93 | - | - | - | - | - |
| 1996 | 98.77 | 99.26 | | | | | |
| SO2 |
| 1994 | 100.00 | 94.79 | 97.53 | 93.15 | 97.81 | - | 93.70 |
| 1995 | 98.08 | 96.71 | 97.26 | 93.15 | 99.45 | - | 95.07 |
| 1994-95 | 99.04 | 95.75 | 97.40 | 93.15 | 98.63 | - | 94.38 |
| 1996 | 99.26 | 98.77 | 99.51 | 95.05 | 99.88 | - | 99.14 |
| RSP (by TEOM) |
| 1994 | 53.42 | 45.75 | 56.99 | - | 76.44 | - | 91.23 |
| 1995 | 85.21 | 94.79 | 79.18 | - | 87.12 | - | 94.25 |
| 1994-95 | 69.32 | 70.27 | 68.08 | - | 81.78 | - | 92.74 |
| 1996 | 100.00 | 100.00 | 98.40 | - | 96.67 | - | 97.17 |
4.2.3 Overview of Meteorological Data
Meteorological
data, namely, daily mean, maximum and minimum
temperature and relative humidity were obtained
through the Royal Observatory for the study period.
There were seven stations (King's Park, Lau Fau
Shan, Wong Chuk Hang, Shatin, Tuen Mun, Ta Kwu
Ling and Tseung Kwan O). The entire series of
daily values were complete for all stations except
Ta Kwu Ling, where data were not recorded for
only three days. Mean temperature and humidity
were treated as confounding variables as they vary with time
and have been shown to be correlated with both
air quality and health outcome variables (Schwartz
et al., 1996). Appropriate adjustments were made
for their effects in the statistical modelling.
4.3 Statistical Modelling
The
statistical modelling followed the guidelines proposed
by the APHEA protocol, which established that hospital
admissions data are generally best represented by a
Poisson distribution (Schwartz, Spix, Touloumi, et al.,
1996). This is because, on any given day, only a small
proportion of the population is admitted to hospital
and large numbers of admissions are relatively rare.
Also, the numbers of admissions represent counts which
are non-negative integers. It has also been observed
that admissions data are usually overdispersed (that
is, the variance is larger than the mean), and positively
auto-correlated. This is in contrast to the characteristics
of a Poisson distribution, in which the variance is
equal to the mean.
In
a Poisson process, which is a relative risk model, a
homogeneous risk to the underlying population on a given
day is assumed (Schwartz et al., 1996). Given that underlying
risk, the expected number of admissions on any day is
$\lambda$. The probability of $y$ admissions occurring on a given
day is given by:
$$P(Y = y) = \frac{e^{-\lambda}\,\lambda^{y}}{y!}$$
The
Poisson regression model assumes that $\lambda$
varies with time-varying predictor variables $X_1, X_2, \ldots, X_n$:
$$\log \lambda = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$$
where
$X_1, \ldots, X_n$ are the predictors of daily admissions /
mortalities and $\beta_1, \ldots, \beta_n$
are the regression coefficients for these predictors.
The relative risk of the $i$th predictor is given by $e^{\beta_i}$.
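The Poisson model described above can be illustrated numerically. The sketch below is not the authors' code. The symbol `lam` for the expected daily count, the function names and all coefficient values are assumptions chosen for illustration. It evaluates the Poisson probability, checks that the mean and variance of the distribution coincide, and converts a regression coefficient into a relative risk.

```python
import math


def poisson_pmf(y, lam):
    """Probability of exactly y admissions when the expected count is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def expected_count(beta0, betas, xs):
    """Expected daily count under a log-linear model: exp(b0 + sum(b_i * x_i))."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))


def relative_risk(beta):
    """Relative risk for a one-unit increase in the predictor."""
    return math.exp(beta)


# Hypothetical coefficient and predictor value, for illustration only.
lam = expected_count(beta0=math.log(200.0), betas=[0.002], xs=[10.0])

# Mean and variance of a Poisson distribution with expected count 5.
# The tail beyond y = 59 is negligible for this expected count.
probs = [poisson_pmf(y, 5.0) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(lam, 2), round(mean, 6), round(variance, 6))
```

Observed admissions whose variance clearly exceeds their mean depart from this equality, which is the overdispersion discussed in the text.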
The
presence of overdispersion and serial correlation necessitates
some statistical adjustment to the Poisson model. To
address these problems, a number of methods have been
reported in the literature. An iterative method called
Generalized Estimating Equations (GEE), which is an
extension of the Poisson regression, was used by Zeger
(1988). In this method, the vector of residuals was
weighted by an estimate of the inverse of the covariance
matrix, and the weighted residuals were filtered with
an autoregressive filter.
Brannas
and Johansson (1994) extended the Poisson regression
model by correcting the covariance matrix. The significance
of the predictors was then assessed by the χ²
test using the corrected estimates of the variances,
allowing valid inferences to be made from the regression
coefficients. This is a much simpler procedure than
the computer intensive method by Zeger.
When
applying Brannas and Johansson's method in this study,
we found that the deviance remained quite large
despite using different transformations of the independent
variables specified in the APHEA protocol. Williams
(1982) suggested that a large residual variation might
be due either to the intrinsic (overdispersed) nature
of the data or to some overlooked explanatory variables.
He proposed a method for correcting overdispersion by
multiplying the variance by an estimate of the dispersion
parameter.* This method was adapted to the Poisson model
by Breslow (1984) by taking appropriate limits in Williams'
formulae. After testing all the potential confounding
variables recommended by the APHEA protocol, we accepted
that the data were overdispersed and adopted Williams'
method of correction. Williams' method is supported
by the SAS statistical software (SAS, 1996).#
* Suppose that the data consist of n binomial observations. The variance of the response probability is given by: $V(P_i) = \phi\, p_i (1 - p_i)$. An estimate of $\phi$, a non-negative but otherwise unknown dispersion parameter, was made by equating the value of Pearson's chi-square statistic for the full model to its approximate expected value. After a weighted fit of the model, the parameter estimates and $\chi^2$ were recalculated, and a revised estimate of $\phi$ was calculated. The iterative procedure was repeated until $\chi^2$ was very close to its degrees of freedom.
# We used PROC LOGISTIC of SAS to run the Poisson regression model. The option scale=Williams in PROC LOGISTIC was then chosen.
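The idea of comparing Pearson's chi-square statistic with its degrees of freedom can be illustrated with a much simpler calculation than Williams' iterative procedure. The sketch below is a stated substitute, not the method used in this study: it computes the single-step Pearson estimate of the dispersion for Poisson counts and uses it to inflate a standard error. The function names, counts and fitted values are hypothetical.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    For Poisson counts the variance equals the mean, so each squared
    residual is divided by the fitted value.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    df = len(observed) - n_params
    return chi2 / df


def scaled_standard_error(se, dispersion):
    """Standard error inflated to allow for overdispersion."""
    return se * math.sqrt(dispersion)


# Hypothetical daily counts and an intercept-only fit (one parameter).
observed = [200, 240, 180, 220, 260, 220]
fitted = [220.0] * len(observed)
phi = pearson_dispersion(observed, fitted, n_params=1)
print(round(phi, 3), round(scaled_standard_error(0.01, phi), 5))
```

A value of `phi` well above 1 indicates overdispersion. Williams' method refines such an estimate iteratively through reweighted fits, which this sketch does not attempt.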
The
APHEA guidelines (Katsouyanni et al., 1996) were also
followed in the construction of the models. This procedure
started with the construction of a "core" model in which
the potential confounders of the short term relationship
of air pollutants and daily hospital admissions for
respiratory and cardiovascular diseases in 1994 and
1995 were investigated. In a time series model, the
response variables (hospital admissions and deaths)
show both a long term trend and shorter term periodic
variations. These have to be adjusted for in order to
identify the effects attributable to the pollutants.
The APHEA guidelines specify that the core model (without
pollutant variables) should include variables to account
for the following - long term trends (time trend), medium
term variations (season, using sine and cosine terms
to control for seasonal and other cyclical patterns),
short term systematic (day of the week, holidays, day
after holiday) and short term, less systematic (meteorological)
variations. The inclusion of these variables removes
much of the "noise" in the model. Based on the APHEA
protocol and the goodness-of-fit of the models, the
following variables were included:
Linear time trend, t: (Day 1, 2, ..., 730)
Quadratic time trend, t²: (1, 4, 9, ...)
Year-effect indicator, Y: (1994 and 1995)
Day of the week: I1 to I6 (six dummy variables)
Holiday: H1
Day after holiday: H2
Seasonality, S1 - S4, CS1 - CS4: sin{2kπt/365} and cos{2kπt/365}, where k = 1, 2, 3, 4 (one year, six months, four months and three months)
Daily mean temperature
Daily relative humidity
Time
lags for temperature and humidity, and the interaction
between temperature and relative humidity were found
to be insignificant when entered in a stepwise multiple
linear regression model. These were excluded from the
core model on this basis.
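The calendar-based covariates of the core model listed above can be generated directly from the day index. The following is a minimal sketch under stated assumptions: day 1 is taken to be 1 January 1994, Monday is used as the reference day for the six day-of-week dummies, and the holiday set is supplied by the caller. The function name and dictionary keys are illustrative only. Temperature and humidity would be appended from the meteorological data.

```python
import math
from datetime import date, timedelta

START = date(1994, 1, 1)  # assumption: day index 1 is 1 January 1994


def core_covariates(t, holidays=frozenset()):
    """Calendar covariates for day index t (1..730)."""
    d = START + timedelta(days=t - 1)
    row = {"t": t, "t2": t * t, "Y": 1 if d.year == 1995 else 0}
    # Six day-of-week dummies; Monday (weekday 0) is the assumed reference.
    for j in range(1, 7):
        row[f"I{j}"] = 1 if d.weekday() == j else 0
    row["H1"] = 1 if d in holidays else 0
    row["H2"] = 1 if (d - timedelta(days=1)) in holidays else 0
    # Sine and cosine terms with periods of 12, 6, 4 and 3 months.
    for k in range(1, 5):
        angle = 2 * math.pi * k * t / 365
        row[f"S{k}"] = math.sin(angle)
        row[f"CS{k}"] = math.cos(angle)
    return row


# Hypothetical holiday list, for illustration only.
example_holidays = {date(1994, 1, 1)}
first_day = core_covariates(1, example_holidays)
print(first_day["I5"], first_day["H1"], core_covariates(2, example_holidays)["H2"])
```

Each row holds 19 calendar terms: three trend and year terms, six day-of-week dummies, two holiday indicators and eight seasonal terms.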
Verhoeff,
Hoek, Schwartz & van Wijnen (1995) observed in the
Amsterdam study of air pollution and daily mortalities
that, after controlling for season and trend, the magnitude
of the serial correlation in hospital admission data
was low and the estimates were only slightly changed
by incorporating serial correlation. However, considerable
overdispersion was noted in Hong Kong's hospital admissions
dataset for 1994 and 1995. Accordingly, adjustments
for the overdispersion of the daily hospital admissions
were made using Williams' method (1982), described above,
and the results compared with the simple Poisson model.
Delayed
effects of air pollutants were explored using single
day lags and cumulative lags up to five days for ozone
and three days for the other air pollutants. Owing to
the high correlation coefficients between individual
pollutants, the 'single pollutant model' was used to
determine the effect of each individual pollutant on
hospital admissions. In this approach, each air pollutant
was separately entered into the "core model" to obtain
its respective partial regression coefficient. It is
recognized that adverse health outcomes may be due to
the combined exposure to more than one pollutant. However,
certain pollutant variables were highly correlated and
the APHEA protocol recommends the construction of single-pollutant
models as a starting point. Based on the partial regression
coefficients (β)
of the individual pollutants in the single pollutant
model, relative risks of hospital admissions and deaths
due to respiratory and cardiovascular diseases associated
with a 100 µg·m⁻³ increase in air pollutant concentrations
were derived.
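The lagged exposure series and the conversion of a regression coefficient into a relative risk per 100-unit increase can be sketched as follows. This is an illustration only. The cumulative lag is assumed here to be the mean of the same-day value and the preceding days, and the series and coefficient are hypothetical.

```python
import math


def lagged(series, lag):
    """Exposure `lag` days earlier; None where no earlier value exists."""
    return [series[i - lag] if i - lag >= 0 else None for i in range(len(series))]


def cumulative_lag(series, max_lag):
    """Mean of the same-day value and the preceding `max_lag` days (assumed definition)."""
    out = []
    for i in range(len(series)):
        if i - max_lag < 0:
            out.append(None)
        else:
            window = series[i - max_lag : i + 1]
            out.append(sum(window) / len(window))
    return out


def relative_risk(beta, increase=100.0):
    """Relative risk for an `increase`-unit rise in concentration: exp(beta * increase)."""
    return math.exp(beta * increase)


# Hypothetical daily concentrations and coefficient, for illustration only.
concentrations = [10.0, 20.0, 30.0, 40.0]
print(lagged(concentrations, 1))
print(cumulative_lag(concentrations, 2))
print(round(relative_risk(0.0005), 4))
```

Because the model is log-linear, the relative risk for any other increment follows by changing `increase`.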
The
effects of more than one pollutant (including their
interactions) were then explored using a 'multiple pollutant
model'. In this multiple pollutant model*, relative
risks of individual pollutants adjusted for the effects
of the others, were obtained. The final model was constructed
by the following steps:
1. Initially, all 4 pollutants (main effects) and all 2-way interactions were included. Stepwise selection was employed to select the significant interaction(s).
2. Insignificant pollutants (except those involved in the significant interactions) were then removed from the model.
3. Williams' method was applied to the model obtained in Step 2.
4. Any insignificant interaction(s) and main effect(s) were removed.
5. The final model consisted of significant main effect(s) and significant interaction(s), as well as the corresponding main effect(s) in those significant interaction terms (even though the main effects by themselves were insignificant).
In
the choice of pollutant parameters, the 'best' lags
or cumulative lags which had been selected in the single
pollutant models were used in the construction of the
multiple pollutants model. When significant interactions
were observed, the relative risks of one pollutant at
different levels of the interacting pollutant were calculated.
The effect of multi-collinearity was assessed by comparing the results
obtained with and without Ridge regression (Schaefer, 1986).
* To address the problem of collinearity between the pollutants, the technique of Ridge estimation for collinear data in logistic regression (Schaefer, 1986) was used but the results were similar to those without using this method.
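When two pollutants interact in a log-linear model, the relative risk of one depends on the level of the other. The sketch below shows the arithmetic under the assumption that the model contains the terms b1·X1 + b2·X2 + b12·X1·X2, so that the relative risk for an increase d in X1 at a fixed level z of X2 is exp((b1 + b12·z)·d). The function name and all coefficients and levels are hypothetical.

```python
import math


def rr_with_interaction(beta_main, beta_inter, level_other, increase=100.0):
    """Relative risk for an `increase`-unit rise in one pollutant,
    evaluated at a fixed level of the interacting pollutant."""
    return math.exp((beta_main + beta_inter * level_other) * increase)


# Hypothetical coefficients and levels of the interacting pollutant.
beta_main, beta_inter = 0.0004, 0.000002
levels = [20.0, 50.0, 100.0]
risks = [rr_with_interaction(beta_main, beta_inter, z) for z in levels]
for z, rr in zip(levels, risks):
    print(f"interacting pollutant at {z:.0f}: RR = {rr:.4f}")
```

With a positive interaction coefficient the relative risk rises with the level of the interacting pollutant. With a coefficient of zero it reduces to the single-pollutant value.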
4.4 Validation of Model
The
model based on data from 1994 to 1995 was then validated
using data for the first half-year of 1996. The fit
of the model was assessed by plotting the observed and
predicted daily admissions on the same graph for each
pollutant to look for discrepancies visually.