PhD Defence Ms Sibusisiwe Audrey Khuluse

Department of Earth Observation Science

Sibu Khuluse

Title of defence

Spatial statistical modelling of urban particulate matter


Chronic exposure to poor air quality poses a risk to respiratory health. The spatial distribution of air quality and socioeconomic vulnerability, however, is not equitable, as those most vulnerable often reside in areas with poorer air quality. In the Highveld region of South Africa, as in other rapidly developing urban regions in developing countries, air quality mapping for the purpose of investigating population exposure is important given the need for interventions to reduce the environmental impact on health. In this context, statistical air quality mapping is challenging because the air quality monitoring network is sparse in space and time. The aim of this thesis was to assess the risk of exposure to poor air quality indicated by excessive ambient concentrations of PM10 and PM2.5. This hinged on the development of methods to overcome data constraints in the form of high proportions of missing data per air quality station and the limited number of stations in the study area.

When the target variate is measured at few locations, a suitable and spatially extensive covariate can improve the reliability of predictions at unmeasured locations. The first objective was to compare ordinary kriging and model-based geostatistical methods, and to assess the significance of housing-related factors as proxies for domestic emissions for spatial prediction of the annual PM10 exceedance rate at unmeasured locations. The exceedance threshold is the PM10 South African national air quality standard (NAQS) of 120 µg m−3 for average daily ambient concentrations. Four geostatistical methods were explored, two based on kriging and two on model-based geostatistics. A Poisson generalized linear geostatistical model was considered because of the type of data, namely PM10 yearly exceedance counts from 36 air quality stations in the South African Highveld region for the period 2008 to 2012. The other models were the log-Gaussian geostatistical model, ordinary kriging and external drift kriging. The spatial patterns of the PM10 NAQS exceedance rate, namely the location of hot-spots, were similar for kriging and the generalized linear geostatistical models. All four models were biased upwards. Relative to the observed data, prediction accuracy was highest for ordinary kriging compared with the log-Gaussian and Poisson models without covariates. External drift kriging predictions were more precise than those of the model-based alternatives with covariates. Predictions from models with covariates were higher in areas where the density of informal dwellings was higher. The Poisson model performed better than the log-Gaussian model in terms of prediction accuracy at test sites if the covariate was considered; otherwise they were similar. Kriging was superior in terms of prediction accuracy at test sites.
Of the three covariates considered, namely household biofuel use for cooking, household biofuel use for heating, and housing informality, only housing informality was statistically significant. Housing informality coincides with household use of biofuels, especially for heating, and with being located close to industrial areas.
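The exceedance counts that feed the models described above can be illustrated with a minimal sketch. It assumes daily mean PM10 values are available as (station, year, value) tuples, with None marking a missing day; the function names and the record layout are hypothetical and are not taken from the thesis. Only the 120 µg m−3 threshold comes from the text.

```python
from collections import defaultdict

# Daily PM10 standard quoted in the text (µg m^-3).
NAQS_PM10_DAILY = 120.0


def annual_exceedances(daily_records, threshold=NAQS_PM10_DAILY):
    """Count exceedance days per (station, year).

    daily_records: iterable of (station, year, daily_mean_pm10 or None).
    Returns {(station, year): (exceedance_count, valid_days)}.
    The record layout is an assumption made for this illustration.
    """
    tallies = defaultdict(lambda: [0, 0])
    for station, year, value in daily_records:
        if value is None:
            # A missing day adds to neither the count nor the valid days.
            continue
        entry = tallies[(station, year)]
        entry[1] += 1
        if value > threshold:
            entry[0] += 1
    return {key: (exc, valid) for key, (exc, valid) in tallies.items()}


def exceedance_rate(count, valid_days):
    """Exceedance days as a fraction of days with valid measurements."""
    return count / valid_days if valid_days else float("nan")


if __name__ == "__main__":
    records = [
        ("A", 2008, 130.0),
        ("A", 2008, 90.0),
        ("A", 2008, None),
        ("A", 2008, 121.0),
        ("B", 2008, 50.0),
    ]
    for key, (exc, valid) in sorted(annual_exceedances(records).items()):
        print(key, exc, valid, exceedance_rate(exc, valid))
```

Dividing by the number of valid days rather than by the calendar year is one possible way to keep stations with many missing days comparable; it is a design choice of this sketch, not a statement of how the thesis defined the rate.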

The deterioration of air quality in urban areas as a result of fugitive dust was explored because of the presence of mines, mine residue deposits, unpaved roads and agricultural fields in the study region. The second objective was to determine whether land cover could be statistically related to observed PM10 and be used as a covariate to improve the reliability of PM10 predictions at locations without air quality stations. In the absence of readily available land cover data, high-resolution SPOT 6 images were obtained for land cover classification. An ensemble maximum likelihood pixel-based land cover classifier was developed with five primary classes, namely water, bare soil, vegetation, built-up and a mixed class for pixels where it was difficult to separate bare soil from degraded grass. The ensemble classifier, which was based on iterative training, enabled the inclusion of information on known sources of variability that contribute to difficulties in classifying bare soil in the study region. These sources of variability were mainly soil colour tones due to variation in soil types. Overall classifier accuracy, measured by the Kappa index, was 0.78. Various landscape features affect the dispersion and sedimentation of dust particles, and as such a statistical relationship between ambient concentrations of PM10 and a factor for land cover composition was sought. Firstly, a k-means cluster analysis was used to derive homogeneous land cover groups in neighbourhoods (within a 4 km radius) of air quality stations that could be related to observed PM10 concentrations. Secondly, average PM10, calculated for days on which wind speeds were conducive to dust emissions, was related to a factor for land cover composition group in a varying-intercepts regression model, in which the factor was found to be a significant covariate.
Therefore, land cover data can be processed into a suitable covariate for improved prediction of PM10 at locations without air quality monitoring stations in spatially sparse networks.
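For readers unfamiliar with the Kappa index reported for the classifier, the following minimal sketch shows how Cohen's kappa is commonly computed from a confusion matrix. The matrix values in the example are made up for illustration and are not the thesis results; the function name is hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of reference-class-i samples that the
    classifier assigned to class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [
        sum(confusion[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative two-class matrix, not data from the thesis.
    print(cohen_kappa([[20, 5], [10, 15]]))
```

Kappa discounts the agreement that would be expected by chance, which is why it is lower than the plain proportion of correctly classified pixels.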

High-quality monitoring data are important, but data from air quality stations often suffer from substantial incompleteness. In sparse monitoring networks, imputation of missing pollutant data is preferable to discarding a station’s record or analyzing data of low coverage. With multiple imputation, missing values are filled in and the uncertainty associated with the imputations can be quantified. The third objective was to develop a bootstrap regression multiple imputation method to impute missing meteorological and pollutant values for each air quality station. The method leverages the availability of better-quality meteorological data from nearby weather stations to impute multiple values for each missing relative humidity, temperature, wind speed and wind direction value. Subsequently, NO2, SO2 and eventually PM10 and PM2.5 were imputed based on the completed meteorological datasets. Regression imputation models were customized for each variable, such as circular regression for wind direction and log-transformation with inclusion of a wind intensity indicator for wind speed. Inference was based on generalized least squares with a first-order autoregressive residual structure to account for temporal autocorrelation. To avoid overstating the precision, regressions were performed sequentially and at each stage imputations were drawn from the Gaussian predictive distribution parameterized by the deterministic predicted value and the prediction standard error, thus incorporating uncertainty into the imputed values, including errors propagating from meteorological imputations to imputed pollutant values. Using meteorological, gaseous pollutant and seasonal factor variables as covariates resulted in the preservation of seasonal patterns in imputed data. When the bootstrap regression imputation method was compared with the approximate Bayesian bootstrap (ABB) method, ABB imputations reverted to the mean, had reduced variability and lacked seasonal structure.
Overall, the bootstrap regression multiple imputation method resulted in improved imputation quality of pollutants and meteorological variables compared to the ABB.
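The core idea of drawing imputations from a Gaussian predictive distribution after a bootstrap refit can be sketched as follows. This is a deliberately simplified illustration, not the method of the thesis: it uses ordinary least squares with a single covariate in place of generalized least squares with first-order autoregressive residuals, and it omits the sequential, variable-specific models. All function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))  # residual standard error
    return intercept, slope, sigma, xbar, sxx, n


def impute_once(x_obs, y_obs, x_missing, rng):
    """One imputed data set: bootstrap, refit, draw from the predictive."""
    while True:
        idx = [rng.randrange(len(x_obs)) for _ in x_obs]
        bx = [x_obs[i] for i in idx]
        by = [y_obs[i] for i in idx]
        if len(set(bx)) >= 2:  # a slope needs at least two distinct x
            break
    intercept, slope, sigma, xbar, sxx, n = fit_line(bx, by)
    draws = []
    for x0 in x_missing:
        mean = intercept + slope * x0
        # Prediction standard error for a new observation at x0.
        se = sigma * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)
        draws.append(rng.gauss(mean, se))
    return draws


def multiply_impute(x_obs, y_obs, x_missing, m=5, seed=0):
    """Return m imputed versions of the missing responses."""
    if len(x_obs) < 3 or len(set(x_obs)) < 2:
        raise ValueError("need >= 3 observations and >= 2 distinct x")
    rng = random.Random(seed)
    return [impute_once(x_obs, y_obs, x_missing, rng) for _ in range(m)]


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 * xi + 1.0 + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]
    for draws in multiply_impute(x, y, [2.5, 7.5], m=3, seed=1):
        print(draws)
```

Because each imputed data set comes from a different bootstrap refit and a fresh random draw, the spread across the m versions reflects both parameter uncertainty and residual noise, which is the property that guards against overstated precision.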

The fourth objective was to map the risk of exposure to poor air quality, integrating hazard probabilities with indicators of population at risk and of the inability of exposed communities to cope with the adverse effects of poor air quality. Hazard probabilities were defined as the probabilities of annual average concentrations of PM2.5 and PM10 exceeding specific regulatory standards, namely 25 µg m−3 for PM2.5 and 50 µg m−3 for PM10. They were obtained using conditional simulations based on spatiotemporal kriging with external drift. Covariates and a joint spatiotemporal covariance function solved the problem of spatiotemporal sparsity of air quality data. Covariates included land cover composition and population counts to account for location-specific properties of pollutant emissions and dispersion. Exceedance probabilities for PM10 and PM2.5 were high in central and southern parts of Gauteng and the main towns in the Highveld priority air-shed in Mpumalanga. A composite spatial indicator for social vulnerability was developed using geographically weighted principal components analysis. High social vulnerability was indicated for the south-eastern parts of Mpumalanga, characterized by the prevalence of child-headed households and insufficient access to basic services such as piped water for residential use and routine waste collection. Areas marked by moderate to high social vulnerability in Gauteng were characterized by the prevalence of housing informality, female household leadership (mostly single-income households) and immigration. Combining the three risk dimensions resulted in high risk of exposure to excessive ambient PM10 concentration throughout Gauteng and Mpumalanga. For PM2.5, small areas with low to medium risk of exposure to excessive ambient concentrations occurred away from the major towns of Mpumalanga and in protected areas towards the periphery of Gauteng. In Gauteng, PM2.5 risk was highest in the city region.
These retrospective risk maps can be used to initiate detailed investigations into the human and housing conditions in high-risk areas, both to confirm the findings and to inform mitigation efforts.
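The step from conditional simulations to hazard probabilities can be sketched as a simple count: for each grid cell, the exceedance probability is estimated as the share of simulated realizations above the standard. The sketch below assumes the simulations are already available as lists of per-cell annual averages; the values in the example are invented, the function name is hypothetical, and the kriging-based simulation itself is not reproduced here.

```python
# Annual-average standards quoted in the text (µg m^-3).
PM25_ANNUAL_STANDARD = 25.0
PM10_ANNUAL_STANDARD = 50.0


def exceedance_probability(simulations, threshold):
    """Estimate per-cell exceedance probabilities from simulations.

    simulations: list of realizations, each a list with one simulated
    annual average per grid cell (same cell order in every realization).
    Returns one probability per grid cell.
    """
    if not simulations:
        raise ValueError("at least one realization is required")
    n_cells = len(simulations[0])
    if any(len(realization) != n_cells for realization in simulations):
        raise ValueError("all realizations must cover the same cells")
    n_sims = len(simulations)
    return [
        sum(1 for realization in simulations if realization[cell] > threshold)
        / n_sims
        for cell in range(n_cells)
    ]


if __name__ == "__main__":
    # Four made-up realizations over three grid cells.
    sims = [
        [20.0, 30.0, 60.0],
        [26.0, 24.0, 55.0],
        [30.0, 10.0, 45.0],
        [24.0, 26.0, 70.0],
    ]
    print(exceedance_probability(sims, PM25_ANNUAL_STANDARD))
    print(exceedance_probability(sims, PM10_ANNUAL_STANDARD))
```

The precision of such an estimate depends on the number of realizations, so in practice many more than four simulations would be needed.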

To summarize, this dissertation provides a framework for air quality risk mapping, contributing specific methods to improve the quality of the data including the integration of ancillary data from disparate sources.


Sibusisiwe Khuluse was born on 19 August 1985 in Durban, South Africa. She obtained a BSc degree in Applied Mathematics and Statistics and a BSc Honours degree in Statistics at the University of KwaZulu-Natal, Durban, South Africa in 2007. In 2007 she was employed as a Candidate Researcher in Statistics at the Council for Scientific and Industrial Research (CSIR), in Pretoria, South Africa. In 2009 she spent three months at the Faculty of Geo-Information Science and Earth Observation (ITC) at the University of Twente for her MSc research, supported by the Tata Africa Scholarship, ITC and CSIR. She graduated with an MSc degree in Mathematical Statistics from the University of the Witwatersrand, Johannesburg, in 2001 whilst in full-time employment at the CSIR.

She was awarded the Harvard South Africa Fellowship in 2009 and went to Harvard University Graduate School of the Arts and Sciences as a visiting student from August 2010 until May 2011. In July 2011 she returned to ITC for her PhD research, supported by the Nuffic Netherlands Fellowship Programme and the CSIR. Her PhD research focused on the development of a statistical framework for air quality risk mapping, contributing specific methods to improve the quality of the data, including the integration of ancillary data from disparate sources. This thesis is the output of her research.

Khuluse, S.A., Stein, A. (promoter) and Debba, P. (co-promoter) (2017) Spatial statistical modelling of urban particulate matter. Enschede, University of Twente Faculty of Geo-Information and Earth Observation (ITC), 2017. ITC Dissertation 305, ISBN: 978-90-365-4370-5.




Event starts: Friday 21 July 2017 at 14:30
Venue: UT, Waaier 4
City where event takes place: Enschede
