Reliable data from low-cost sensor networks
Abstract
A challenge in air quality monitoring is adequately determining local-scale spatiotemporal variability in concentrations. Air quality monitoring networks are traditionally limited in spatial resolution by their cost and ongoing maintenance requirements. Recent developments in low-cost sensors have presented new opportunities for air quality networks with high spatiotemporal resolution. These devices are low-cost, can be portable, and may be used to capitalise on “Citizen Science” initiatives. However, whilst they offer the potential to increase the spatiotemporal resolution of air quality networks, they introduce a trade-off between the quality and the quantity of data collected. Significant uncertainties exist regarding whether data from low-cost devices are reliable, precise and accurate enough to be useful. Further, given that it is not feasible to calibrate large numbers of devices using traditional protocols, there is a question as to whether more cost-effective options exist. Finally, the value of low-cost networks in identifying local-scale variability not observed in traditional networks is yet to be confidently asserted. This thesis sets out to address these research gaps by exploiting network correlations to develop automated data management frameworks, to evaluate the importance of local siting impacts on sensor data, and to seek fresh insights about local-scale air quality processes. Two ambient gas pollutants, ground-level ozone and nitrogen dioxide, are examined. Results show that network correlations among measurement locations can be used to determine data reliability. The concept of a ‘proxy’ is introduced, that is, a reliable signal from an independent source with similar statistical relationships to the site under examination. The choice of proxy is critical. Surrounding land use similarity is successfully used for selecting suitable proxies.
Low variability between devices mounted at the same site, along with similarities to nearby regulatory stations, supports the ability of sensors at a typical citizen science location to provide reliable, network-relevant information. Empirical methods for data validation and local calibration are presented that do not need large training datasets and allow real-time analysis. Starting with a clear definition of the purpose of a low-cost network (here, to extend the spatial coverage of a regulatory network and provide real-time, reliable information on local-scale air quality), it is shown how to construct appropriate network management strategies and data validation tools. These methods allow a small regulatory station network to verify data from a larger network of low-cost devices. The high-density networks in this work record processes not evident in current networks. Land use regression (LUR) and rank correlation models are illustrated as means to explore local-scale spatiotemporal variation. Rank correlation complemented the LUR results by identifying similar significant urban variables, here related to traffic, atmospheric chemistry and the urban built environment. The work presented in this thesis addresses some current challenges in the changing air quality measurement “paradigm”. The value of low-cost gas sensors is demonstrated using data from networks in two cities: Auckland, NZ, and Vancouver, BC. The key results are that relationships between measurement locations in a network can be used to verify the reliability of low-cost sensor data; that smart data handling protocols can limit costs; and that dense networks can identify local-scale processes not observable in current networks.
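As a rough illustration of checking a low-cost sensor against a proxy, the sketch below computes a Spearman rank correlation between a sensor series and a proxy series and flags the sensor when agreement is weak. It is a minimal sketch under stated assumptions and not the framework developed in the thesis: the function names, the `min_rho` threshold of 0.8, and the readings are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def agrees_with_proxy(sensor, proxy, min_rho=0.8):
    """Hypothetical reliability check: True if the sensor tracks the proxy.

    min_rho is an assumed, illustrative threshold, not a value from the thesis.
    """
    return spearman(sensor, proxy) >= min_rho


if __name__ == "__main__":
    # Made-up hourly readings (arbitrary units), for illustration only.
    proxy_series = [21.0, 25.5, 30.2, 28.1, 19.4, 15.0]
    sensor_series = [18.2, 22.9, 29.0, 26.3, 17.5, 12.8]
    print(agrees_with_proxy(sensor_series, proxy_series))
```

A rank-based measure is used here because it compares the ordering of readings rather than their absolute values, so a sensor with an offset or gain error can still be compared with its proxy; any real deployment would need a threshold and proxy selection justified by the network's own data.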