Bus Dwell Time and Travel Time Modelling Using Data Mining Methods

dc.contributor.advisor Ranjitkar, P en
dc.contributor.advisor Wilson, D en
dc.contributor.author Rashidi, Soroush en
dc.date.accessioned 2015-02-26T00:39:33Z en
dc.date.issued 2014 en
dc.identifier.citation 2014 en
dc.identifier.uri http://hdl.handle.net/2292/24671 en
dc.description.abstract Intelligent transportation systems (ITS) provide innovative, sustainable, and cost-effective solutions to transport problems using existing road infrastructure. The Advanced Traveller Information System (ATIS) is an important component of ITS aimed at providing accurate travel-related information to improve road users’ confidence in the transportation system. Major advances have been made in the last few decades in data collection and data mining methods; however, some of these new methods are yet to be assessed for modelling and estimation of bus dwell time (BDT) and bus travel time (BTT). In the literature, less attention has been paid to dealing with abnormality in datasets, which can greatly influence model accuracy. The primary goal of this thesis is to provide a comprehensive assessment of several statistical and data mining methods to model BDT and BTT based on data collected manually and using an automatic vehicle location (AVL) system. The methods assessed in this study can be broadly grouped into two categories, namely white box and black box models. The white box models, which are used to predict and explain variations in BDT and BTT, include multiple linear regression (MLR), gene expression programming (GEP), classification and regression tree (CART), and chi-squared automatic interaction detector (CHAID). The black box models, which are used to improve the accuracy of BTT prediction, include random forest (RF), stochastic gradient boosting, or TreeBoost (TB), and multilayer perceptron neural network (MLP). Moreover, four time-series-based methods are also assessed to model and estimate BDT using AVL data where the numbers of people boarding and alighting were not available. These include random walk (RW), exponential smoothing (ES), moving average (MA), and autoregressive integrated moving average (ARIMA).
The successful implementation of data mining methods requires accurate data preparation and appropriate selection of independent variables. Several time- and location-based variables were employed in this study to develop BDT and BTT models. Abnormality in the collected data and missing observations are dealt with carefully using different methods to sift and impute them. Six imputation techniques are applied to replace missing observations in the AVL data, namely series mean, mean of nearby points, median of nearby points, linear interpolation, regression, and expectation maximization (EM), which are then compared against the commonly used listwise deletion technique. New statistical methods are implemented to distinguish between served and non-served BDT observations and between stopping and non-stopping BDT durations. Among the imputation methods for missing observations, EM performed best, yielding the lowest RMSE value. The magnitude of error increases with the number of missing observations. The widely used traditional MLR model is cumbersome in the sense that it requires checking and satisfying some strict model assumptions. The MLR model failed to capture nonlinear variations in BTT data. GEP and decision-tree-based models, including CART and CHAID, show potential to model and estimate BDT and BTT more accurately than the traditional MLR model without the need to satisfy strict model assumptions. Intersection delay, distance to the next stop, and BDT are the most influential factors in modelling BTT. Another important contributor in BTT modelling is bus stop closeness as a measure of bunching and queuing at stops. The method of payment of bus fare (cash versus electronic) is the most important factor for BDT modelling in all proposed models.
The number of boarding passengers paying bus fare in cash is less dominant, as the time they take is usually accommodated within the time taken by boarding passengers paying electronically (by card), who usually do not have to wait for those paying in cash. Among black box models, RF and TB are capable of capturing the variability in BTT with acceptable generalization ability and are, therefore, more suitable for online prediction of BTT. However, the MLP, CART, and CHAID models showed poor generalizability. Among time series models, in the overall ranking for central business district (CBD) bus stops, MA outperformed the other models with the lowest MAPE values, while for non-CBD bus stops, the ARIMA model performed better than the other models. en
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof PhD Thesis - University of Auckland en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title Bus Dwell Time and Travel Time Modelling Using Data Mining Methods en
dc.type Thesis en
thesis.degree.grantor The University of Auckland en
thesis.degree.level Doctoral en
thesis.degree.name PhD en
dc.rights.holder Copyright: The Author en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.elements-id 476923 en
pubs.record-created-at-source-date 2015-02-26 en
dc.identifier.wikidata Q112906803

