An Automated Quality Assurance Procedure for Archived Transit Data from APC and AVL Systems

Saavedra, Marian Ruth

An Automated Quality Assurance Procedure for Archived Transit Data from APC and AVL Systems

Files

Saavedra_Marian.pdf (4.62 MB)

Date

2010-08-30T20:21:33Z

Authors

Saavedra, Marian Ruth

Publisher

University of Waterloo

Abstract

Automatic Vehicle Location (AVL) and Automatic Passenger Counting (APC) systems can be powerful tools that allow transit agencies to archive large quantities of detailed transit operations data. Managing data quality is an important first step in exploiting these rich datasets. This thesis presents an automated quality assurance (QA) methodology that identifies unreliable archived AVL/APC data. The approach is based on expected travel and passenger activity patterns derived from the data. It is assumed that standard passenger balancing and schedule matching algorithms, along with any existing automatic validation programs, have already been applied to the raw AVL/APC data. The proposed QA methodology is intended to give transit agencies a supplementary data quality tool that complements, but does not replace, conventional processing routines (which can be vendor-specific and less transparent). The proposed QA methodology endeavours to flag invalid data as “suspect” and valid data as “non-suspect”. There are three stages: i) the first stage screens data that violate physical constraints; ii) the second stage looks for data that represent outliers; and iii) the third stage evaluates whether the outlier data can be accounted for by a valid or an invalid pattern. Stop-level tests are mathematically defined for each stage; however, data are filtered at the trip level. Trips that neither violate any physical constraint nor represent an outlier are considered valid. Outlier trips that can be accounted for by a valid outlier pattern are also considered valid. All remaining trips are considered suspect. The methodology is applied to a sample of AVL/APC data from Grand River Transit in the Region of Waterloo, Ontario, Canada. The sample covers four months, from September to December 2008, and comprises 612,000 stop-level records representing 25,012 trips.
The results show that 14% of the trips in the sample dataset are flagged as suspect. The output is examined further by reviewing which tests contribute most to the set of suspect trips, confirming the pattern assumptions for the valid outlier cases, and comparing the sample data by various traits before and after the QA methodology is applied. The last of these tasks is meant to identify characteristics that may contribute to higher- or lower-quality data. For this sample, the analysis shows that the largest portion of suspect trips points to a need for improved passenger balancing algorithms or more accurate APC equipment. The assumptions for the valid outlier patterns were confirmed to be reasonable. Poor schedule data were found to contribute to lower-quality AVL/APC data. An examination of data distribution by vehicle showed that both usage and the portion of suspect data varied substantially between vehicles. This information can be useful in developing maintenance plans and sampling plans (when combined with information on data distribution by route). A sensitivity analysis was conducted, along with an analysis of the impact on downstream data uses. The model was found to be sensitive to three of the ten user-defined parameters. The impact of the QA procedure on network-level measures of performance (MOPs) was not found to be significant; however, the impact was more substantial for route-specific MOPs.
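The three-stage screening described in the abstract can be illustrated as a trip-level filter. What follows is a minimal Python sketch, not the thesis's implementation. The record fields (`Stop`, `Trip`), the capacity check, the z-score outlier test on trip run time, and the boardings-based "valid pattern" rule are all hypothetical stand-ins for the stop-level tests actually defined in the thesis. The sketch shows only the control flow. If any stop fails a physical-constraint test, the whole trip is flagged. An outlier trip is flagged unless a valid pattern explains it. Every other trip is non-suspect.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Stop:
    # Hypothetical stop-level fields; the thesis's record layout may differ.
    boardings: int
    alightings: int
    load: int  # passengers on board when departing the stop


@dataclass
class Trip:
    trip_id: str
    run_time_s: float
    stops: List[Stop] = field(default_factory=list)

    def total_boardings(self) -> int:
        return sum(s.boardings for s in self.stops)


def violates_physical_constraints(trip: Trip, capacity: int = 120) -> bool:
    # Stage 1 (hypothetical tests): a single stop with negative counts or a
    # load above an assumed vehicle capacity flags the whole trip.
    return any(
        s.boardings < 0 or s.alightings < 0 or s.load < 0 or s.load > capacity
        for s in trip.stops
    )


def is_outlier(trip: Trip, ref: List[Trip], z_max: float = 2.0) -> bool:
    # Stage 2 (swapped-in technique): z-score of run time against reference trips.
    times = [t.run_time_s for t in ref]
    sd = pstdev(times)
    return sd > 0 and abs(trip.run_time_s - mean(times)) / sd > z_max


def has_valid_pattern(trip: Trip, ref: List[Trip], factor: float = 2.0) -> bool:
    # Stage 3 (hypothetical rule): a long trip is "explained" if its passenger
    # activity is at least `factor` times the reference average.
    return trip.total_boardings() >= factor * mean(t.total_boardings() for t in ref)


def classify_trips(trips: List[Trip]) -> Dict[str, str]:
    labels: Dict[str, str] = {}
    passed: List[Trip] = []
    # Stage 1: trips that break a physical constraint are suspect outright.
    for t in trips:
        if violates_physical_constraints(t):
            labels[t.trip_id] = "suspect"
        else:
            passed.append(t)
    # Stages 2 and 3 use the trips that passed stage 1 as the reference set.
    for t in passed:
        if not is_outlier(t, passed) or has_valid_pattern(t, passed):
            labels[t.trip_id] = "non-suspect"
        else:
            labels[t.trip_id] = "suspect"
    return labels


if __name__ == "__main__":
    # Hypothetical example data, not taken from the Grand River Transit sample.
    trips = [Trip(f"n{i}", 1000, [Stop(10, 10, 10)]) for i in range(18)]
    trips += [
        Trip("busy", 2000, [Stop(60, 60, 60)]),  # long but busy: valid outlier
        Trip("slow", 2000, [Stop(10, 10, 10)]),  # long, unexplained: suspect
        Trip("neg", 1000, [Stop(5, 10, -5)]),    # negative load: stage 1 failure
    ]
    labels = classify_trips(trips)
    share = sum(v == "suspect" for v in labels.values()) / len(labels)
    print(f"suspect share: {share:.1%}")
```

In this toy data, "slow" and "neg" are flagged as suspect. "busy" is an outlier, but the stage 3 rule accepts it. Outlier statistics are computed only over trips that passed stage 1, so physically impossible records do not distort the reference distribution. That choice is an assumption of this sketch.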