Multivariate Time Series Data Causal Discovery

Chang, Bo Yuan

Files

Chang_BoYuan.pdf (1.66 MB)

Date

2021-10-05

Authors

Chang, Bo Yuan

Advisor

Zelek, John

Publisher

University of Waterloo

Abstract

One of the goals of Artificial Intelligence is to achieve human-like intelligence. Of the solutions proposed over the decades, causal structure discovery has been put forward as a viable tool for enabling human-like reasoning. It can be treated as two stages: causal discovery, which examines the cause-effect relationships between variables, and causal parameter inference, which uses those relationships to perform causal inference through counterfactual, logic-like reasoning similar to how human beings approach a problem. Generally speaking, there are two types of causal discovery algorithms: those that work with random variables and those that work with time series data. The focus of this thesis is on the latter. Performing causal studies on real-world datasets is very challenging for time series data because missing values are prevalent. Currently, all existing causal algorithms require evenly sampled time series data, which unfortunately are not always available. In this thesis I propose a system that addresses these difficulties, which hinder causal learning on real-world datasets. The proposed system performs causal discovery using time series data with missing entries (i.e., sparsely sampled data at varying intervals). The solution put forward for this task comprises two parts: data filling with Gaussian Process Regression (GPR), and causal learning using either the traditional Vector Autoregressive Model or a Machine Learning based approach. For the first part, experiments have shown that GPR outperformed all the benchmark filling techniques, such as K Nearest Neighbour regression, Parametric Linear filling, and random variable filling. The Root Mean Square Error (RMSE) obtained with GPR filling was the smallest across all filling percentages, beating the benchmark algorithms by margins of 0.05 to 1.5 in RMSE.
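The data-filling step can be illustrated with a minimal sketch. It is not the thesis implementation: the squared-exponential kernel, the fixed `length_scale` and `noise` values, the synthetic sine series, and all function names (`gp_fill`, `rbf`, `solve`, `rmse`) are assumptions chosen only to show how a Gaussian Process posterior mean can fill missing entries and how the RMSE of the filled values might be measured.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar time stamps (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_fill(times, values, length_scale=1.0, noise=1e-4):
    """Replace None entries with the GP posterior mean at those time stamps."""
    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    mean = sum(v for _, v in obs) / len(obs)
    # Kernel matrix over observed time stamps, with noise added on the diagonal.
    K = [[rbf(ti, tj, length_scale) + (noise if i == j else 0.0)
          for j, (tj, _) in enumerate(obs)]
         for i, (ti, _) in enumerate(obs)]
    alpha = solve(K, [v - mean for _, v in obs])
    filled = []
    for t, v in zip(times, values):
        if v is None:
            v = mean + sum(rbf(t, to, length_scale) * a
                           for (to, _), a in zip(obs, alpha))
        filled.append(v)
    return filled

def rmse(pred, truth):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(pred, truth)) / len(truth))

# Synthetic example (not thesis data): a sine wave with three entries removed.
times = [0.5 * i for i in range(20)]
truth = [math.sin(t) for t in times]
missing = [3, 8, 13]
observed = [None if i in missing else v for i, v in enumerate(truth)]
filled = gp_fill(times, observed)
error = rmse([filled[i] for i in missing], [truth[i] for i in missing])
```

In practice the kernel hyperparameters would be fitted to the data rather than fixed, and a library implementation of Gaussian Process Regression would normally replace the hand-written solver used here to keep the sketch dependency-free.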
For the second part, an Echo State Network is used for causal learning because of its fast running time and higher prediction capability compared with other available causal learning algorithms, such as Structural Expectation Maximization (SEM) and the Subsampled Linear Auto-Regression Absolute Coefficients algorithm (SLARAC). When working with 10 percent missing entries, the proposed system obtains a Matthews Correlation Coefficient (MCC) score of 0.31 on a scale from -1 to +1, where +1 represents perfect prediction, 0 represents prediction no better than random, and -1 represents total disagreement between prediction and ground truth. This MCC score significantly outperformed those of other methods such as SEM and SLARAC. To showcase the ability of the proposed system to discover causal relationships in real-world engineering applications, the experiment was conducted on a chemical refinery dataset called the Tennessee Eastman (TE) dataset.
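As an illustration of how an MCC score for causal discovery might be computed, the sketch below compares a predicted set of causal links against a ground-truth set, each given as a flat list of 0/1 edge indicators. The function name `mcc`, the flat-list encoding, and the example edge lists are assumptions made for this example; they are not taken from the thesis.

```python
import math

def mcc(true_edges, pred_edges):
    """Matthews Correlation Coefficient for two equal-length 0/1 edge lists."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_edges, pred_edges):
        if t and p:
            tp += 1      # link exists and was predicted
        elif p:
            fp += 1      # link predicted but does not exist
        elif t:
            fn += 1      # link exists but was missed
        else:
            tn += 1      # no link, none predicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical flattened adjacency matrices (1 = causal link present).
true_edges = [1, 0, 1, 0]
pred_edges = [1, 0, 0, 0]
score = mcc(true_edges, pred_edges)
```

A prediction identical to the ground truth scores +1, a fully inverted prediction scores -1, and a score near 0 indicates performance no better than random guessing.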

Keywords

time series, data filling, Granger Causality, machine learning, causal inference

LC Subject Headings

Time-series analysis—Computer programs, Machine learning

URI

http://hdl.handle.net/10012/17625

Collections

Theses
Systems Design Engineering
