A Framework for Ensemble Predictive Modeling

Abdunabi, Tarek A. M.

Files

Abdunabi_Tarek.pdf (5.11 MB)

Date

2016-05-13

Authors

Abdunabi, Tarek A. M.

Advisor

Basir, Otman

Publisher

University of Waterloo

Abstract

Ensemble systems have been successfully applied in many fields, including finance, bioinformatics, medicine, cheminformatics, manufacturing, geography, information security, information retrieval, image retrieval, and recommender systems. The ultimate objective of an ensemble system is to produce better predictions by combining the approximations of different classifiers or models. Ensemble performance, however, depends on three main design features. The first is the diversity (independence) of the base models: if all models produce similar or correlated predictions, combining those predictions provides no improvement, which is why diversity is considered a key design feature of any successful ensemble system. The second is the fusion topology, namely the selection of a representative topology. The third is the fusion function, namely the selection of a suitable function. Accordingly, building an effective ensemble system is a complex and challenging process that requires intuition, deep knowledge of the problem context, and a well-defined predictive modeling process. Although several taxonomies reported in the literature aim to categorize ensemble systems from the system designer's point of view, important research gaps still need to be addressed. First, a comprehensive framework for developing ensemble systems is not yet available. Second, several strategies have been proposed to inject model diversity into the ensemble; however, there is a shortage of empirical studies that compare the effectiveness of these strategies. Third, most ensemble systems research has concentrated on simple problems and relatively small or low-dimensional data sets. Further experimental research is required to investigate the application of ensemble systems to large and/or high-dimensional data sets with a variety of data types. This research attempts to fill these gaps.
First, the thesis proposes a framework for ensemble predictive modeling. It coins the term "ensemble predictive modeling" to refer to the process of developing ensemble systems. Second, the thesis empirically compares several diversity injection strategies. Third, the thesis validates the proposed framework using two real-world, large/high-dimensional, regression and classification case studies. The empirical results indicate the effectiveness of the proposed framework.
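
The abstract's point about diversity can be illustrated with a small, purely hypothetical sketch. It is not taken from the thesis: the toy labels, the helper names `majority_vote` and `accuracy`, and the choice of majority voting as the fusion function are all assumptions made for illustration. Three base classifiers with identical individual accuracy are fused twice, once when their errors fall on different samples and once when they all err on the same samples.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse class labels from several models by per-sample majority vote.

    `predictions` is a list of label lists, one list per base model.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def accuracy(predicted, truth):
    """Fraction of samples where the predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical ground truth for six samples (illustrative data only).
truth = [1, 1, 1, 1, 1, 1]

# Diverse ensemble: each model is wrong on a different pair of samples.
diverse = [
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]

# Correlated ensemble: every model is wrong on the same pair of samples.
correlated = [
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]

# Every base model has the same individual accuracy (4 of 6 correct).
single_accuracy = accuracy(diverse[0], truth)

diverse_accuracy = accuracy(majority_vote(diverse), truth)
correlated_accuracy = accuracy(majority_vote(correlated), truth)

print(f"single model:        {single_accuracy:.2f}")
print(f"diverse ensemble:    {diverse_accuracy:.2f}")
print(f"correlated ensemble: {correlated_accuracy:.2f}")
```

In this toy setting the diverse ensemble outvotes each model's mistakes, while the correlated ensemble does no better than a single model. Real ensembles fall between these extremes, and other fusion functions (for example averaging or weighted voting) would behave differently.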