dc.contributor.author: Dufour, David
dc.date.accessioned: 2014-08-06 17:18:30 (GMT)
dc.date.available: 2014-08-06 17:18:30 (GMT)
dc.date.issued: 2014-08-06
dc.date.submitted: 2014-07-31
dc.identifier.uri: http://hdl.handle.net/10012/8606
dc.description.abstract: Decision trees have been a popular machine learning technique for some time. Labelled data, examples each with a vector of values in a feature space, are used to create a structure that can assign a class to unseen examples with their own vectors of values. Decision trees are simple to construct, easy to understand on viewing, and have many desirable properties, such as resistance to errors and noise in real-world data. Decision trees can be extended to include costs associated with each test, allowing a preference over the feature space. The problem of minimizing the expected cost of a decision tree is known to be NP-complete. As a result, most approaches to decision tree induction rely on a heuristic. This thesis extends the methods used in past research to look for decision trees with a smaller expected cost than those found using a simple heuristic. In contrast to past research, which found smaller decision trees using exact approaches, I find that exact approaches in general do not find lower expected-cost decision trees than heuristic approaches. This thesis shows that the success of past research on the simpler problem of minimizing decision tree size is partially dependent on the conversion of the data to binary form. This conversion uses the values of the attributes as binary tests, instead of the attributes themselves, when constructing the decision tree. The effect of converting data to binary form is examined in detail and across multiple measures of data to show the extent of this effect and to reiterate that the effect is mostly on the number of leaves in the decision tree. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: Decision tree [en]
dc.subject: Machine Learning [en]
dc.subject: Health Informatics [en]
dc.subject: Constraint Programming [en]
dc.title: Finding Cost-Efficient Decision Trees [en]
dc.type: Master Thesis [en]
dc.pending: false
dc.subject.program: Computer Science [en]
uws-etd.degree.department: School of Computer Science [en]
uws-etd.degree: Master of Mathematics [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
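
The conversion to binary form mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes purely categorical attributes stored as dictionaries, omits class labels and test costs, and the function name `to_binary_tests` and the sample data are hypothetical.

```python
from typing import Dict, List

Example = Dict[str, str]  # attribute name -> categorical value (assumed layout)


def to_binary_tests(examples: List[Example]) -> List[Dict[str, bool]]:
    """Replace each attribute with one true/false test per observed value."""
    # Collect every (attribute, value) pair seen in the data, in a stable order.
    pairs = sorted({(attr, val) for ex in examples for attr, val in ex.items()})
    # Each example becomes a row of boolean answers to the tests "attr=val".
    return [
        {f"{attr}={val}": ex.get(attr) == val for attr, val in pairs}
        for ex in examples
    ]


# Hypothetical sample data, for illustration only.
examples = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "overcast", "windy": "no"},
]
binary = to_binary_tests(examples)
```

In this form, a tree node asks whether an attribute takes one particular value and has two branches, rather than branching once per value of the attribute.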


All items in UWSpace are protected by copyright, with all rights reserved.