Finding Cost-Efficient Decision Trees

dc.contributor.author: Dufour, David
dc.date.accessioned: 2014-08-06T17:18:30Z
dc.date.available: 2014-08-06T17:18:30Z
dc.date.issued: 2014-08-06
dc.date.submitted: 2014-07-31
dc.description.abstract: Decision trees have long been a popular machine learning technique. Labelled data, examples each with a vector of values in a feature space, are used to create a structure that can assign a class to unseen examples with their own vector of values. Decision trees are simple to construct, easy to understand on viewing, and have many desirable properties, such as resistance to errors and noise in real-world data. Decision trees can be extended to include costs associated with each test, allowing a preference over the feature space. The problem of minimizing the expected cost of a decision tree is known to be NP-complete. As a result, most approaches to decision tree induction rely on a heuristic. This thesis extends the methods used in past research to look for decision trees with a smaller expected cost than those found using a simple heuristic. In contrast to past research, which found smaller decision trees using exact approaches, I find that exact approaches in general do not find lower expected-cost decision trees than heuristic approaches. This thesis shows that the success of past research on the simpler problem of minimizing decision tree size is partially dependent on the conversion of the data to binary form. This conversion uses the values of the attributes as binary tests, instead of the attributes themselves, when constructing the decision tree. The effect of converting data to binary form is examined in detail and across multiple measures of data to show the extent of this effect and to reiterate that the effect is mostly on the number of leaves in the decision tree. [en]
dc.identifier.uri: http://hdl.handle.net/10012/8606
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: Decision tree [en]
dc.subject: Machine Learning [en]
dc.subject: Health Informatics [en]
dc.subject: Constraint Programming [en]
dc.subject.program: Computer Science [en]
dc.title: Finding Cost-Efficient Decision Trees [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: School of Computer Science [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle
Name: Dufour_David.pdf
Size: 2.46 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.67 KB
Format: Item-specific license agreed upon to submission