Show simple item record

dc.contributor.authorCormier, Michael 15:33:00 (GMT) 15:33:00 (GMT)
dc.description.abstractThis thesis is focused on the development of computer vision techniques for parsing web pages using an image of the rendered page as evidence, and on understanding this under-explored class of images from the perspective of computer vision. This project is divided into two tracks---applied and theoretical---which complement each other. Our practical motivation is the application of improved web page parsing to assistive technology, such as screenreaders for visually impaired users or the ability to declutter the presentation of a web page for those with cognitive deficit. From a more theoretical standpoint, images of rendered web pages have interesting properties from a computer vision perspective; in particular, low-level assumptions can be made in this domain, but the most important cues are often subtle and can be highly non-local. The parsing system developed in this thesis is a principled Bayesian segmentation-classification pipeline, using innovative techniques to produce valuable results in this challenging domain. The thesis includes both implementation and evaluation solutions. Segmentation of a web page is the problem of dividing it into semantically significant, visually coherent regions. We use a hierarchical segmentation method based on the detection of semantically significant lines (possibly broken lines) which divide regions. The Bayesian design allows sophisticated probability models to be applied to the segmentation process, and our method produces segmentation trees that achieve good performance on a variety of measures. Classification, for our purposes, is identifying the semantic role of regions in the segmentation tree of a page. We achieve promising results with a Bayesian classification algorithm based on the novel use of a hidden Markov tree model, in which the structure of the model is adapted to reflect the structure of the segmentation tree. This allows the algorithm to make effective use of the context in which regions appear as well as the features of each individual region. The methods used to evaluate our page parsing system include qualitative and quantitative evaluation of algorithm performance (using manually-prepared ground truth data) as well as a user study of an assistive interface based on our page segmentation algorithm. We also performed a separate user study to investigate users' perceptions of web page organization and to generate ground truth segmentations, leading to important insights about consistency. Taken as a whole, this thesis presents innovative work in computer vision which contributes both to addressing the problem of web accessibility and to the understanding of semantic cues in images.en
dc.publisherUniversity of Waterlooen
dc.titleComputer Vision on Web Pages: A Study of Man-Made Imagesen
dc.typeDoctoral Thesisen
dc.pendingfalse R. Cheriton School of Computer Scienceen Scienceen of Waterlooen
uws-etd.degreeDoctor of Philosophyen
uws.contributor.advisorCohen, Robin
uws.contributor.advisorMann, Richard
uws.contributor.affiliation1Faculty of Mathematicsen

Files in this item


This item appears in the following Collection(s)

Show simple item record


University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages