Show simple item record

dc.contributor.authorMcLearn, Gregen
dc.date.accessioned2006-08-22 14:24:34 (GMT)
dc.date.available2006-08-22 14:24:34 (GMT)
dc.date.issued2002en
dc.date.submitted2002en
dc.identifier.urihttp://hdl.handle.net/10012/1080
dc.description.abstractA web crawler provides an automated way to discover web events ? creation, deletion, or updates of web pages. Competition among web crawlers results in redundant crawling, wasted resources, and less-than-timely discovery of such events. This thesis presents a cooperative sharing crawler algorithm and sharing protocol. Without resorting to altruistic practices, competing (yet cooperative) web crawlers can mutually share discovered web events with one another to maintain a more accurate representation of the web than is currently achieved by traditional polling crawlers. The choice to share or merge is entirely up to an individual crawler: sharing is the act of allowing a crawler M to access another crawler's web-event data (call this crawler S), and merging occurs when crawler M requests web-event data from crawler S. Crawlers can choose to share with competing crawlers if it can help reduce contention between peers for resources associated with the act of crawling. Crawlers can choose to merge from competing peers if it helps them to maintain a more accurate representation of the web at less cost than directly polling web pages. Crawlers can control how often they choose to merge through the use of a parameter &#961;, which dictates the percentage of time spent either polling or merging with a peer. Depending on certain conditions, pathological behaviour can arise if polling or merging is the only form of data collection. Simulations of communities of simple cooperating web crawlers successfully show that a combination of polling and merging (0 < &#961; < 1) can allow an individual member of the cooperating community a higher degree of accuracy in their representation of the web as compared to a traditional polling crawler. Furthermore, if web crawlers are allowed to evaluate their own performance, they can dynamically switch between periods of polling and merging to still perform better than traditional crawlers. The mutual performance gain increases as more crawlers are added to the community.en
dc.formatapplication/pdfen
dc.format.extent470333 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2002, McLearn, Greg. All rights reserved.en
dc.subjectComputer Scienceen
dc.subjectautonomousen
dc.subjectweb crawlersen
dc.subjectWWWen
dc.subjectsimulationen
dc.titleAutonomous Cooperating Web Crawlersen
dc.typeMaster Thesisen
dc.pendingfalseen
uws-etd.degree.departmentSchool of Computer Scienceen
uws-etd.degreeMaster of Mathematicsen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages