UWSpace is currently experiencing technical difficulties resulting from its recent migration to a new version of its software. These technical issues are not affecting the submission and browse features of the site. UWaterloo community members may continue submitting items to UWSpace. We apologize for the inconvenience, and are actively working to resolve these technical issues.
 

Autonomous Cooperating Web Crawlers

dc.contributor.authorMcLearn, Gregen
dc.date.accessioned2006-08-22T14:24:34Z
dc.date.available2006-08-22T14:24:34Z
dc.date.issued2002en
dc.date.submitted2002en
dc.description.abstractA web crawler provides an automated way to discover web events ? creation, deletion, or updates of web pages. Competition among web crawlers results in redundant crawling, wasted resources, and less-than-timely discovery of such events. This thesis presents a cooperative sharing crawler algorithm and sharing protocol. Without resorting to altruistic practices, competing (yet cooperative) web crawlers can mutually share discovered web events with one another to maintain a more accurate representation of the web than is currently achieved by traditional polling crawlers. The choice to share or merge is entirely up to an individual crawler: sharing is the act of allowing a crawler M to access another crawler's web-event data (call this crawler S), and merging occurs when crawler M requests web-event data from crawler S. Crawlers can choose to share with competing crawlers if it can help reduce contention between peers for resources associated with the act of crawling. Crawlers can choose to merge from competing peers if it helps them to maintain a more accurate representation of the web at less cost than directly polling web pages. Crawlers can control how often they choose to merge through the use of a parameter &#961;, which dictates the percentage of time spent either polling or merging with a peer. Depending on certain conditions, pathological behaviour can arise if polling or merging is the only form of data collection. Simulations of communities of simple cooperating web crawlers successfully show that a combination of polling and merging (0 < &#961; < 1) can allow an individual member of the cooperating community a higher degree of accuracy in their representation of the web as compared to a traditional polling crawler. Furthermore, if web crawlers are allowed to evaluate their own performance, they can dynamically switch between periods of polling and merging to still perform better than traditional crawlers. The mutual performance gain increases as more crawlers are added to the community.en
dc.formatapplication/pdfen
dc.format.extent470333 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10012/1080
dc.language.isoenen
dc.pendingfalseen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2002, McLearn, Greg. All rights reserved.en
dc.subjectComputer Scienceen
dc.subjectautonomousen
dc.subjectweb crawlersen
dc.subjectWWWen
dc.subjectsimulationen
dc.titleAutonomous Cooperating Web Crawlersen
dc.typeMaster Thesisen
uws-etd.degreeMaster of Mathematicsen
uws-etd.degree.departmentSchool of Computer Scienceen
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
glmclearn2002.pdf
Size:
459.31 KB
Format:
Adobe Portable Document Format