Personal Email Spam Filtering with Minimal User Interaction

Mojdeh, Mona

dc.comment.hidden	Hello, I am resubmitting my thesis. All of the revision points have been addressed. The version I am submitting now was compiled with LaTeX for double-sided printing, so some blank pages have been added. Here is the list of revision points and the new page numbers they correspond to: 1. The .pdf file name must appear as 'Mojdeh_Mona'. 2. Table of Contents: add an 'Appendix' title above the first Appendix entry ('A User Study Review...'). 3. Page 161: remove the blank page OR remove the visible page number. New page number: 162. 4. Pages 141, 142, 143: all pages must include a minimum 1-inch margin at the top, bottom, and outer edge of each page. New page numbers: 143, 144, 145. I would appreciate receiving the review before the June convocation deadline, since my parents are traveling from my home country to attend the ceremony. Thanks, Mona	en
dc.contributor.author	Mojdeh, Mona
dc.date.accessioned	2012-04-30T14:14:12Z
dc.date.available	2012-04-30T14:14:12Z
dc.date.issued	2012-04-30T14:14:12Z
dc.date.submitted	2012
dc.description.abstract	This thesis investigates ways to reduce or eliminate the need for user input to learning-based personal email spam filters. Personal spam filters have been shown in previous studies to yield superior effectiveness, at the cost of requiring extensive user training, which may be burdensome or impossible. This work describes new approaches to the problem of building a personal spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from sources of data other than the user’s own messages. This study shows that inter-user training yields substantially inferior results to intra-user training using the best known methods. Moreover, contrary to previous literature, we find that transfer learning degrades the performance of spam filters when the training and test sets belong to two different users or two different time periods. We also adapt and modify a graph-based semi-supervised learning algorithm to build a filter that can classify an entire inbox when trained on twenty or fewer user judgments. Our experiments show that this approach compares well with previous techniques when trained on as few as two training examples. We also present the toolkit we developed to perform privacy-preserving user studies on spam filters. This toolkit allows researchers to evaluate, on real users’ mailboxes, any spam filter that conforms to a standard interface defined by TREC. Researchers have access only to the TREC-style result file, and not to any content of a user’s email stream. To eliminate the need for feedback from the user, we build a personal autonomous filter that learns exclusively from the output of a global spam filter. Our laboratory experiments show that learning filters with no user input can substantially improve on the results of open-source and industry-leading commercial filters that employ no user-specific training.
We use our toolkit to validate the performance of the autonomous filter in a user study.	en
dc.identifier.uri	http://hdl.handle.net/10012/6675
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.subject	Spam Filtering	en
dc.subject	User Study	en
dc.subject.program	Computer Science	en
dc.title	Personal Email Spam Filtering with Minimal User Interaction	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	School of Computer Science	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Mojdeh_Mona.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 250 B
Format: Item-specific license agreed to upon submission

Collections

Theses
Computer Science