Show simple item record

dc.contributor.author: Roegiest, Adam
dc.date.accessioned: 2012-04-24 18:28:51 (GMT)
dc.date.available: 2012-04-24 18:28:51 (GMT)
dc.date.issued: 2012-04-24T18:28:51Z
dc.date.submitted: 2012
dc.identifier.uri: http://hdl.handle.net/10012/6633
dc.description.abstract: Microblogging is an increasingly popular form of social media. One of the most popular microblogging services is Twitter. The number of messages posted to Twitter on a daily basis is extremely large. Accordingly, it becomes hard for users to sort through these messages and find ones that interest them. Twitter offers search mechanisms, but they are relatively simple, and accordingly the results can be lacklustre. Through participation in the 2011 Text Retrieval Conference's Microblog Track, this thesis examines real-time ad hoc search using standard information retrieval approaches without microblog- or Twitter-specific modifications. It was found that pseudo-relevance feedback based upon a language model derived from Twitter posts, called tweets, in conjunction with standard ranking methods, performs competitively with advanced retrieval systems as well as microblog- and Twitter-specific retrieval systems. Furthermore, possible modifications, both Twitter-specific and otherwise, that would potentially increase retrieval performance are discussed. Twitter has also spawned an interesting phenomenon called hashtags. Hashtags are used by Twitter users to denote that their message belongs to a particular topic or conversation. Unfortunately, tweets have a 140-character limit, and accordingly not all relevant hashtags can always be present in a tweet. Thus, Twitter users cannot easily find tweets that should contain the hashtags they are interested in but do not. This problem is investigated in this thesis in three ways using learning methods. First, learning methods are used to determine if it is possible to discriminate between two topically different sets of tweets. This thesis then investigates whether tweets that lack a particular hashtag, but discuss the same topic as the hashtag, can be separated from random tweets.
This case mimics the real-world scenario of users having to sift through random tweets to find tweets that are related to a topic they are interested in. This investigation is performed by removing hashtags from tweets and attempting to distinguish those tweets from random tweets. Finally, this thesis investigates whether topically similar tweets can also be distinguished based upon a sub-topic. This was investigated in an almost identical manner to the second case. This thesis finds that topically distinct tweets can be distinguished but, more importantly, that standard learning methods are able to determine that a tweet with a hashtag removed should have that hashtag. In addition, this hashtag reconstruction can be performed well with very few examples of what a tweet with and without the particular hashtag should look like. This provides evidence that it may be possible to separate tweets a user may be interested in from random tweets using only the hashtags they are interested in. Furthermore, the success of the hashtag reconstruction also provides evidence that users do not misuse or abuse hashtags, since hashtag presence was taken to be the ground truth in all experiments. Finally, the applicability of the hashtag reconstruction results to the TREC Microblog Track and a mobile application is presented. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: information retrieval [en]
dc.subject: microblog [en]
dc.subject: classification [en]
dc.subject: real-time ad hoc search [en]
dc.subject: hashtags [en]
dc.title: Finding Microblog Posts of User Interest [en]
dc.type: Master Thesis [en]
dc.pending: false [en]
dc.subject.program: Computer Science [en]
uws-etd.degree.department: School of Computer Science [en]
uws-etd.degree: Master of Mathematics [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
