Show simple item record

dc.contributor.authorAbouelnagah, Younes
dc.date.accessioned2013-08-07 16:06:30 (GMT)
dc.date.available2013-08-07 16:06:30 (GMT)
dc.date.issued2013-08-07T16:06:30Z
dc.date.submitted2013
dc.identifier.urihttp://hdl.handle.net/10012/7689
dc.description.abstractSearch and summarization of streaming social media, such as Twitter, requires the ongoing analysis of large volumes of data with dynamically changing characteristics. Tweets are short and repetitious -- lacking context and structure -- making it difficult to generate a coherent synopsis of events within a given time period. Although some established algorithms for frequent itemset analysis might provide an efficient foundation for synopsis generation, the unmodified application of standard methods produces a complex mass of rules, dominated by common language constructs and many trivial variations on topically related results. Moreover, these results are not necessarily specific to events within the time period of interest. To address these problems, we build upon the Linear time Closed itemset Mining (LCM) algorithm, which is particularly suited to the large and sparse vocabulary of tweets. LCM generates only closed itemsets, providing an immediate reduction in the number of trivial results. To reduce the impact of function words and common language constructs, we apply a filltering step that preserves these terms only when they may form part of a relevant collocation. To further reduce trivial results, we propose a novel strengthening of the closure condition of LCM to retain only those results that exceed a threshold of distinctiveness. Finally, we perform temporal ranking, based on information gain, to identify results that are particularly relevant to the time period of interest. We evaluate our work over a collection of tweets gathered in late 2012, exploring the efficiency and filtering characteristic of each processing step, both individually and collectively. Based on our experience, the resulting synopses from various time periods provide understandable and meaningful pictures of events within those periods, with potential application to tasks such as temporal summarization and query expansion for search.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectInformation Retrievalen
dc.subjectText Miningen
dc.subjectData Miningen
dc.subjectSocial Mediaen
dc.subjectTemporal summarizationen
dc.subjectFrequent itemset miningen
dc.subjectMicroblogsen
dc.subjectFrequent pattern miningen
dc.titleEfficient Temporal Synopsis of Social Media Streamsen
dc.typeMaster Thesisen
dc.pendingfalseen
dc.subject.programComputer Scienceen
uws-etd.degree.departmentSchool of Computer Scienceen
uws-etd.degreeMaster of Mathematicsen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages