Itakura, Kalista Yuki
Traditional information retrieval applications, such as Web search, return atomic units of retrieval, which are generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users pinpoint their exact information needs by returning results at the sub-document level. These results may consist of predefined document components (such as pages, sections, and paragraphs), or they may consist of arbitrary passages, comprising any sub-string of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.
Cite this work
Kalista Yuki Itakura (2010). Focused Retrieval. UWSpace. http://hdl.handle.net/10012/5645