Math Information Retrieval using a Text Search Engine

dc.contributor.author: Dallas, Fraser
dc.date.accessioned: 2018-05-18T18:57:13Z
dc.date.available: 2018-05-18T18:57:13Z
dc.date.issued: 2018-05-18
dc.date.submitted: 2018-05-16
dc.description.abstract: Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for math information retrieval systems on the math and text retrieval task at NTCIR-12, for example, show room for improvement, even though formula retrieval appears to be fairly successful. This thesis explores how to adapt the state-of-the-art BM25 text ranking method to work well when searching for math and text together. Symbol layout trees are used to represent math formulas, and features extracted from the trees are used as search terms for BM25. The thesis examines various features of symbol layout trees and their effects on retrieval performance. Based on the results, a set of features is recommended that can be used effectively in a conventional text-based retrieval engine. The feature set is validated using various NTCIR math-only benchmarks. Various proximity measures show that math and text are closer together in documents deemed relevant than in documents deemed non-relevant for NTCIR queries. It would therefore seem that proximity could improve ranking for math information retrieval systems when searching for both math and text. Nevertheless, two attempts to include proximity when scoring matches were unsuccessful in improving retrieval effectiveness. Finally, the BM25 ranking of both math and text using the feature set designed for formula retrieval is validated by various NTCIR math and text benchmarks. [en]
dc.identifier.uri: http://hdl.handle.net/10012/13329
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.subject: Mathematics information retrieval [en]
dc.subject: MIR [en]
dc.subject: Mathematical content representation [en]
dc.subject: MathML [en]
dc.subject: Okapi BM25 [en]
dc.subject: Lucene [en]
dc.title: Math Information Retrieval using a Text Search Engine [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws.contributor.advisor: Frank, Tompa
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: Fraser_Dallas.pdf
Size: 657.93 KB
Format: Adobe Portable Document Format
Description: Main Article - Removed Blank Page

License bundle

Name: license.txt
Size: 6.08 KB
Format: Item-specific license agreed upon to submission