Cross-Domain Sentence Modeling for Relevance Transfer with BERT

Akkalyoncu Yilmaz, Zeynep

dc.contributor.advisor	Lin, Jimmy
dc.contributor.author	Akkalyoncu Yilmaz, Zeynep
dc.date.accessioned	2019-12-16T20:28:56Z
dc.date.available	2019-12-16T20:28:56Z
dc.date.issued	2019-12-16
dc.date.submitted	2019-12-06
dc.description.abstract	Standard bag-of-words term-matching techniques in document retrieval fail to exploit rich semantic information embedded in the document texts. One promising recent trend in facilitating context-aware semantic matching has been the development of massively pretrained deep transformer models, culminating in BERT as their most popular example today. In this work, we propose adapting BERT as a neural re-ranker for document retrieval to achieve large improvements on news articles. Two fundamental issues arise in applying BERT to "ad hoc" document retrieval on newswire collections: relevance judgments in existing test collections are provided only at the document level, and documents often exceed the length that BERT was designed to handle. To overcome these challenges, we compute and aggregate sentence-level evidence to rank documents. The lack of appropriate relevance judgments in test collections is addressed by leveraging sentence-level and passage-level relevance judgments fortuitously available in collections from other domains to capture cross-domain notions of relevance. Our experiments demonstrate that models of relevance can be transferred across domains. By leveraging semantic cues learned across various domains, we propose a model that achieves state-of-the-art results on three standard TREC newswire collections. We explore the effects of cross-domain relevance transfer, and trade-offs between using document and sentence scores for document ranking. We also present an end-to-end document retrieval system that integrates the open-source Anserini information retrieval toolkit, discussing the related technical challenges and design decisions.	en
dc.identifier.uri	http://hdl.handle.net/10012/15326
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://github.com/castorini/birch	en
dc.subject	information retrieval	en
dc.subject	natural language processing	en
dc.subject	deep learning	en
dc.title	Cross-Domain Sentence Modeling for Relevance Transfer with BERT	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Mathematics	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Lin, Jimmy
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
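
The abstract describes ranking documents by computing and aggregating sentence-level evidence, and weighing document scores against sentence scores. The following is a minimal sketch of one way such an aggregation could look, not the thesis's actual method. It assumes a linear interpolation between a document-level score and the mean of the top-scoring sentences; the names aggregate_score and rerank, the weight alpha, the cutoff top_k, and the premise that both kinds of score are on a comparable scale are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def aggregate_score(
    doc_score: float,
    sentence_scores: Sequence[float],
    alpha: float = 0.5,
    top_k: int = 3,
) -> float:
    """Interpolate a document-level score with its top-k sentence scores.

    alpha and top_k are illustrative defaults, not values from the thesis.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Keep only the strongest sentence-level evidence.
    best = sorted(sentence_scores, reverse=True)[:top_k]
    # A document with no scored sentences contributes no sentence evidence.
    sentence_evidence = sum(best) / len(best) if best else 0.0
    return alpha * doc_score + (1.0 - alpha) * sentence_evidence


def rerank(
    candidates: Sequence[Tuple[str, float, Sequence[float]]],
    alpha: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, doc_score, sentence_scores) triples, best first."""
    scored = [
        (doc_id, aggregate_score(doc_score, sent_scores, alpha, top_k))
        for doc_id, doc_score, sent_scores in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, alpha = 1 ranks purely by the document-level score and alpha = 0 purely by sentence evidence, so varying alpha is one simple way to probe the trade-off the abstract mentions.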

Files

Original bundle

Name: AkkalyoncuYilmaz_Zeynep.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format
Description: Thesis PDF

License bundle

Name: license.txt
Size: 6.08 KB
Format: Item-specific license agreed upon to submission
Description:

Collections

Theses
Computer Science