Reducing Health Misinformation in Search Results

Zhang, Dake

Files

Zhang_Dake.pdf (1.23 MB)

Date

2022-08-22

Authors

Zhang, Dake

Advisor

Smucker, Mark

Publisher

University of Waterloo

Abstract

People commonly search the web for answers to health-related questions. As new health information is added to the Internet every day, misinformation proliferates and spreads widely. Previous work has shown that when search results contain health misinformation, people can make incorrect decisions that may harm their lives. To reduce health misinformation in search results, we need to find web documents that contain correct information and rank them above documents that contain misinformation. In this thesis, we describe our efforts toward this goal. First, we describe our participation in the TREC 2021 Health Misinformation Track, which provides a framework for evaluating ranking approaches that reduce health misinformation in search results. The track uses Compatibility Difference as its primary evaluation metric, which measures an approach's ability to rank correct and credible documents ahead of incorrect and non-credible documents. In the 2021 track, runs that used the provided correct answers were treated as manual runs. By using the known answers and applying a stance detection model for reranking, our manual method achieved a Compatibility Difference score of 0.176, a substantial improvement over the BM25 baseline score of -0.022.

Second, extending this work, we present a pipeline that automatically derives correct answers by learning which web sources are trustworthy and then uses those answers to reduce health misinformation in search results. Determining the correct answer has been a difficult hurdle for participants in the TREC Health Misinformation Track, and in the 2021 track automatic runs were not allowed to use the known answer to a topic’s health question. By exploiting an existing set of health questions with known answers, we show that it is possible to learn which web hosts are trustworthy and, from these hosts, to predict the correct answers to the 2021 health questions with an accuracy of 76%. Using our predicted answers, we promote documents that we predict contain the predicted answer and achieve a Compatibility Difference score of 0.129, a three-fold improvement over the previous best automatic method, which scored 0.043. Evaluated on the TREC 2021 Health Misinformation Track, our final pipeline achieves new state-of-the-art performance among automatic runs.
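
The abstract describes learning which web hosts are trustworthy from questions with known answers, then using those hosts to predict answers to new questions. The sketch below illustrates that general idea only and is not the thesis's actual pipeline. It assumes each document has already been reduced to a host and a stance label. The labels `helpful` and `unhelpful`, the function names `learn_host_trust` and `predict_answer`, the Laplace smoothing, and the centered weighted vote are all illustrative assumptions, as are the example host names.

```python
from collections import defaultdict


def learn_host_trust(training, smoothing=1.0):
    """Estimate per-host trust as a smoothed agreement rate.

    training: iterable of (host, stance, known_answer) tuples, where stance
    and known_answer are "helpful" or "unhelpful" (assumed label set).
    """
    agree = defaultdict(float)
    total = defaultdict(float)
    for host, stance, known_answer in training:
        total[host] += 1
        if stance == known_answer:
            agree[host] += 1
    # Laplace smoothing keeps hosts with few documents near 0.5.
    return {h: (agree[h] + smoothing) / (total[h] + 2 * smoothing) for h in total}


def predict_answer(observations, trust, default_trust=0.5):
    """Predict an answer for a new question by a trust-weighted vote.

    observations: iterable of (host, stance) for documents retrieved for the
    question. Hosts below 0.5 trust count against the stance they express;
    unseen hosts (default_trust) contribute nothing.
    """
    score = 0.0
    for host, stance in observations:
        weight = trust.get(host, default_trust) - 0.5
        score += weight if stance == "helpful" else -weight
    return "helpful" if score > 0 else "unhelpful"


# Hypothetical training data: one host agrees with known answers, one does not.
training = (
    [("trusted.example", "helpful", "helpful")] * 3
    + [("dubious.example", "helpful", "unhelpful")] * 3
)
trust = learn_host_trust(training)

# Hypothetical documents retrieved for a new question.
new_question_docs = [
    ("trusted.example", "unhelpful"),
    ("dubious.example", "helpful"),
    ("unknown.example", "helpful"),
]
print(predict_answer(new_question_docs, trust))
```

A predicted answer of this kind could then drive reranking, by promoting documents whose stance matches the prediction. How the thesis derives stances and trust scores is not stated in the abstract.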

URI

http://hdl.handle.net/10012/18602

Collections

Theses
Computer Science
