Using a Credibility Classifier to Improve Health-Related Information Retrieval

Beylunioglu, Fuat Can



Date

2020-08-19

Authors

Beylunioglu, Fuat Can

Advisor

Smucker, Mark
Duimering, P. Robert

Publisher

University of Waterloo

Abstract

In this thesis, we address the problem of improving the credibility and correctness of the information that search engines retrieve for health-related searches. Health misinformation presented in search engine results pages (SERPs) is a challenging problem for search engines, whose success has traditionally been measured by the number of URLs in the SERPs that are relevant to the user's query. However, research shows that relevant but inaccurate information can lead users to make wrong decisions, which poses a challenge for current search engines. Although existing studies have proposed different ways to help people make better health decisions, little work has been done in the information retrieval context. In this study, we propose algorithmic methods to increase the amount of correct and credible information presented in the results pages. The algorithms are motivated by the hypothesis that the credibility of a document correlates with its correctness. We therefore train classifiers to predict the credibility of documents retrieved by a search engine and adjust the documents' ranks based on their credibility and spaminess scores. To evaluate the algorithms, we conducted an experiment as part of our participation in the TREC 2019 Decision Track. We show that our methods significantly improve on the BM25 baseline in the credibility and correctness tasks. We also present an analysis of the credibility and correctness judgments produced for the track to give insight into the distribution of credible and correct documents retrieved in health-related tasks. Our analysis suggests that credibility can help users reach accurate information when the underlying treatment is ineffective, but that its contribution to users' search experience is limited.
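The abstract does not state the exact formula used to combine the retrieval, credibility, and spaminess scores. As a rough illustration only, the Python sketch below assumes a simple linear re-ranking: the BM25 score is min-max normalised within the result list, interpolated with a classifier's credibility probability, and penalised by a weighted spaminess score. The `Doc` fields, the weights `alpha` and `beta`, and the `rerank` function are hypothetical and are not taken from the thesis; the scores in the usage example are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    doc_id: str
    bm25: float         # score from the baseline BM25 run
    credibility: float  # hypothetical classifier output, P(credible) in [0, 1]
    spam: float         # hypothetical spaminess score in [0, 1], higher = spammier


def min_max(values: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(docs: List[Doc], alpha: float = 0.5, beta: float = 0.2) -> List[str]:
    """Re-rank docs with an ASSUMED linear combination (not the thesis's formula).

    score = (1 - alpha) * norm_bm25 + alpha * credibility - beta * spam
    """
    norm = min_max([d.bm25 for d in docs])
    scored = [
        ((1 - alpha) * r + alpha * d.credibility - beta * d.spam, d)
        for d, r in zip(docs, norm)
    ]
    # Highest combined score first; Python's sort is stable, so ties keep BM25 order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d.doc_id for _, d in scored]


# Made-up example: d1 has the best BM25 score but low credibility and high spaminess.
docs = [
    Doc("d1", bm25=12.0, credibility=0.2, spam=0.9),
    Doc("d2", bm25=10.0, credibility=0.9, spam=0.1),
    Doc("d3", bm25=8.0, credibility=0.6, spam=0.3),
]
print(rerank(docs))                       # → ['d2', 'd1', 'd3']
print(rerank(docs, alpha=0.0, beta=0.0))  # → ['d1', 'd2', 'd3'] (pure BM25 order)
```

Normalising BM25 first puts it on roughly the same [0, 1] scale as the classifier probabilities; without that step, raw BM25 values (often well above 1) would dominate the combined score and the credibility signal would barely change the ranking.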