A Comparison of Unsupervised Topic Modelling Techniques for Qualitative Data Analysis of Online Communities

Kaur, Amandeep

Files

Kaur_Amandeep.pdf (43.96 MB)

Date

2024-07-25

Authors

Kaur, Amandeep

Publisher

University of Waterloo

Abstract

Social media is a rich and influential source of information for qualitative researchers. However, its volume and diversity present significant challenges, which computational techniques such as topic modelling can help address. Qualitative researchers nonetheless often struggle to use such techniques, both because they lack programming expertise and because they are concerned about preserving the nuanced aspects of their research, such as contextual understanding, subjective interpretation, and the ethical considerations of their data. To address this issue, this thesis explores the integration of BERTopic, an advanced Large Language Model (LLM)-based method, into the Computational Thematic Analysis (CTA) Toolkit to support qualitative data analysis of social media. We conducted interviews and hands-on evaluations in which qualitative researchers compared topics from three topic modelling techniques: LDA, NMF, and BERTopic. Participants prioritized topic relevance, logical organization, and the capacity to reveal unexpected relationships within the data, valuing detailed, coherent clusters for deeper understanding and actionable insights. BERTopic was favored by 8 of 12 participants for its ability to uncover hidden connections. These findings underscore the transformative potential of LLM-based tools in providing deeper, more nuanced insights for qualitative analysis of social media data.

Keywords

human computer interaction, qualitative analysis, computational linguistics, large language models, social media analysis

URI

http://hdl.handle.net/10012/20741

Collections

Theses
Computer Science
