Use of Large Language Models (LLMs) in Qualitative Analysis: Evaluating LLMs as Assistive Coding Agents

dc.contributor.author: Neeb, Mikayla
dc.date.accessioned: 2026-02-19T14:12:50Z
dc.date.available: 2026-02-19T14:12:50Z
dc.date.issued: 2026-02-19
dc.date.submitted: 2026-01-16
dc.description.abstract: Introduction: Large language models (LLMs) are increasingly used to support qualitative research, yet robust methods to evaluate the quality of LLM-generated codes remain underdeveloped. Existing approaches often rely on comparisons to human ground truth or custom evaluative methods, limiting cross-study comparisons. This study examines whether LLMs can function as assistive qualitative coding agents and introduces the CReDS framework as a structured approach to evaluating LLM-generated codes without the need for a comparative codebase. Methods: Two social media datasets were employed as validation sets to systematically develop and test approaches for evaluating LLM-generated inductive codes. Codes were generated using GPT-4o-mini and assessed through an iterative evaluation process. Initial assessment relied on conventional quantitative similarity metrics (e.g., cosine similarity); however, limitations in capturing qualitative distinctions prompted the incorporation of structured human evaluation. This process led to the development of the CReDS framework, comprising Consistency, Relevance, Distinction, and Specificity, as a more comprehensive evaluative method. Targeted exploratory analyses then examined evaluative performance under specific conditions to further probe the methods developed in this study. Results: LLM-generated codes aligned closely with human codes across both datasets, with overall semantic match rates of 74% to 83%. At the text level, 65% to 95% of inputs had at least one LLM-generated code judged appropriate by human reviewers. CReDS scores revealed strong alignment with human-generated codes, with substantial overlap across all dimensions. However, LLM-generated codes showed reduced specificity, and the CReDS framework exhibited conservative scoring behaviour. Despite these limitations, CReDS effectively surfaced systematic strengths and weaknesses in LLM outputs. Conclusions: These findings indicate that LLMs can reliably support early-stage qualitative coding when used as assistive tools under human oversight. The CReDS framework offers a transparent and scalable method for evaluating LLM-generated codes that aligns with qualitative principles while supporting iterative model development. This study contributes to a measurable and scalable platform for responsible human-AI collaboration in qualitative analysis and highlights directions for refining evaluation frameworks in future work.
dc.identifier.uri: https://hdl.handle.net/10012/22945
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: qualitative research
dc.subject: qualitative coding
dc.subject: evaluation methodology
dc.subject: human-AI collaboration
dc.subject: qualitative analysis
dc.subject: large language models (LLMs)
dc.title: Use of Large Language Models (LLMs) in Qualitative Analysis: Evaluating LLMs as Assistive Coding Agents
dc.type: Master Thesis
uws-etd.degree: Master of Science
uws-etd.degree.department: School of Public Health Sciences
uws-etd.degree.discipline: Public Health Sciences
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Chen, Helen
uws.contributor.affiliation1: Faculty of Health
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
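The abstract above notes that the initial quantitative assessment compared LLM-generated and human-generated codes using cosine similarity. The following Python sketch illustrates what such an embedding-based comparison can look like; it is not code from the thesis, and the embedding model (all-MiniLM-L6-v2), the example code labels, and the 0.7 match threshold are illustrative assumptions.

# Illustrative sketch only (not code from the thesis): an embedding-based
# cosine-similarity comparison of the kind the abstract describes as the
# initial quantitative assessment. The embedding model, example code labels,
# and 0.7 match threshold are assumptions for demonstration.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose sentence encoder

llm_codes = ["vaccine hesitancy", "distrust of health authorities"]        # hypothetical LLM-generated codes
human_codes = ["hesitancy about vaccination", "misinformation exposure"]   # hypothetical human codes

llm_vecs = model.encode(llm_codes)      # one embedding vector per code label
human_vecs = model.encode(human_codes)

# Pairwise similarity matrix: rows are LLM codes, columns are human codes.
sims = cosine_similarity(llm_vecs, human_vecs)

THRESHOLD = 0.7  # hypothetical cut-off for declaring a "semantic match"
for i, code in enumerate(llm_codes):
    j = sims[i].argmax()                # closest human code for this LLM code
    is_match = sims[i, j] >= THRESHOLD
    print(f"{code!r} vs {human_codes[j]!r}: similarity={sims[i, j]:.2f}, match={is_match}")

As the abstract notes, metrics of this kind proved limited at capturing qualitative distinctions, which motivated the structured human evaluation and the CReDS dimensions (Consistency, Relevance, Distinction, Specificity).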

Files

Original bundle

Name: Neeb_Mikayla.pdf
Size: 2.37 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections