Automating Protocol Analysis with Generative AI: Classifying Questions in Design Review Meetings Using GPT-4
Date
2024-11-26
Authors
Advisor
Safayeni, Frank
Hurst, Ada
Publisher
University of Waterloo
Abstract
This study investigates the extent to which GPT-4 can classify question utterances in verbal protocols of design according to Eris' (2004) taxonomy using in-context learning (ICL). Specifically, it examines how classification is affected by several factors: the amount of labeled data provided in the prompt, the inclusion of contextual information, prompt engineering strategies, applicability across datasets, and performance across different models. These findings could pave the way for wider adoption of AI in design research, changing how protocols are analyzed and interpreted and ultimately yielding more efficient and accurate insights into cognitive processes during design activities.
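In an ICL setup, the model is not retrained; the labeled examples are placed directly in the prompt. The Python sketch below shows one way such a prompt could be assembled, with the number of examples and the inclusion of surrounding context exposed as parameters, since those are two of the factors examined. The `LabeledQuestion` type, the `build_icl_prompt` function, the prompt wording, and the category names are hypothetical illustrations, not the study's actual prompts or Eris' full category set, and the call to the model itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledQuestion:
    """A question utterance with its human-assigned category (hypothetical type)."""
    text: str
    label: str
    context: Optional[str] = None  # e.g. preceding utterances in the protocol


def build_icl_prompt(
    examples: List[LabeledQuestion],
    target: LabeledQuestion,
    categories: List[str],
    n_examples: int,
    include_context: bool = False,
) -> str:
    """Assemble a few-shot classification prompt (illustrative sketch only)."""
    lines = [
        "Classify the question into exactly one of these categories: "
        + ", ".join(categories) + ".",
        "",
    ]
    # Assumption: simply take the first n_examples; the study's selection
    # strategy is not described here.
    for ex in examples[:n_examples]:
        if include_context and ex.context:
            lines.append(f"Context: {ex.context}")
        lines.append(f"Question: {ex.text}")
        lines.append(f"Category: {ex.label}")
        lines.append("")
    # The target question is presented in the same format, with its label left blank
    # for the model to complete.
    if include_context and target.context:
        lines.append(f"Context: {target.context}")
    lines.append(f"Question: {target.text}")
    lines.append("Category:")
    return "\n".join(lines)
```

Varying `n_examples` and `include_context` while holding everything else fixed is one simple way to isolate the effect of prompt size and added context on the model's answers.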
A series of experiments was conducted to evaluate GPT-4's performance on this task. The experiments used ICL, providing the model with examples from a dataset of human-labeled questions and testing its ability to classify new questions into predefined categories. The findings indicate that GPT-4 performs reasonably well at categorizing stand-alone question utterances, agreeing with human-assigned labels in many cases. GPT-4 also shows promising generalization across the datasets used in this study. In addition, the study highlights the potential of less expensive LLMs for similar tasks without a significant drop in performance, making them a more accessible option. The results further suggest that classification accuracy depends on dataset quality, so performance may improve with a higher-quality dataset. However, the results also reveal that providing additional context does not always improve the model's performance and in some cases even diminishes it, highlighting the challenges of context-dependent classification tasks.
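Agreement between the model's output and human-assigned labels can be quantified in several ways, and this abstract does not state which measure the study used. As an assumption, the sketch below computes two common ones: raw accuracy, and Cohen's kappa, which discounts the agreement expected by chance given each rater's label frequencies. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(human: Sequence[str], model: Sequence[str]) -> float:
    """Fraction of items where the model label matches the human label."""
    if not human or len(human) != len(model):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohens_kappa(human: Sequence[str], model: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_o = accuracy(human, model)  # observed agreement (also validates inputs)
    h_counts, m_counts = Counter(human), Counter(model)
    # Chance agreement: probability both raters pick the same category independently.
    p_e = sum(h_counts[c] * m_counts[c] for c in set(human) | set(model)) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with human labels A, A, B, B and model labels A, B, B, B, accuracy is 0.75 but kappa is only 0.5, because chance agreement for these label frequencies is already 0.5.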
These findings suggest that while GPT-4 shows promise as a tool for automating protocol analysis, significant limitations must be addressed before its capabilities can be fully leveraged in design research. In conclusion, this research contributes to the growing body of knowledge on the application of AI in design research and proposes several directions for future research aimed at refining the use of artificial intelligence in qualitative analysis.
Keywords
design research, protocol analysis, Artificial Intelligence, GPT-4, design cognition, LLM, qualitative research, text analytics, machine learning, NLP, Eris' taxonomy, design computing