Automating Protocol Analysis with Generative AI: Classifying Questions in Design Review Meetings Using GPT-4

Safayeni, Frank; Hurst, Ada; Sakib, Ahmed Shahriar
2024-11-26; 2024-11-19
https://hdl.handle.net/10012/21198

This study investigates the extent to which generative AI can classify question utterances in verbal design protocols according to Eris' (2004) taxonomy using in-context learning (ICL). Specifically, it examines the impact of several factors on classification: the size of the prompt data, the inclusion of contextual information, prompt engineering strategies, cross-dataset applicability, and cross-model evaluation. The findings of this research could pave the way for wider adoption of AI in design research, transforming how protocols are analyzed and interpreted and ultimately leading to more efficient and accurate insights into cognitive processes during design activities. A series of experiments was conducted to evaluate the performance of GPT-4, a state-of-the-art generative AI model, in this context. The experiments applied ICL by providing the model with a dataset of human-labeled questions and testing its ability to classify new questions according to predefined categories. The findings indicate that GPT-4 performs reasonably well in categorizing stand-alone question utterances, aligning with human-sourced labels in many cases. GPT-4 also shows promising generalization capability across the datasets used in this study. The study further highlights the potential of using a less expensive proprietary LLM, Claude 3.5, for similar tasks without a significant drop in performance, making it a more accessible option. The results also imply that classification accuracy depends on dataset quality, indicating that performance may improve with a higher-quality dataset. However, the results reveal that providing additional context does not always enhance, and in some cases even diminishes, the model's performance, highlighting the challenges of context-dependent classification tasks.
These findings suggest that, while generative AI shows promise as a tool for automating protocol analysis, significant limitations must be addressed before its capabilities can be fully leveraged in design research. In conclusion, this research contributes to the growing body of knowledge on the application of AI in design research and proposes several directions for future research aimed at refining the use of artificial intelligence in qualitative analysis.

en
design research; protocol analysis; Artificial Intelligence; GPT-4; design cognition; LLM; qualitative research; text analytics; machine learning; NLP; Eris' taxonomy; design computing
Master Thesis