Uncovering the Reliability and Consistency of AI Language Models: A Systematic Study

dc.contributor.author: Khatun, Aisha
dc.date.accessioned: 2024-08-22T14:12:00Z
dc.date.available: 2024-08-22T14:12:00Z
dc.date.issued: 2024-08-22
dc.date.submitted: 2024-08-14
dc.description.abstract: Large Language Models (LLMs) have rapidly advanced, becoming general-purpose assistants and creative partners. Despite their widespread use, LLMs exhibit significant vulnerabilities to prompt variations and struggle with task understanding, leading to inconsistencies and factual inaccuracies in their responses. Traditional Natural Language Processing (NLP) benchmarks often overlook nuances in LLM behavior and reliability. This thesis addresses this gap by curating a dataset across six categories: Fact, Conspiracy, Controversy, Misconception, Stereotype, and Fiction. We rigorously define LLMs' factual accuracy, consistency, and robustness to prompt variations using diverse response formats and question variations, and evaluate these on 37 models. Our findings reveal LLMs' volatility and unreliability, particularly in the Controversy and Misconception categories, where conflicting training data impedes performance. Additionally, we explore LLMs' ability to generate coherent fictional narratives, probing their ability to retain and effectively utilize factual information, a critical requirement for creative tasks like story generation. While LLMs offer versatile applications, their reliability hinges on addressing challenges in prompt understanding and response consistency, emphasizing the need for ongoing research to enhance their performance across diverse tasks and applications.
dc.identifier.uri: https://hdl.handle.net/10012/20847
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://borealisdata.ca/dataset.xhtml?persistentId=doi:10.5683/SP3/5MZWBV
dc.relation.uri: https://github.com/tanny411/llm-reliability-and-consistency-evaluation
dc.subject: Large Language Model
dc.subject: Computational Creativity
dc.subject: Story Generation
dc.subject: Consistency
dc.subject: Robustness
dc.subject: LLM
dc.subject: AI
dc.subject: GPT 3
dc.subject: GPT 4
dc.subject: MCQ
dc.subject: Multiple Choice Question
dc.subject: Dataset
dc.subject: Factual Accuracy
dc.subject: Artificial Intelligence
dc.subject: NLP
dc.subject: Natural Language Processing
dc.title: Uncovering the Reliability and Consistency of AI Language Models: A Systematic Study
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0
uws.contributor.advisor: Brown, Dan
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]

Files

Original bundle

Name: khatun_aisha.pdf
Size: 1.22 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: