Browsing by Author "Grossman, Maura"

Assessment of AI-Generated Pediatric Rehabilitation SOAP-Note Quality
(University of Waterloo, 2025-02-19) Amenyo, Solomon; Grossman, Maura; Brown, Daniel; Wylie-Toal, Brendan
This study explores the integration of artificial intelligence (AI) or large language models (LLMs) into pediatric rehabilitation clinical documentation, focusing on the generation of SOAP (Subjective, Objective, Assessment, Plan) notes, which are essential for patient care. Creating complex documentation is time-consuming in pediatric settings. We evaluate the effectiveness of two AI tools: Copilot, a commercial LLM, and KAUWbot, a fine-tuned LLM developed for KidsAbility Centre for Child Development (an Ontario pediatric rehabilitation facility), in simplifying and automating this process. We focus on two key questions: (i) How does the quality of AI-generated SOAP notes based on short clinician summaries compare to human-authored notes, and (ii) To what extent is human editing necessary for improving AI-generated SOAP notes? We found no evidence of prior work assessing the quality of AI-generated clinical notes in pediatric rehabilitation. We use a sample of 432 SOAP notes, evenly divided among human-authored, Copilot-generated, and KAUWbot-generated notes. We employ a blind evaluation by experienced clinicians based on a custom rubric. Statistical analysis is conducted to assess the quality of the notes and the impact of human editing. The results suggest that AI tools such as KAUWbot and Copilot can generate SOAP notes with quality comparable to those authored by humans. We highlight the potential for combining AI with human expertise to enhance clinical documentation and offer insights for the future integration of AI into pediatric rehabilitation practice and other settings for the management of clinical conditions.
Determining the Effectiveness of Multi-user, Hybrid, Human-Computer Assessments for High Recall Information Retrieval Systems
(University of Waterloo, 2022-08-23) Alagappan, Solaiappan; Grossman, Maura
Electronic Discovery (eDiscovery), a use-case of High-Recall Information Retrieval (HRIR), seeks to obtain substantially all and only the relevant documents responsive to a request for production in litigation. Applications of HRIR typically use a human as their oracle to determine the relevance of a large number of documents, which is expensive in terms of both time and cost. HRIR experts suggest that Continuous Active Learning (CAL) systems, the state-of-the-art information retrieval (IR) tools used for eDiscovery, have the potential to achieve superior results, and that achieving them is limited primarily by the fallibility of human relevance assessments. In this research, we seek to understand the impact of the error rate in human relevance feedback on CAL systems and attempt to address it using six distinct multi-user-based, hybrid, human-computer assessment strategies. In contrast to the widely used single-user-based, hybrid, human-computer assessment strategy, these multi-user strategies re-provision resources to re-review documents that the user may have misjudged, rather than to examine more documents, in order to mitigate human relevance feedback error while still achieving a high-recall and high-precision review. Within the constraints of a specified review budget, we want to determine which review strategy has the best chance of precisely retrieving more relevant documents. Our results show that leveraging a multi-user review strategy that “efficiently” uses three reviewers to review documents (CAL QC–Type 1) and a multi-user review strategy that uses the CAL system as one of the users in a three-reviewer approach (CAL QC–Type 2) can enable the end-to-end CAL system to achieve significantly higher recall and higher precision than a single-user-based review strategy employing the same review budget.
This research provides evidence that CAL systems have the potential to better accommodate the needs of HRIR applications by incorporating multi-user review strategies.
Determining the Utility of Key-term Highlighting for High Recall Information Retrieval Systems
(University of Waterloo, 2021-09-28) Wang, Xue Jun; Grossman, Maura
High-recall information retrieval (HRIR) is an important tool used in tasks such as electronic discovery ("eDiscovery") and systematic review of medical research. Applications of HRIR often use a human as their oracle to determine the relevance of immense numbers of documents, which is expensive in both time and money. Various methods for reducing the amount of time spent per assessment and improving the quality of assessors have been proposed to improve these systems. For this thesis, we examine the method of presenting documents in which key-terms are highlighted in place of plain-text documents. This is commonly accepted as a positive feature which achieves both of the previously mentioned improvements, but there is currently a lack of empirical evidence to support its effectiveness. We describe a user study in which participants are assigned to one of two variations of an HRIR system (key-term highlighting vs plain-text) with a post-task questionnaire. Our results failed to show a statistically significant improvement from labelling documents with key-term highlighting over plain-text for any of the measures recall, precision, and F1, and suggest that key-term highlighting may negatively affect retention of concepts. Our study provides empirical evidence for how the use of key-term highlighting affects an assessor's abilities to label documents and provides insight into when including this feature may be harmful rather than helpful.
Discovering Play Store Reviews Related to Specific Android App Issues
(University of Waterloo, 2018-09-20) Ghosh, Angshuman; Nagappan, Meiyappan; Grossman, Maura
Mobile app reviews may contain information relevant to developers. Developers can investigate these reviews to see what users of their apps are complaining about. However, the huge volume of incoming reviews is impractical to analyze manually. Existing research that attempts to extract this information suffers from two major issues: supervised machine learning methods are usually pre-trained, and thus do not give developers the freedom to define the app issue they are interested in, whereas unsupervised methods do not guarantee that a particular app issue topic will be discovered. In this thesis, we attempt to devise a framework that would allow developers to define topics related to app issues at any time and, with minimal effort, discover as many reviews related to the issue as possible. Scalable Continuous Active Learning (S-CAL) is an algorithm that can be used to quickly train a model to retrieve documents with high recall. First, we investigate whether S-CAL can be used as a tool for training models to retrieve reviews about a specific app issue. We also investigate whether a model trained to retrieve reviews about a specific issue for one app can be used to do the same for a separate app facing the same issue. We further investigate transfer learning methods to improve retrieval performance for the separate apps. Through a series of experiments, we show that S-CAL can be used to quickly train models that can retrieve reviews about a particular issue. We show that developers can discover relevant information during the process of training the model and that the information discovered is more than the information that can be discovered using keyword search under similar time restrictions. Then, we show that models trained using S-CAL can indeed be reused for retrieving reviews for a separate app and that performing additional training using transfer learning protocols can improve performance for models that performed below expectation.
Finally, we compare the performance of the models trained by S-CAL at retrieving reviews for a separate app against that of two state-of-the-art app review analysis methods, one of which uses supervised learning while the other uses unsupervised learning. We show that at the task of retrieving relevant reviews about a particular topic, models trained by S-CAL consistently outperform existing state-of-the-art methods.
On Classifying the outcomes of Legal Motions
(University of Waterloo, 2024-09-23) Cardoso, Oluwaseun; Grossman, Maura
Conflict is inherent to the human condition, and socially acceptable methods of resolving conflict typically begin with dialogue, compromise, or negotiation. When these efforts fail, the legal process, often culminating in the courtroom, becomes the final recourse. Legal practitioners strive to position themselves advantageously by predicting the outcomes of legal disputes, increasingly relying on predictive tools to navigate the complexities of the courtroom. This thesis investigates the feasibility of predicting the outcomes of legal motion disputes using supervised machine learning methods. While previous research has predominantly utilized expertly hand-crafted features for judicial predictions, this study explores the use of written arguments, known as briefs, as the only basis for prediction. We trained 36 classifiers to predict the outcomes of legal motions and compared their performance to that of a baseline model. The best-performing classifier achieved an accuracy of 62% on the test dataset. However, statistical analysis reveals that the performance of the top 10 classifiers is not statistically different from the baseline model. These findings suggest that, among the top-performing classifiers, there is no conclusively dominant approach for predicting legal motion outcomes using briefs. The thesis also offers theoretical considerations to explain these results.
Standards for the control of algorithmic bias in the Canadian administrative context
(University of Waterloo, 2022-08-30) Heisler, Natalie; Macfarlane, Emmett; Grossman, Maura
Governments around the world use machine learning in automated decision-making systems for a broad range of functions, including the administration and delivery of healthcare services, education, and housing benefits; for surveillance; and within policing and criminal justice systems. Algorithmic bias in machine learning can result in automated decisions that produce disparate impact, compromising Charter guarantees of substantive equality. The regulatory landscape for automated decision-making, in Canada and across the world, is far from settled. Legislative and policy models are emerging, and the role of standards is evolving to support regulatory objectives. This thesis seeks to answer the question: what standards should be applied to machine learning to mitigate disparate impact in automated decision-making? While acknowledging the contributions of leading standards development organizations, I argue that the rationale for standards must come from the law, and that implementing such standards would help not only to reduce future complaints but, more importantly, would proactively enable human rights protections for those subject to automated decision-making. Drawing from the principles of administrative law, and the Supreme Court of Canada’s substantive equality decision in Fraser v. Canada (Attorney General), this research derives a proposed standards framework that includes: standards to mitigate the creation of biased predictions; standards for the evaluation of predictions; and standards for the measurement of disparity in predictions. Recommendations are provided for implementing the proposed standards framework in the context of Canada’s Directive on Automated Decision-Making.
User-specific explanations of AI systems attuned to psychological profiles: a user study
(University of Waterloo, 2023-05-24) Chambers, Owen; Cohen, Robin; Grossman, Maura
In this thesis, we design a model aimed at supporting user-specific explanations from AI systems and present the results of a user study conducted to determine whether the algorithms used to attune the output to the user match well with the user's own preferences. This is achieved through a dedicated study of certain elements of a user model: levels of neuroticism and extroversion and degree of anxiety towards AI. Our work provides insights into how to test AI theories of explainability with real users, including questionnaires to administer and hypotheses to pose. We also shed some light on the value of a model for generating explanations that reasons about different degrees and modes of explanation. We conclude with commentary about the continued merit of integrating user modeling into the development of AI explanation solutions, and about the challenges, with next steps, of balancing the design of theoretical models with the use of empirical evaluation within the research conducted in the field.