Analysis of Limitations of AI Tools for Pediatric Speech Language Pathology Documentation and Mitigation Strategies

Tuinstra, Tia

Files

Tuinstra_Tia.pdf (386.33 KB)

Date

2025-10-17

Authors

Tuinstra, Tia

Advisor

Tripp, Bryan

Publisher

University of Waterloo

Abstract

Speech Language Pathology (SLP) is a therapy discipline offered by KidsAbility, a pediatric rehabilitation clinic in Southern Ontario. Documentation is a key part of the practice guidelines for SLP and other therapy disciplines, and it can take up a significant portion of a therapist’s time. AI-based clinical documentation aids have been developed to help reduce this burden, and one such tool, MutuoHealth’s AutoScribe, has been piloted by KidsAbility. Though this tool has been beneficial to some therapy disciplines, SLP clinicians faced unique challenges when using it. The model seemed unable to recognize speech therapy strategies or to parse the play-based script of pediatric appointments. This thesis explores the issues SLPs encounter with AI documentation tools and proposes potential approaches to mitigate them. The AI documentation process was divided into the transcription pipeline, where an input audio file produced a corresponding transcript, and the generation pipeline, where an input transcript produced a draft SOAP note. The SLPs who had participated in the AutoScribe pilot test were interviewed about their experiences with the tool and its integration into their workflows. The issues reported by the therapists were sorted into those more closely related to the transcript and those more closely related to the drafted SOAP note. A set of sample SLP appointments from KidsAbility was gathered from an extended AutoScribe pilot, with 10 selected as examples of appointment data (audio, transcripts, drafted and final SOAP notes) to test the transcription and generation pipelines. An augmented automatic speech recognition (ASR) pipeline based on a Whisper model was used to test improvements to the transcript. However, the generated transcripts were not significantly improved over those from the pilot test. Instead, ground truth transcriptions were manually created from the audio files for testing the generation pipeline.
For SOAP note generation, the addition of discipline-specific context tailored to appointment type was tested. This context was curated in collaboration with SLPs from KidsAbility to include SOAP templates, definitions of key concepts, and information about speech data. A Llama 3.3 70B model was used for SOAP note generation, with ground truth transcriptions and SLP-specific RAG-adjacent information as context. The input context was optimized over several iterations based on clinicians’ evaluations of generated SOAP note quality. KidsAbility’s SLPs had flagged sessions targeting speech practice as particularly difficult for AutoScribe. The model seemed unable to make inferences about the child’s speech quality from the transcript alone. Methods of quantitatively assessing speech from session audio were explored as ways to provide additional context on speech quality to the SOAP generation model. A sample appointment was selected for testing, and child speech samples of the targeted sound were sliced from the audio and assigned quality categories. These samples were then compared against correct productions using the cosine distance between their mel-spectrograms. The samples were also passed through a phoneme-based ASR model to obtain its layer activations. The cosine distances and layer outputs were then tested as predictive measures of articulation accuracy, with layer outputs yielding the best results. The resulting speech accuracy scores were then passed into the generation model as additional context, and the output contained correct statements about the nature of the child’s articulations. Though clinicians’ availability limited the extent of the generated SOAP note evaluations, the SOAP notes generated with SLP-specific context showed improvement over the basic model generation. The model also tended to repeat information from previous SOAP notes if examples were provided.
Quantitative speech analysis appears to be possible using phoneme model layer activations and cosine distances between the mel-spectrograms of a child’s speech samples and those of correct articulations. Based on these findings, further optimization of the generation pipeline and work on making effective AI tools for KidsAbility’s EY SLPs will continue.
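
The mel-spectrogram comparison described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the nested-list representation of a spectrogram (one list of frame energies per mel band), and the zero-padding used to align samples of different lengths are all assumptions made for illustration. The abstract does not say how samples were aligned or how distances to several correct productions were combined.

```python
import math


def pad_frames(spec, n_frames):
    """Right-pad every mel band with zeros so it has n_frames frames.

    Zero-padding is an assumption of this sketch, not a documented choice.
    """
    return [row + [0.0] * (n_frames - len(row)) for row in spec]


def cosine_distance(spec_a, spec_b):
    """Return 1 - cosine similarity of two flattened mel-spectrograms.

    Each spectrogram is a list of mel bands; each band is a list of
    per-frame energies. Both must have the same number of mel bands.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectrograms must have the same number of mel bands")
    n_frames = max(len(spec_a[0]), len(spec_b[0]))
    a = [v for row in pad_frames(spec_a, n_frames) for v in row]
    b = [v for row in pad_frames(spec_b, n_frames) for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero spectrogram")
    return 1.0 - dot / (norm_a * norm_b)


def mean_distance_to_references(sample, references):
    """Average cosine distance from one sample to a set of correct productions.

    Averaging is a hypothetical way to combine several references.
    """
    if not references:
        raise ValueError("at least one reference spectrogram is required")
    return sum(cosine_distance(sample, ref) for ref in references) / len(references)
```

Under these assumptions, a lower mean distance means the child’s sample is spectrally closer to the correct productions. Cosine distance ignores overall loudness, since scaling a spectrogram by a constant leaves the distance unchanged.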