Slide Slam R8
Automated analysis of letter fluency data
Sunghye Cho1, Naomi Nevler2, Natalia Parjane2, Christopher Cieri1, Mark Liberman1, Murray Grossman2, Katheryn Cousins2; 1Linguistic Data Consortium, University of Pennsylvania, 2Penn Frontotemporal Degeneration Center, University of Pennsylvania
Introduction: The letter-guided fluency task measures an individual’s executive function and working memory. In clinical settings, the fluency score has been shown to be sensitive to neurodegenerative disease, psychosis, and other neurological conditions. However, a comprehensive analysis of letter fluency beyond the total score is still lacking: previous methods rely on manual assessments that are time-intensive and require some level of expertise, so they cannot easily be applied on a large scale. To address this issue, we developed a novel automated method that is quantitative and reproducible. Using this method, we investigated how lexical and phonetic characteristics of words produced during the F-letter fluency task relate to overall performance, inter-word response time (RT), and time within the task.
Methods: We recorded and transcribed 30-second digitized audio samples of the F-letter-guided fluency task, produced by 76 healthy young participants (mean age 20.1±1 years; 35 females). Using the transcripts, our automated algorithm counted the total number of correct “F” words produced and rated individual words for concreteness, ambiguity, frequency, familiarity, age of acquisition (AoA), and word length using published norms. The mean and standard deviation of each lexical variable were calculated for each individual. A forced aligner automatically aligned each transcript with the corresponding audio recording, and we measured word start time, word duration, and inter-word RT. Finally, we calculated articulation rate (syllables per second), phonetic distance between two consecutive F-letter words (cumulative distance between the first 13 mel-frequency cepstral coefficients of the two words), and semantic distance (Euclidean distance between the vector representations of two consecutive F-letter words).
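As a rough illustration of the per-participant measures described in the Methods, the following Python sketch computes a total score, inter-word RTs, articulation rate, AoA mean and standard deviation, and semantic distances from toy data. It is not the authors' pipeline. The tuple layout of the aligner output, the norm values, and the word vectors are all hypothetical. It also assumes that inter-word RT is the gap between the end of one F word and the start of the next, and that articulation rate excludes pauses; the abstract does not specify either. The MFCC-based phonetic distance is omitted because it requires audio.

```python
import math
from statistics import mean, stdev

# Hypothetical forced-aligner output for one participant:
# (word, start_sec, end_sec, syllable_count). All values are made up.
aligned = [
    ("fish", 1.20, 1.60, 1),
    ("father", 3.00, 3.50, 2),
    ("table", 4.00, 4.40, 2),   # not an F word, so it is not scored
    ("finger", 6.50, 7.00, 2),
]
# Made-up norm values and 2-d word vectors, for illustration only.
aoa_norms = {"fish": 4.0, "father": 3.0, "finger": 5.0}
vectors = {"fish": [1.0, 0.0], "father": [0.0, 1.0], "finger": [1.0, 1.0]}

# Keep only F words; the total score counts distinct F words (assumed rule).
f_words = [w for w in aligned if w[0].startswith("f")]
total_score = len({w[0] for w in f_words})

# Inter-word RT: offset of one F word to onset of the next (assumption).
rts = [nxt[1] - cur[2] for cur, nxt in zip(f_words, f_words[1:])]

# Articulation rate: syllables per second of speaking time (assumption).
speaking_time = sum(end - start for _, start, end, _ in f_words)
rate = sum(w[3] for w in f_words) / speaking_time

# Per-participant mean and standard deviation of one lexical variable.
aoa_values = [aoa_norms[w[0]] for w in f_words]
aoa_mean, aoa_sd = mean(aoa_values), stdev(aoa_values)

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Semantic distance between each pair of consecutive F words.
sem = [euclidean(vectors[a[0]], vectors[b[0]])
       for a, b in zip(f_words, f_words[1:])]
```

In practice the norms and vectors would come from published resources rather than inline dictionaries, and scoring rules such as excluding proper nouns would need to be added.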
Results: Total F-letter score correlated significantly with higher mean AoA (rho=0.38, p<0.001) and articulation rate (rho=0.24, p=0.034), and with lower mean word frequency (rho=-0.33, p=0.003), familiarity (rho=-0.24, p=0.035), word duration (rho=-0.26, p=0.023), and phonetic distance (rho=-0.25, p=0.033). Total score was also positively correlated with an individual’s standard deviation of AoA (rho=0.37, p<0.001), familiarity (rho=0.31, p=0.007), and phonetic distance (rho=0.33, p=0.004). Thus, better performance was associated with a faster articulation rate; the production of less frequent, less familiar, later-acquired, shorter-duration, and more phonetically similar words; and greater variability in the AoA, familiarity, and phonetic distance of the words produced. Inter-word RT was negatively correlated with the frequency (p=0.002) and ambiguity (p=0.006) of F-letter words, and positively correlated with AoA (p=0.002), number of phonemes (p=0.006), phonetic distance (p<0.001), and semantic distance (p<0.001). Lastly, over the course of the task, frequency (p<0.001), ambiguity (p=0.003), and semantic distance between words (p=0.031) decreased significantly, whereas AoA (p<0.001) and the number of phonemes per word (p=0.045) increased.
Conclusion: This study shows that the strategy employed by participants with high F-letter scores is reflected in the lexical and acoustic characteristics of the words they produce. It also demonstrates the successful implementation of our automated language processing pipelines in a standardized neuropsychological task. This novel approach captures subtle and rich language characteristics during test performance, making the test more informative. This work will serve as a reference for letter-guided category fluency production similarly acquired in patients with neurodegenerative disease.
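The rho values in the Results read as rank correlations. Assuming they are Spearman coefficients (the abstract does not name the test), a minimal pure-Python sketch of the computation is given below. The function names and the five-participant toy data are hypothetical and are not taken from the study.

```python
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy data: total F-letter score and mean AoA for five made-up participants.
scores = [8, 10, 12, 14, 15]
mean_aoa = [4.1, 4.8, 4.5, 5.6, 6.0]
rho = spearman_rho(scores, mean_aoa)
```

A statistics library would normally be used instead, since it also supplies the p-value; the hand-rolled version only makes the ranking step explicit.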