Speechy Research Devlog: Some New Tools & New Discoveries
Hey everyone, it is about 8:30pm, and I am sure it will be nearly 9 by the time I finish writing this, but I wanted to update everyone who is following my Speechy research on here. I wrote two new programs today: a Prosodic Pitch Analyzer (PPA) and an RMS Energy Analyzer, both using my handy-dandy new favorite library, librosa.
Prosodic Pitch Analyzer
The PPA calculates the fundamental frequency (F0) or pitch of an audio signal and visualizes it using a line plot. This is a useful tool for analyzing prosodic features of speech such as intonation, stress, and emphasis.
The code takes an audio file as input, processes it using the librosa library to extract the fundamental frequency / pitch, and then plots the pitch contour using matplotlib.
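A minimal sketch of what a pitch analyzer along these lines could look like. It is not the actual PPA code: it assumes librosa's pYIN tracker (librosa.pyin) as the pitch extractor, and the function names, the C2 to C7 search range, and the summary helper are all illustrative choices.

```python
import math


def extract_pitch(path, fmin_note="C2", fmax_note="C7"):
    """Return (times, f0) for an audio file; f0 is NaN where no pitch is detected."""
    import librosa  # imported here so summarize_pitch below works without librosa

    y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's own sample rate
    # Assumption: pYIN as the F0 estimator, searching between the two notes given.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz(fmin_note),
        fmax=librosa.note_to_hz(fmax_note),
        sr=sr,
    )
    times = librosa.times_like(f0, sr=sr)  # convert frame indices to seconds
    return times, f0


def plot_pitch(times, f0, title="Pitch contour"):
    """Draw the pitch contour as a line plot; NaN frames show up as gaps."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(times, f0)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("F0 (Hz)")
    ax.set_title(title)
    plt.show()


def summarize_pitch(f0):
    """Return (lowest, highest, mean) F0 in Hz over voiced frames, or None if there are none."""
    voiced = [float(v) for v in f0 if not math.isnan(v)]
    if not voiced:
        return None
    return min(voiced), max(voiced), sum(voiced) / len(voiced)
```

With a recording on disk (the filename here is made up), calling times, f0 = extract_pitch("my_recording.wav") followed by plot_pitch(times, f0) would draw the contour, and summarize_pitch(f0) gives three numbers that are easy to compare between two speakers.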
The output plot shows the pitch contour of the audio signal over time, with changes in pitch represented by changes in the vertical position of the line. The plot can be used to identify patterns in the pitch contour, such as rising or falling intonation, and to compare the pitch contours of different audio signals.
The prosodic pitch analyzer can be used to detect changes in pitch, which can be indicative of a neurological speech disorder. For example, a person with ataxic dysarthria, which is caused by damage to the cerebellum, may have difficulty controlling the pitch and loudness of their voice, resulting in variations in pitch that are not typical of normal speech. By analyzing changes in pitch with a tool like this, it is possible to identify patterns associated with certain neurological disorders. Clinicians could use this information to help diagnose and treat speech disorders and to monitor progress in speech therapy.
RMS Energy Analyzer
The RMS Energy Analyzer processes an audio file and calculates the energy of the signal at each time frame. This can be useful for analyzing how a person's speech changes over time, as well as for detecting changes in the intensity or loudness of the speech.
The program uses the librosa library to load and process the audio file, and then calculates the energy of each frame using the root-mean-square (RMS) energy of the signal. The energy values are then plotted over time using the matplotlib library, allowing you to visualize changes in the energy of the speech.
By analyzing changes in energy over time, you can gain insight into how the speech patterns of people with neurological speech disorders may differ from those of people without them.
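A rough sketch of the energy calculation described above, assuming librosa.feature.rms with its default frame and hop sizes. The function names are illustrative, not the ones from the actual program. The small rms helper is only there to show what "root-mean-square" means for a single frame.

```python
import math


def rms(frame):
    """Root-mean-square of one frame: the square root of the mean of the squared samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def rms_energy(path, frame_length=2048, hop_length=512):
    """Return (times, energy): one RMS value per frame of the audio file."""
    import librosa  # imported here so the rms helper above works without librosa

    y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's own sample rate
    # librosa returns an array of shape (1, n_frames); [0] takes the single row.
    energy = librosa.feature.rms(
        y=y, frame_length=frame_length, hop_length=hop_length
    )[0]
    times = librosa.times_like(energy, sr=sr, hop_length=hop_length)
    return times, energy


def plot_energy(times, energy, title="RMS energy"):
    """Plot energy over time; dips toward zero are pauses or quiet stretches."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(times, energy)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("RMS energy")
    ax.set_title(title)
    plt.show()
```

Louder stretches give larger RMS values and silence gives values near zero, which is why breaths and pauses show up as dips or flatlines in the plot.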
Analysis with PPA
The research I focused on today primarily looked at speech recordings of myself, the mid-stage HD patient with chorea, the late-stage HD patient (EOL), and a young girl with aphasia.
The patient with aphasia had slurred speech that varied, rising and falling, much like an AD patient's. Earlier I looked at her rate of speech (ROS) and was surprised at the differences between hers and mine (aphasia vs. AD).
My rate of speech
The girl with aphasia's rate of speech
So I decided to compare our speech pitches as well and this is what ours looked like side-by-side.
Hers is on the left, mine on the right.
Her pitch seemed to start off higher (though unstable), like mine, but mine fell during my recording and wobbled for a while. She had some drastic pitch differences; mine had around 16 peaks, while hers had around 18-19. Her later peaks weren't as high in frequency as mine: my peaks mostly ended up very high, around 1600 Hz or 1000 Hz. There is quite a bit of instability in both our pitches, though.
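The peak counts above can also be approximated in code instead of by eye. This is only a sketch under assumptions: count_peaks and its min_height cutoff are hypothetical, a "peak" here is just a frame higher than both of its neighbors, and the result depends heavily on the cutoff and on how noisy the contour is.

```python
def count_peaks(values, min_height):
    """Count frames that are at least min_height and strictly higher than both neighbors.

    Flat-topped peaks are not counted, and any comparison involving NaN
    (an unvoiced frame) is False, so peaks next to gaps are skipped.
    """
    count = 0
    for i in range(1, len(values) - 1):
        is_high_enough = values[i] >= min_height
        is_local_max = values[i] > values[i - 1] and values[i] > values[i + 1]
        if is_high_enough and is_local_max:
            count += 1
    return count


# Made-up pitch values in Hz, just to show the effect of the cutoff.
example_contour = [100, 300, 150, 900, 200, 250, 240]
print(count_peaks(example_contour, min_height=200))  # 3 (300, 900, and 250)
print(count_peaks(example_contour, min_height=280))  # 2 (300 and 900)
```

On a real contour it would probably make sense to smooth the values first, since frame-to-frame jitter creates many tiny local maxima.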
Her energy level over the 15 seconds of speech started off high-mid, dropped from around 1 second in until almost 3 seconds, shot back up and varied between high and high-mid, then had several "dips" and higher moments of energy. At the end, around 13 seconds, she got a huge boost of "gusto" (well... energy). She had around 7 breaths (noted by the dips / flatlines).
This was mine. It seems like as the 15 seconds went on I started to run out of steam and wasn't able to keep my energy up. Mine had around 11 breaths, so I was running out of breath (i.e., had a breathier voice) more than she was.
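The breath counts here come from reading the dips and flatlines off the energy plots. A simple way to approximate that in code is to count stretches where the RMS energy stays below a cutoff for several frames in a row. This is a sketch, not a validated breath detector: count_low_energy_runs, the threshold, and min_frames are all hypothetical, and a low-energy stretch could just as well be a pause as a breath.

```python
def count_low_energy_runs(rms_values, threshold, min_frames=3):
    """Count runs of at least min_frames consecutive values below threshold."""
    runs = 0
    run_length = 0
    for value in rms_values:
        if value < threshold:
            run_length += 1
            # Count the run once, at the moment it becomes long enough.
            if run_length == min_frames:
                runs += 1
        else:
            run_length = 0
    return runs


# Made-up energy values: one dip of 3 frames, one single-frame blip
# (too short to count), and one dip of 4 frames.
example_energy = [0.5, 0.01, 0.01, 0.01, 0.4, 0.01, 0.3, 0.02, 0.02, 0.02, 0.02]
print(count_low_energy_runs(example_energy, threshold=0.05))  # 2
```

Requiring a minimum run length keeps brief drops between syllables from being counted as breaths.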
Research Conclusion for Today
Although our speech energy and pitch have quite a bit in common, our rates of speech don't. She used more syllables at a constant rate, which made it pretty obvious she had a lot of slurring / overshooting; I used far fewer syllables, and my rate of speech was quite slow and varied more than hers. This illustrates my cognitive difficulties and use of placeholder words, along with slight slurring.
As far as pitch, it seems we had similar issues throughout the 15-second clips: mine spiked in the latter part, when I was getting "wore out," and hers spiked earlier, when she had more energy.
Our energy levels differ because although she had moments of energy, I tuckered out pretty quickly.
I hope this sheds some light on the speech of both aphasia patients and ataxic dysarthria / HD patients, along with some cognitive differences.
Will update again tomorrow when I am done with another day of programming and research!













