On this day… 7th of July
freedom island
Day 5 - Segmentation
So, today I didn’t do anything. Or rather, I did one single thing, ALL DAY. Deepam and I had to segment a whack ton of data from the Mostec dialect nasal harmony project that ran last year, because for all the recordings the previous students had segmented only one repetition, out of the 2, 3, or sometimes even 4 repetitions that were recorded. So we got to go back and segment all the other ones! Today I segmented a grand total of 340 words. WOW!
I mentioned segmentation in my post yesterday, but in a little more detail:
This process means taking a spectrogram (the gray lines in the picture below) and looking closely at it to see exactly what word was pronounced. Every consonant and vowel has a different pattern of lines, so by going from this and the word we know the participant was supposed to be saying, we can “cut up” the spectrogram into pieces and label which segment each piece is. In addition, the two waveforms in the recording tell us whether a sound is nasal or not: the upper waveform is the nasal channel, and the lower one is the oral channel. All recordings have an oral component, but some are more nasal than others. This is the basis of the project: seeing which words are becoming nasal.
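For anyone curious how two channels can be turned into a number: a common way to compare them is a nasalance score, the nasal amplitude divided by the sum of the nasal and oral amplitudes. The post doesn't say exactly how the project measures this, so here is only a minimal sketch, assuming that ratio is the measure of interest. It splits both channels into fixed-length frames, takes the RMS amplitude of each frame, and reports nasal / (nasal + oral) per frame. The helper names `rms` and `nasalance_per_frame` and the frame length are hypothetical choices for illustration.

```python
import numpy as np


def rms(frame: np.ndarray) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    return float(np.sqrt(np.mean(frame ** 2)))


def nasalance_per_frame(nasal, oral, frame_len: int) -> np.ndarray:
    """Hypothetical helper: nasal / (nasal + oral) RMS amplitude for each frame.

    `nasal` and `oral` are the two channels (e.g. the upper and lower
    waveforms), assumed to be time-aligned and sampled at the same rate.
    """
    nasal = np.asarray(nasal, dtype=float)
    oral = np.asarray(oral, dtype=float)
    # Only use whole frames that exist in both channels.
    n_frames = min(len(nasal), len(oral)) // frame_len
    scores = np.zeros(n_frames)
    for i in range(n_frames):
        window = slice(i * frame_len, (i + 1) * frame_len)
        n_amp = rms(nasal[window])
        o_amp = rms(oral[window])
        total = n_amp + o_amp
        # Silent frames get 0.0 instead of dividing by zero.
        scores[i] = n_amp / total if total > 0 else 0.0
    return scores


# Synthetic example: a mostly "oral" stretch followed by a mostly "nasal" one.
nasal = np.concatenate([np.full(100, 0.2), np.full(100, 0.8)])
oral = np.concatenate([np.full(100, 0.8), np.full(100, 0.2)])
print(nasalance_per_frame(nasal, oral, frame_len=100))
```

A score near 1 means a frame is mostly nasal energy, and a score near 0 means it is mostly oral. Real recordings would be read from a two-channel audio file and use frames of a few milliseconds; the synthetic arrays here just show the arithmetic.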
An example of the un-annotated recording:
and then the annotated version after segmenting and marking which parts are nasal:
(this word is pronounced tasheema)
Because this is data from the previous year, we already know which word the participant said. We can also play the recording back as many times as we want to make sure the segmentation is accurate! You can even zoom in and play back millisecond-sized bits of the recording.
Some sounds are easier to segment than others. For example, vowels tend to have VERY similar-looking spectrogram patterns, so it’s most useful to listen to the recording if you don’t already know which vowel it is. Sounds like /w/ or /j/ (pronounced like the y in “ya”) are also really similar to vowels, and /l/ and /r/ a little less so, but still really close. Sounds like /t d k/ are easy to spot, and /z s f/ etc. are easy as well. Slovenian also has a consonant that’s midway between a /w/ and a /v/, which is tricky too. Here’s a picture of segmenting a word with very similar-looking sounds, including the w-v sound:
(so this word is pronounced something between zaveeya and zaweeya)
But after doing a bunch of them, they’re not that hard anymore! It required a lot of listening to the recordings at first, but I’m super fast now. All in all it was a super productive day!! I sat at the table for the entire day doing this and only this, minus a quick walk to the bakery with Wenxuan. I really didn’t mind it that much; it’s pretty mindless work that you really get into the rhythm of! It felt great to get all my tasks done by the end of the day!
Tomorrow Deepam and I are going to Mostec. Finally, we will gather some data!! We have 4 participants tomorrow, which is really a lot, but I’m super excited to jump right into it! It will be tiring but totally worth it. We did a test run with the nasalance mask and recording software, and it went well, so I have high hopes that tomorrow will go well too.
I think tomorrow’s post will be more exciting than this one!! I can’t wait!