We've all experienced the frustration of having a computer speech system fail gloriously to understand what we're trying to tell it. When I use Siri, she seemingly gives up as soon as she's confused about what I'm saying, and then converts all that she did "hear" – in other words, actively record and process – into the most probable transcription. This transcription, stripped of the context of the rest of my utterance, is almost always entirely wrong. But the error seems to arise, in some sense, from a "mishearing" and thus a mis-transcribing of what was said. Although this is most probably not how Siri truly functions, we can imagine, as she types on the screen what she hears us say, that she is a mapping from sound to writing.
In the Kenstowicz reading, we are told of the distinction between allophones and phonemes. This distinction explains, at least in part, why there is such a breadth of ways any given "sound" or letter in English (or potentially any language) is actually pronounced. The logical unit we learn, the phoneme, is not always realized as the same allophone. While Kenstowicz discusses rules that govern these phoneme-to-allophone transformations, we can see that this distinction poses a challenge for any system that attempts to convert sound to writing of the same language.
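To make the phoneme/allophone idea concrete, here is a toy sketch of the kind of rule Kenstowicz describes – American English "flapping", where the phoneme /t/ between vowels surfaces as the flap [ɾ] (as in "butter"). The vowel set and the representation of words as symbol lists are my own deliberately simplified inventions, not anything from the readings.

```python
# Toy sketch of one phoneme-to-allophone rule: American English
# flapping, where /t/ between vowels surfaces as the flap [ɾ].
# The vowel inventory here is a tiny, made-up subset for illustration.

VOWELS = {"a", "e", "i", "o", "u", "ʌ", "ɚ", "æ"}

def apply_flapping(phonemes):
    """Replace /t/ with its allophone [ɾ] when it sits between vowels."""
    out = list(phonemes)
    for i in range(1, len(out) - 1):
        if out[i] == "t" and out[i - 1] in VOWELS and out[i + 1] in VOWELS:
            out[i] = "ɾ"
    return out

# /bʌtɚ/ ("butter"): the same phoneme /t/ surfaces as a different sound.
print(apply_flapping(["b", "ʌ", "t", "ɚ"]))  # ['b', 'ʌ', 'ɾ', 'ɚ']
```

The point of the sketch is the direction of the mapping: going from phoneme to allophone is rule-governed, but a transcription system like Siri has to run this mapping in reverse, from surface sound back to the underlying unit.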
While not referenced directly by Gussenhoven (except in multiple redirections of the reader to the table on page eighteen), the IPA – International Phonetic Alphabet – is, supposedly, a system that allows us to represent sounds created by human speech directly as writing. In essence, there should be a unique mapping from a sound to a sequence of symbols representing it. (This is also hinted at in Kenstowicz, who mentions the choice of phonemes a language makes from the posited Universal Grammar, which would then serve as a set of sounds.) If the IPA can truly function in this way, abstracting a program such as Siri into a mapping from sound to IPA, and then from IPA to possible language-specific interpretations, could potentially work very well. Going from IPA to possible "real writing" in a language, while a hard and interesting problem, does not pose any immediate challenge to this idea; the problem with this approach lies in the assumption that a way exists to go from sound to IPA (or really any symbolic representation). No human "speaks" IPA natively, and so even IPA has the potential to be ambiguous, with people interpreting its symbols differently, even if the symbols are not supposed to allow for that. This question bears only on whether IPA is the correct target system for the transcription of sound.
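The two-stage abstraction above can be sketched minimally. Everything here is invented toy data: stage one (sound to IPA) is the hard, assumed-to-exist part, so it is faked as a lookup from a recording to an IPA string; stage two (IPA to spelling) is one-to-many, since homophones share an IPA form.

```python
# A minimal sketch of the two-stage mapping: sound -> IPA -> spelling.
# Both tables are invented toy data, not a real speech system.

# Stage 1: sound to IPA. This is the stage whose existence the post
# questions; here it is simply faked with a lookup table.
sound_to_ipa = {
    "recording_001": "kæt",
    "recording_002": "naɪt",
}

# Stage 2: IPA to candidate spellings. One IPA string can map to
# several valid words, so we return all candidates.
ipa_to_english = {
    "kæt": ["cat"],
    "naɪt": ["night", "knight"],  # homophones: IPA alone can't decide
}

def transcribe(recording_id):
    ipa = sound_to_ipa[recording_id]    # sound -> IPA (assumed solvable)
    return ipa_to_english.get(ipa, [])  # IPA -> candidate spellings

print(transcribe("recording_002"))  # ['night', 'knight']
```

Even in this idealized form, the second stage already shows why going from IPA to "real writing" needs context: [naɪt] alone cannot choose between "night" and "knight".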
The real question is whether such a system exists. If it does, maybe a perfect Siri is not that far off.
Interesting idea, about using the IPA to inform Siri. However, I think the largest challenge is going from sounds to the actual words we use every day. An article I came across from 2011 -- https://www.cnet.com/how-to/how-to-improve-siri-by-using-phonetic-names/ -- talks about how you can improve your Siri experience by spelling names phonetically. So it seems Siri's ability to handle phonetics is up to snuff. The tough part is getting from phonetics to real language.
I think your idea about a perfect Siri is absolutely intriguing! Before Thursday's class, I had never actually anticipated the extent to which Siri has trouble understanding/mapping our words and following instructions. The idea of mapping to an exact IPA is very interesting, and you might find it cool to know that Alexa (Amazon's equivalent of Siri) does a much better job of understanding and interpreting speech, so we are actually making progress towards a "perfect Siri"!
I like your creative idea of mapping sounds with IPA instead of words directly. However, encoding voice into IPA will lose a lot of data in the process, like subtle differences in tone or vowel duration that are too small for IPA to record. If the speaker has a peculiar accent, the system might transcribe the voice into the wrong IPA symbols, and the computer that changes the IPA into human language would have to do more guesswork and might generate errors.