Sunday, November 13, 2016

Bayesian reasoning and Scalar Implicature

This week’s series of articles on scalar implicatures in children touches upon the distinction between specific and broad scalar terms (‘all’ describes one very specific state of affairs, whereas ‘some’ is compatible with ‘all’ as well as with a range of other possibilities) and the use of a probabilistic model of the world in computing scalar implicatures. Coincidentally, both of these ideas align closely with the readings we’ve been doing in SymSys about concept learning and Bayesian reasoning. To that end, I will discuss how my hypotheses about the role Bayesian reasoning plays in scalar implicature changed as I read the different articles.

When learning the meaning of concepts, Bayesian reasoning weights simpler, more specific hypotheses over complex, more general ones. This is because Bayesian reasoning incorporates a causal model of the world that takes into account the likelihood of a particular effect occurring given the hypothesis in question, P(e|h). A more general hypothesis covers a larger space of possible observations, so P(e|h) is relatively smaller because any one observation is evaluated in the context of all the other outcomes that hypothesis could have produced, which in turn makes the posterior probability P(h|e) smaller as well. While these more general hypotheses are not completely ruled out, their lower likelihoods make them less probable than the simpler ones, which helps us select a specific hypothesis over a general one.
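This “size principle” can be sketched in a few lines. The hypothesis extensions and priors below are invented toy values, assuming examples are sampled uniformly from whatever set a hypothesis picks out (so each example contributes a likelihood of 1/|h|):

```python
# Minimal sketch of the size principle: under uniform sampling,
# P(e|h) = (1/|extension|)^n for n observed examples, so hypotheses
# with smaller extensions earn higher likelihoods.

def posterior(hypotheses, priors, observations):
    """Return normalized P(h|e) for each hypothesis."""
    scores = {}
    for h, extension in hypotheses.items():
        if all(x in extension for x in observations):
            scores[h] = priors[h] * (1.0 / len(extension)) ** len(observations)
        else:
            scores[h] = 0.0  # incompatible with the evidence
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

# Toy concept-learning setup (extensions and priors are made up):
hypotheses = {
    "dog":         {"poodle", "beagle", "lab"},
    "four-legged": {"poodle", "beagle", "lab", "cat", "cow", "llama"},
}
priors = {"dog": 0.5, "four-legged": 0.5}

print(posterior(hypotheses, priors, ["poodle", "beagle"]))
```

Even with equal priors, two dog examples push most of the posterior mass onto the narrower “dog” hypothesis, because the broader hypothesis “wastes” probability on cats and cows it never predicted we would see.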

This model of Bayesian reasoning lies in contrast to the traditional model of deductive reasoning, which doesn’t take probability into account. The problem with the deductive model of learning is that the pool of hypotheses about what a particular concept could signify is unmanageably large. For example, when trying to learn the concept ‘dog,’ a valid hypothesis could be ‘a member of the canine species,’ but it could also be something extremely general such as ‘small thing with four limbs’ or even ‘entity that drools.’ The reason these more general hypotheses remain valid is that they are compatible with most of the things we learn about ‘dogs.’ The problem is that we are usually given so few counterexamples — things that fit these categories but are not dogs — that we never get the chance to whittle the categorical boundaries down until they accurately fit the definition of a dog. Is ‘dog’ an overarching term that includes cats and little brothers? To falsify this hypothesis, we would have to provide negative evidence that neither cats nor little brothers are dogs — but then, what about cows, llamas, and little sisters? Thus, a purely deductive form of learning would require a vast amount of negative evidence to accurately narrow down the boundaries of each concept, and when you consider the number of things any given thing is not, that amount is almost infinite.
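The asymmetry above can be made concrete: a purely deductive learner keeps every hypothesis consistent with the labeled evidence, and positive examples alone never eliminate an over-general hypothesis — only an explicit counterexample does. The hypothesis names and extensions here are invented for illustration:

```python
# A purely deductive learner: a hypothesis survives unless the evidence
# directly contradicts it. Positive-only data leaves general hypotheses
# untouched; pruning requires explicit negative evidence.

hypotheses = {
    "canine":             {"poodle", "beagle"},
    "entity that drools": {"poodle", "beagle", "bulldog", "baby brother"},
    "small four-legged":  {"poodle", "beagle", "cat"},
}

def deduce(hypotheses, positives, negatives):
    """Keep every hypothesis consistent with all labeled evidence."""
    return {
        name for name, ext in hypotheses.items()
        if all(p in ext for p in positives)
        and all(n not in ext for n in negatives)
    }

# Positive examples alone leave all three hypotheses standing:
print(deduce(hypotheses, {"poodle", "beagle"}, set()))
# Only explicit negative evidence ("a cat is not a dog") prunes one:
print(deduce(hypotheses, {"poodle", "beagle"}, {"cat"}))
```

To rule out ‘entity that drools’ as well, the learner would need yet another labeled non-dog — one counterexample per over-general boundary, which is exactly the unbounded negative-evidence demand described above.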

In “Accessing the Unsaid: the role of scalar alternatives in children’s pragmatic inference,” David Barner, Neon Brooks, and Alan Bale discuss a series of experiments they performed in order to determine the reason behind children’s difficulty computing scalar implicatures. On their model, scalar implicature relies upon a process of restriction: first you understand the literal meaning of the word, then you generate a list of alternative scalars, and then you perform your own sort of negation, eliminating the stronger scalars to arrive at a more refined interpretation of the sentence and the word. At the conclusion of their paper, Barner, Brooks, and Bale argue that children’s difficulty with accurate use of scalar terms stems from their difficulty generating the alternatives needed for this process of elimination. When I first read this article, my first thought was that this was an instance of learning which implied the deductive reasoning model was preferable to the Bayesian one. The way that Barner, Brooks, and Bale framed the process of arriving at scalar implicatures (via negation) suggested that children needed the alternative scalars set up as ‘counterexamples’ to establish the idea that ‘some’ does not mean ‘all.’ If Bayesian reasoning were involved, it would presumably frame ‘some’ as the more general super-category that includes ‘all,’ and thus a level of deductive instruction would be necessary to draw the boundary between ‘some’ and ‘all’ and make the category of ‘some’ more restrictive (so as not to include ‘all’). Perhaps the reason that children interpreted ‘some’ to mean ‘all’ was that, lacking appropriate counterexamples, ‘all’ was the most restricted concept, and thus the most probable meaning under the Bayesian model.
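The restriction process Barner, Brooks, and Bale describe — literal meaning, then alternatives, then negation of the stronger ones — can be sketched as follows. The encoding of meanings as sets of quantities (out of four objects) is an assumption for illustration, not the authors’ implementation:

```python
# Sketch of scalar strengthening by negating stronger alternatives.
# Each term maps to the quantities (out of 4 objects) it literally covers:
# "some" is literally true of 1..4, "all" only of 4.
literal = {
    "some": {1, 2, 3, 4},
    "all":  {4},
}

def strengthen(term, alternatives):
    """Restrict a term's literal meaning by negating every strictly
    stronger alternative. A child who fails to generate alternatives
    skips this step entirely and keeps the literal meaning."""
    meaning = set(literal[term])
    for alt in alternatives:
        if literal[alt] < meaning:   # alt is strictly stronger
            meaning -= literal[alt]  # negate it: "some but not all"
    return meaning

print(strengthen("some", alternatives=["all"]))  # adult-like: {1, 2, 3}
print(strengthen("some", alternatives=[]))       # no alternatives: literal {1, 2, 3, 4}
```

The second call mirrors the paper’s diagnosis: with no alternatives available, nothing gets negated, and ‘some’ remains compatible with ‘all.’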

Alex Stiller, Noah Goodman, and Michael Frank’s work in “Ad-hoc scalar implicature in adults and children” also investigates the reason behind children’s difficulty with computing scalar implicatures, but it led me to a very different suggestion regarding how Bayesian reasoning is involved. In a series of experiments contrasting linguistic scalars with real-world scalars, they found that contextual knowledge about the frequency of something occurring in the world had a significant impact on the scale computed. Stiller’s experiments revised my thinking on two levels: first by demonstrating that Bayesian reasoning (and not deductive reasoning) was the primary form of reasoning behind scalar implicatures, and second by making me think a bit more deeply about what the most likely category would be. While linguistically ‘all’ would be the most restrictive category on its own (and thus have a smaller extension and a higher likelihood), incorporating real-world frequencies affects the probability of the ‘all’ situation occurring at all. Bayes’ formula states that P(h|e) = (P(e|h)·P(h))/P(e), so while ‘all’ might have a larger P(e|h) term (its restricted extension means any consistent observation is relatively likely under it), its P(h) term would be very small (because real-world frequencies make situations where ‘all’ holds comparatively rare), thus decreasing the ‘weight’ of the ‘all’ concept with respect to more general scalars such as ‘some,’ and implicitly creating the distinctions I had previously thought required additional deductive reasoning (or explicit scalar alternatives) to create.
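This trade-off is easy to see numerically. In the back-of-the-envelope sketch below, the restricted ‘all’ reading gets the higher likelihood (the size principle), but a low prior when ‘all’-type situations are rare in the world; all numbers are invented to show the direction of the effect, not taken from the paper:

```python
# Bayes' rule: P(h|e) = P(e|h) * P(h) / sum_h' P(e|h') * P(h').
# A high-likelihood but low-prior reading can still lose the posterior race.

def posterior(likelihoods, priors):
    """Normalize P(e|h) * P(h) over the candidate readings."""
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}

likelihoods = {"all": 0.9, "some-but-not-all": 0.3}  # "all" is more restrictive
rare_prior  = {"all": 0.1, "some-but-not-all": 0.9}  # but rarer in the world

print(posterior(likelihoods, rare_prior))
```

With these toy numbers the ‘some-but-not-all’ reading wins the posterior (0.75 vs. 0.25) despite its lower likelihood — the world-frequency prior does the restricting work on its own, with no extra deductive step.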

1 comment:

  1. Really interesting post, Grace! I love the way in which you tie this all to Bayesian reasoning and try to find links between the two, and I guess we could also go farther and involve Perfors' ideas of multiple hypothesis spaces in this train of thought, too. I'm curious, however, about how this discussion would tie into the idea of inductive learning. Would it strengthen or weaken that theory?
