Re: Thomas' Fuzziness and Probability

From: S. F. Thomas
Date: Mon Aug 13 2001 - 13:02:18 MET DST

  • Next message: Stephan Lehmke: "Re: Thomas' Fuzziness and Probability"

    (Herman Rubin) wrote in message news:<9l4kv0$>...
    > In article <>,
    > S. F. Thomas <> wrote:
    > >robert@localhost.localdomain (Robert Dodier) wrote in message
    > >news:<9kt895$rs$1@localhost.localdomain>...
    > >> In the interest of brevity, I've indulged in wanton snippage,
    > >> but I hope what's left yields something comprehensible.
    > >> S. F. Thomas <> wrote:
    > >> > Robert Dodier wrote:
    > ..............
    > >Goodness, no. What I do argue however is that the semantics of
    > >likelihood do not just fall neatly out from the semantics of
    > >probability. Probability provides some of the underpinning, but not
    > >all. Otherwise Fisher would not have been led up a blind alley by
    > >asserting that the "likelihood of a or b is like the income of Peter
    > >or Paul, we don't know what it is until we know which is meant."
    > I am by no means convinced that Fisher understood this, but
    > I can see no way that the likelihood of "a or b" makes any
    > sense at all.

    That has precisely been the problem for all the generations of
    statisticians since Fisher. I presume you to refer to the original
    probability model from which likelihood derives, f(x;w) where x ranges
    over sample space, and w ranges over parameter space and f is the
    density function for the random variable in question. For any point
    hypothesis w=a, f is clearly defined. But for a composite hypothesis
    {a,b}, it is not clear how f is defined. Therefore -- and this was the
    precise thrust of Fisher's metaphor -- we don't know what the
    likelihood of "a OR b" is until, like the income of Peter or Paul, we
    know which is meant. I presume that it is thinking along these or
    similar lines that leads you to say that the likelihood of "a or b"
    makes no sense at all. Or to say that the likelihood of "a OR b" is
    the likelihood corresponding to the stronger element. Which is what
    leads to a maximum rule for likelihood disjunction. Or, one uses the
    probability metaphor as in the Bayesian set-up, rescales the
    likelihood function to sum to unity, whereupon the likelihood of "a OR
    b" becomes the sum of the two (rescaled) likelihoods, with appropriate
    modification if the likelihood is construed as density function and
    the integral calculus is applied. Like it or not, that is essentially
    what Bayes does, although the story and the argumentation to get there
    are very different, requiring ritualistic obeisance to priors of one
    form or another, in particular "uninformative" if need be. Be all that
    as it may, if you have an inferential method that purports to give a
    direct characterization of uncertainty in model parameters, then you
    are perforce computing likelihoods of sets or of composite hypotheses,
    ie. you have a method for computing something like L(a OR b).
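    A minimal numeric sketch of the two disjunction rules contrasted above
    (the likelihood values are made up for illustration, not taken from any
    model in this thread):

```python
# Sketch (values made up): two candidate rules for the likelihood of a
# composite hypothesis {a, b}, given point-hypothesis likelihoods.
L = {"a": 0.30, "b": 0.20, "c": 0.10}   # absolute likelihoods over the parameter space

def disjunction_max(L, hyps):
    # maximum rule: L(a OR b) = max(L(a), L(b))
    return max(L[h] for h in hyps)

def disjunction_rescaled_sum(L, hyps):
    # Bayesian-style rule: rescale the likelihoods to sum to unity, then add
    total = sum(L.values())
    return sum(L[h] for h in hyps) / total

print(disjunction_max(L, ["a", "b"]))            # 0.3
print(disjunction_rescaled_sum(L, ["a", "b"]))   # ≈ 0.833
```

    The two rules plainly disagree, which is the point: how the likelihood
    of a composite hypothesis is defined is a genuine choice, not something
    that falls out of the point-hypothesis likelihoods by themselves.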

    > This
    > >leads to a likelihood calculus in which set evaluation is of the form
    > > L( {a,b} ) = L(a OR b) = Max( L(a), L(b) )
    > Are you taking a view of a linear truth value system?

    Maybe... I don't know what you mean by "linear truth value system".

    > AFAIK, this was first proposed by Lukasiewicz, and does
    > not work at all well.
    > Likelihood is NOT probability, and "a OR b" does not
    > mean anything from the standpoint of likelihood.

    But see above.

    > >which rather quickly proves to be inadequate. Had it not been
    > >inadequate, I don't think classical statistics would have gone to all
    > >the trouble it has to develop indirect methods of describing the
    > >uncertainty in model parameters consequent upon sampling. Nor would
    > >there have been a neo-Bayesian revival intended to supplant the
    > >classicists precisely by offering a method of *direct*
    > >characterization. Indeed, Bayes offers a likelihood calculus in which
    > > L(a OR b) ~ (L(a) + L(b))
    > Bayes never offered anything about a likelihood calculus.

    Nor did Savage, de Finetti and the others. My point was different. It
    is that that, *in effect*, is what Bayesian inference is doing. The
    whole song and dance about the prior just confuses this core issue, to
    which it is easy to return simply by imagining a completely
    "uninformative" prior (if such a thing is not a contradiction in
    probabilistic terms), and seeing the posterior for what it then is,
    ie. the likelihood function appropriately rescaled, and now
    interpreted as probability or probability density.
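    The point about the "uninformative" prior can be checked directly. A
    sketch, assuming a binomial model on a finite grid of parameter values
    (the data, 7 successes in 10 trials, are made up):

```python
from math import comb

# With a flat ("uninformative") prior over a finite grid of parameter
# values, the Bayesian posterior is just the likelihood function rescaled
# to sum to one.
n, k = 10, 7
grid = [i / 20 for i in range(1, 20)]                  # candidate values of p
lik = [comb(n, k) * p**k * (1 - p)**(n - k) for p in grid]

prior = [1.0 / len(grid)] * len(grid)                  # flat prior
unnorm = [l * pr for l, pr in zip(lik, prior)]
posterior = [u / sum(unnorm) for u in unnorm]

rescaled = [l / sum(lik) for l in lik]                 # likelihood, rescaled

# the posterior and the rescaled likelihood coincide
assert all(abs(a - b) < 1e-12 for a, b in zip(posterior, rescaled))
```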

    > To Bayes, Fisher, Neyman, Laplace, Gauss, Kolmogorov, and
    > others, one can take the or of statements or the union of
    > events, but this is for probability. Likelihood is not
    > probability, although it is an equivalence class of formal
    > entities derived from probability.

    I am most certainly under no confusion on that score.
    > >where ~ is to indicate that some normalization, appropriate to the
    > >construction of likelihood as a metaphorical (belief) probability, is
    > >necessary. It is only with the fuzzy set theory that semantics
    > >suggests itself
    > > L(a OR b) = L(a explains the data OR b explains the data)
    > >where "explains the data" is a fuzzy predicate no different in
    > >principle from "is tall", and subject to calibration in conceptually
    > >the same way. This leads, albeit with some reworking of the Zadehian
    > >fuzzy set theory along the way, to
    > "Explains the data" is philosophical gobbledygook. Assuming
    > that we can assume that we have a binomial model, and we get
    > a positive number of successes and failures, ALL binomial
    > distributions with 0 < p < 1 "explain" the data; there is
    > a positive probability that the data could have come from
    > such a model.

    But some explain the data better than others. As Fisher said, the
    likelihood function supplies a "natural order of preference for the
    possibilities under consideration". It is exactly analogous to the
    notion of semantic likelihood (or membership function) for a term such
    as "tall" providing a natural order of preference for what a competent
    speaker of the language *could* mean when she uses the term tall to
    characterize one's height. Therefore, analogously to some speaker
    (witness) saying "the unknown attacker is tall", the result of
    sampling from a probability distribution is the implicit assertion of
    "the data" to the effect "the unknown probability model parameter is
    [an explanation of the observed sample]", and the membership function
    of the term in brackets may be identified with the (absolute)
    likelihood function generated by the data under the model. Call that
    philosophical gobbledygook if you like. All philosophical abstraction
    is in the end metaphor. Some such abstractions never make it down to
    ground, I quite agree. But that is not the case here. What I propose is
    quite computable. And the essential insight seems to me to be quite
    plain, though I would readily admit that the semantics are unfamiliar.
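    The binomial case raised above illustrates both halves of the point. A
    sketch, with made-up data (7 successes in 10 trials):

```python
from math import comb

# Every p strictly between 0 and 1 gives the observed data positive
# probability -- all "explain" the data -- but the likelihood function
# supplies Fisher's "natural order of preference" among them.
n, k = 10, 7

def lik(p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

candidates = [0.1, 0.5, 0.7, 0.9]
assert all(lik(p) > 0 for p in candidates)    # all explain the data...

ranked = sorted(candidates, key=lik, reverse=True)
print(ranked)   # [0.7, 0.5, 0.9, 0.1] -- ...but some explain it better
```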
    > > L(a OR b) = L(a) + L(b) - L(a)*L(b)
    > >where indeed the laws of probability are invoked, and at that in a
    > >very simple way, but it is the fuzzy set semantics, and the device of
    > >the calibrational proposition, that provides the essential frame that
    > >Fisher overlooked.
    > The likelihood function can be multiplied by any constant,
    > and often is; L and c*L are the "same" likelihood function
    > for any statistical purpose.

    Not if you are using the product-sum rule of disjunction. For that
    purpose, one must distinguish the absolute likelihood function from
    the *relative* likelihood, which I quite agree is unique only up to
    similarity transformations, and to which you allude. Thus for example,
    if you are computing a marginal likelihood function, you would work
    with the absolute likelihoods to accomplish the marginalization, and
    only then may you rescale. In the theory I am concerned to develop I
    in fact use the term membership or characteristic function for the
    absolute likelihood, since I am essentially drawing on the insights
    and semantics of the fuzzy set theory (reworked to admit the notion of
    calibrational proposition with which this thread was begun) and the
    term possibility distribution for the relative likelihood.
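    The scale dependence is easy to exhibit. A sketch, with made-up
    membership values in [0, 1]:

```python
# The product-sum disjunction applied to "absolute" likelihoods. Unlike
# the max rule, this rule is not invariant under rescaling L -> c*L,
# which is why the absolute/relative distinction matters here.

def product_sum(x, y):
    return x + y - x * y

La, Lb = 0.6, 0.3
print(product_sum(La, Lb))            # ≈ 0.72

c = 0.5                               # rescaling changes the answer:
print(product_sum(c * La, c * Lb))    # ≈ 0.405, not 0.5 * 0.72 = 0.36
```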

    > Why should anything be
    > independent, even if it can be considered probabilities?
    > .................

    I am not sure I get your point here in this context. But if it is what
    I think it is, then the reworked fuzzy set theory continues to have
    the min-max connectives in certain circumstances, in particular when
    there are constraints of strong positive semantic consistency linking
    the respective affirmation probabilities ... in such cases there is
    clearly no independence. Likewise, where there is strong negative
    semantic consistency (for example those affirming an exemplar to be
    tall tending systematically to disaffirm him to be short), the
    appropriate rules for the conjunction and disjunction connectives are
    the bounded-sum (Lukasiewicz) rules. It is only when semantic
    independence may be assumed that the product and product-sum rules are
    appropriate. Semantic independence would appear to be the right
    assumption in the case of statistical inference, which involves, in a
    sense, the interpretation of what the "data" say.
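    The three regimes can be summarized in a small sketch (the inputs are
    made-up membership values in [0, 1]; which pair of connectives applies
    depends on the assumed semantic relation between the terms):

```python
def and_min(x, y): return min(x, y)                   # strong positive consistency
def or_max(x, y):  return max(x, y)

def and_bounded(x, y): return max(0.0, x + y - 1.0)   # strong negative
def or_bounded(x, y):  return min(1.0, x + y)         # consistency (Lukasiewicz)

def and_product(x, y): return x * y                   # semantic independence
def or_product_sum(x, y): return x + y - x * y

x, y = 0.7, 0.4
print(and_min(x, y), or_max(x, y))                    # 0.4, 0.7
print(and_bounded(x, y), or_bounded(x, y))            # ≈ 0.1, 1.0
print(and_product(x, y), or_product_sum(x, y))        # ≈ 0.28, ≈ 0.82
```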

    S. F. Thomas

    This message was posted through the fuzzy mailing list.

    This archive was generated by hypermail 2b30 : Mon Aug 13 2001 - 13:21:22 MET DST