# Re: Thomas' Fuzziness and Probability

From: S. F. Thomas (sfrthomas@yahoo.com)
Date: Mon Aug 13 2001 - 13:02:18 MET DST

hrubin@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<9l4kv0$2evu@odds.stat.purdue.edu>...
> In article <66b61316.0108091708.7d6b9958@posting.google.com>,
> S. F. Thomas <sfrthomas@yahoo.com> wrote:
> >robert@localhost.localdomain (Robert Dodier) wrote in message
> >news:<9kt895$rs$1@localhost.localdomain>...
> >> In the interest of brevity, I've indulged in wanton snippage,
> >> but I hope what's left yields something comprehensible.
>
> >> S. F. Thomas <sfrthomas@yahoo.com> wrote:
>
> >> > Robert Dodier wrote:
>
> ..............
>
> >Goodness, no. What I do argue however is that the semantics of
> >likelihood do not just fall neatly out from the semantics of
> >probability. Probability provides some of the underpinning, but not
> >all. Otherwise Fisher would not have been led up a blind alley by
> >asserting that the "likelihood of a or b is like the income of Peter
> >or Paul, we don't know what it is until we know which is meant."
>
> I am by no means convinced that Fisher understood this, but
> I can see no way that the likelihood of "a or b" makes any
> sense at all.

That has precisely been the problem for all the generations of
statisticians since Fisher. I presume you to refer to the original
probability model from which likelihood derives, f(x;w) where x ranges
over sample space, and w ranges over parameter space and f is the
density function for the random variable in question. For any point
hypothesis w=a, f is clearly defined. But for a composite hypothesis
{a,b}, it is not clear how f is defined. Therefore -- and this was the
precise thrust of Fisher's metaphor -- we don't know what the
likelihood of "a OR b" is until, like the income of Peter or Paul, we
know which is meant. I presume that it is thinking along these or
similar lines that leads you to say that the likelihood of "a or b"
makes no sense at all. Or to say that the likelihood of "a OR b" is
the likelihood corresponding to the stronger element, which is what
leads to a maximum rule for likelihood disjunction. Or, one uses the
probability metaphor as in the Bayesian set-up, rescales the
likelihood function to sum to unity, whereupon the likelihood of "a OR
b" becomes the sum of the two (rescaled) likelihoods, with appropriate
modification if the likelihood is construed as density function and
the integral calculus is applied. Like it or not, that is essentially
what Bayes does, although the story and the argumentation to get there
are very different, requiring ritualistic obeisance to priors of one
form or another, in particular "uninformative" if need be. Be all that
as it may, if you have an inferential method that purports to give a
direct characterization of uncertainty in model parameters, then you
are perforce computing likelihoods of sets or of composite hypotheses,
i.e. you have a method for computing something like L(a OR b).
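
To make the ambiguity concrete, here is a small sketch (mine, not anything from the thread; the numbers are hypothetical) of the two candidate rules applied to a binomial likelihood:

```python
from math import comb

# Hypothetical binomial example: x = 7 successes in n = 10 trials.
def likelihood(w, x=7, n=10):
    """Absolute likelihood f(x; w) at a point hypothesis w."""
    return comb(n, x) * w**x * (1 - w)**(n - x)

a, b = 0.5, 0.7
La, Lb = likelihood(a), likelihood(b)

# For the composite hypothesis {a, b}, f(x; {a, b}) is simply not
# defined by the model; the candidate disjunction rules disagree:
max_rule = max(La, Lb)   # "stronger element" rule
sum_rule = La + Lb       # Bayesian-style sum (before any rescaling)
```

Which of these numbers deserves to be called L(a OR b) is precisely the question at issue.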

> This
> >leads to a likelihood calculus in which set evaluation is of the form
>
> > L( {a,b} ) = L(a OR b) = Max( L(a), L(b) )
>
> Are you taking a view of a linear truth value system?

Maybe... I don't know what you mean by "linear truth value system".

>
> AFAIK, this was first proposed by Lukasiewicz, and does
> not work at all well.
>
> Likelihood is NOT probability, and "a OR b" does not
> mean anything from the standpoint of likelihood.

But see above.

>
> >which rather quickly proves to be inadequate. Had it not been
> >inadequate, I don't think classical statistics would have gone to all
> >the trouble it has to develop indirect methods of describing the
> >uncertainty in model parameters consequent upon sampling. Nor would
> >there have been a neo-Bayesian revival intended to supplant the
> >classicists precisely by offering a method of *direct*
> >characterization. Indeed, Bayes offers a likelihood calculus in which
>
> > L(a OR b) ~ (L(a) + L(b))
>
> Bayes never offered anything about a likelihood calculus.

Nor did Savage, de Finetti and the others. My point was different: it
is that such a calculus, *in effect*, is what Bayesian inference is doing. The
whole song and dance about the prior just confuses this core issue, to
which it is easy to return simply by imagining a completely
"uninformative" prior (if such a thing is not a contradiction in
probabilistic terms), and seeing the posterior for what it then is,
i.e. the likelihood function appropriately rescaled, and now
interpreted as probability or probability density.
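
The point about the flat-prior posterior being nothing but a rescaled likelihood can be sketched numerically (a discrete grid and toy data of my own choosing):

```python
from math import comb

# Toy example: binomial data, x = 7 successes in n = 10 trials,
# and a flat "uninformative" prior over a discrete grid of parameter values.
x, n = 7, 10
grid = [i / 100 for i in range(1, 100)]   # w = 0.01, ..., 0.99
lik = [comb(n, x) * w**x * (1 - w)**(n - x) for w in grid]

# With a flat prior, the posterior is just the likelihood
# rescaled to sum to one:
total = sum(lik)
posterior = [L / total for L in lik]

# "L(a OR b)" then becomes the sum of the two rescaled likelihoods:
L_a_or_b = posterior[grid.index(0.50)] + posterior[grid.index(0.70)]
```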

>
> To Bayes, Fisher, Neyman, Laplace, Gauss, Kolmogorov, and
> others, one can take the or of statements or the union of
> events, but this is for probability. Likelihood is not
> probability, although it is an equivalence class of formal
> entities derived from probability.

I am most certainly under no confusion on that score.

> >where ~ is to indicate that some normalization, appropriate to the
> >construction of likelihood as a metaphorical (belief) probability, is
> >necessary. It is only with the fuzzy set theory that semantics
> >suggests itself
>
> > L(a OR b) = L(a explains the data OR b explains the data)
>
> >where "explains the data" is a fuzzy predicate no different in
> >principle from "is tall", and subject to calibration in conceptually
> >the same way. This leads, albeit with some reworking of the Zadehian
> >fuzzy set theory along the way, to
>
> "Explains the data" is philosophical gobbledygook. Assuming
> that we can assume that we have a binomial model, and we get
> a positive number of successes and failures, ALL binomial
> distributions with 0 < p < 1 "explain" the data; there is
> a positive probability that the data could have come from
> such a model.

But some explain the data better than others. As Fisher said, the
likelihood function supplies a "natural order of preference for the
possibilities under consideration". It is exactly analogous to the
notion of semantic likelihood (or membership function) for a term such
as "tall" providing a natural order of preference for what a competent
speaker of the language *could* mean when she uses the term tall to
characterize one's height. Therefore, analogously to some speaker
(witness) saying "the unknown attacker is tall", the result of
sampling from a probability distribution is the implicit assertion of
"the data" to the effect "the unknown probability model parameter is
[an explanation of the observed sample]", and the membership function
of the term in brackets may be identified with the (absolute)
likelihood function generated by the data under the model. Call that
philosophical gobbledygook if you like. All philosophical abstraction
is in the end metaphor. Some such abstractions never make it down to
the ground, I quite agree. But that is not the case here. What I propose is
quite computable. And the essential insight seems to me to be quite
plain, though I would readily admit that the semantics are unfamiliar.

>
> > L(a OR b) = L(a) + L(b) - L(a)*L(b)
>
> >where indeed the laws of probability are invoked, and at that in a
> >very simple way, but it is the fuzzy set semantics, and the device of
> >the calibrational proposition, that provides the essential frame that
> >Fisher overlooked.
>
> The likelihood function can be multiplied by any constant,
> and often is; L and c*L are the "same" likelihood function
> for any statistical purpose.

Not if you are using the product-sum rule of disjunction. For that
purpose, one must distinguish the absolute likelihood function from
the *relative* likelihood, which I quite agree is unique only up to
similarity transformations, and to which you allude. Thus, for example,
if you are computing a marginal likelihood function, you would work
with the absolute likelihoods to accomplish the marginalization, and
only then may you rescale. In the theory I am concerned to develop, I
in fact use the term membership or characteristic function for the
absolute likelihood, since I am essentially drawing on the insights
and semantics of the fuzzy set theory (reworked to admit the notion of
calibrational proposition with which this thread was begun) and the
term possibility distribution for the relative likelihood.
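
The distinction matters computationally: the product-sum rule is not invariant under a rescaling L -> c*L, whereas the max rule is, so the product-sum rule only makes sense on the absolute scale. A small sketch with hypothetical numbers:

```python
# Product-sum disjunction, defined on absolute likelihoods in [0, 1]:
def disjoin(la, lb):
    return la + lb - la * lb

La, Lb = 0.2, 0.6   # hypothetical absolute likelihoods
c = 0.5             # an arbitrary rescaling constant

rescale_after = c * disjoin(La, Lb)       # ≈ 0.34
rescale_before = disjoin(c * La, c * Lb)  # ≈ 0.37 -- not the same!

# The max rule, by contrast, commutes with rescaling:
assert c * max(La, Lb) == max(c * La, c * Lb)
```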

> Why should anything be
> independent, even if it can be considered probabilities?
>
> .................

I am not sure I get your point here in this context. But if it is what
I think it is, then the reworked fuzzy set theory continues to have
the min-max connectives in certain circumstances, in particular when
there are constraints of strong positive semantic consistency linking
the respective affirmation probabilities ... in such cases there is
clearly no independence. Likewise, where there is strong negative
semantic consistency (for example those affirming an exemplar to be
tall tending systematically to disaffirm him to be short), the
appropriate rules for the conjunction and disjunction connectives are
the bounded-sum (Lukasiewicz) rules. It is only when semantic
independence may be assumed that the product and product-sum rules are
appropriate. That would appear to be the right assumption in the
case of statistical inference, which involves, in a sense, the
interpretation of what the "data" say.
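
The three connective regimes just described can be set side by side (my own summary sketch of the rules named above):

```python
# Conjunction/disjunction pairs for the three semantic regimes:
def and_min(a, b):  return min(a, b)              # strong positive consistency
def or_max(a, b):   return max(a, b)

def and_luk(a, b):  return max(0.0, a + b - 1.0)  # strong negative consistency
def or_luk(a, b):   return min(1.0, a + b)        # (bounded-sum, Lukasiewicz)

def and_prod(a, b): return a * b                  # semantic independence
def or_prod(a, b):  return a + b - a * b          # (product-sum)

a, b = 0.7, 0.4
for name, conj, disj in [("min/max", and_min, or_max),
                         ("Lukasiewicz", and_luk, or_luk),
                         ("product", and_prod, or_prod)]:
    print(f"{name:12s} AND={conj(a, b):.2f}  OR={disj(a, b):.2f}")
```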

Regards,
S. F. Thomas

############################################################################
This message was posted through the fuzzy mailing list.
(1) To subscribe to this mailing list, send a message body of
"SUB FUZZY-MAIL myFirstName mySurname" to listproc@dbai.tuwien.ac.at
(2) To unsubscribe from this mailing list, send a message body of
"UNSUB FUZZY-MAIL" or "UNSUB FUZZY-MAIL yoursubscription@email.address.com"
to listproc@dbai.tuwien.ac.at
(3) To reach the human who maintains the list, send mail to
fuzzy-owner@dbai.tuwien.ac.at
(4) WWW access and other information on Fuzzy Sets and Logic see
http://www.dbai.tuwien.ac.at/ftp/mlowner/fuzzy-mail.info
(5) WWW archive: http://www.dbai.tuwien.ac.at/marchives/fuzzy-mail/index.html

This archive was generated by hypermail 2b30 : Mon Aug 13 2001 - 13:21:22 MET DST