>...[snip]...
>Finally, if fuzzy is substantively different from smoothing,
>could you give an example of an actual construct -- a computer
>program preferably -- which uses fuzzy to some purpose other
>than smoothing? I have made this request repeatedly, with no
>takers. I am genuinely interested, and would be pleased with
>a concrete example.
OK, will do. It's not a computer program, but it _is_ a concrete example
from my own research (real data, honest!:-) that involves both fuzzy sets
and statistics in complementary roles, and no smoothing. It also
(indirectly) sheds some light on the issues of fuzzy vs prob and the
'objective' status of fuzzy membership functions.
In a study conducted on college students' job interests, ratings were
collected on the degree to which each respondent would _seek_ a job with a
particular characteristic and also on the extent to which they would wish
to _avoid_ a job with that same characteristic. Both ratings were on
subjective 7-point scales. One hypothesis was that for each characteristic,
a respondent would not seek that characteristic to any greater degree than
they would avoid it (i.e., respondents do not strongly seek and strongly
avoid the same characteristic). They might, of course, neither seek it nor
avoid it strongly.
A reasonable translation of that hypothesis is that the set of those who
seek a characteristic is a subset of those who do not avoid it; and
likewise the set of those who avoid it is a subset of those who do not seek
it. So this is a 'set-inclusion hypothesis'. If we had just asked students
for yes-no answers to the 'seek' and 'avoid' questions instead of 7-point
scales, then we would be able to test this hypothesis very easily using
standard methods for analyzing categorical data. We could test, for
instance, whether the odds of belonging to the subset are the same as the
odds of belonging to its intersection with the set that includes it.
In this case, we have two ordinal scales that we can treat as fuzzy sets
with discrete membership levels. If our hypothesis is true, then the odds
of belonging to the kth level-set of a subset should be the same as the
odds of belonging to the kth level-set of its intersection with the set
that includes it. If it weren't for fuzzy sets and level-set concepts, we
would be in the scandalous position of not being able to test an hypothesis
merely because we have more sensitive measurement than 'yes-no'!
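For readers who want the level-set idea spelled out, here is a minimal sketch in Python. The three respondents and their ratings are hypothetical (not taken from the study), and the use of 1 - membership for "Not-Avoid" is the standard fuzzy complement, which I am assuming here. It shows that a fuzzy inclusion holds exactly when every level-set of the subset equals its intersection with the corresponding level-set of the including set.

```python
from fractions import Fraction

# Seven discrete membership levels 0/6 .. 6/6, as on a 7-point scale.
LEVELS = [Fraction(k, 6) for k in range(7)]

def level_set(membership, level):
    """Crisp set of respondents whose membership is at least `level`."""
    return {who for who, mu in membership.items() if mu >= level}

def is_fuzzy_subset(a, b):
    """Fuzzy inclusion: membership in a never exceeds membership in b."""
    return all(a[who] <= b[who] for who in a)

# Hypothetical ratings for three respondents (illustration only).
seek = {"r1": Fraction(5, 6), "r2": Fraction(2, 6), "r3": Fraction(0, 6)}
avoid = {"r1": Fraction(0, 6), "r2": Fraction(3, 6), "r3": Fraction(6, 6)}

# Assumption: "Not-Avoid" is the standard complement, 1 - membership.
not_avoid = {who: 1 - mu for who, mu in avoid.items()}

# Inclusion checked level by level: each level-set of "Seek" must equal
# its intersection with the same level-set of "Not-Avoid".
levelwise = all(
    level_set(seek, lv) == level_set(seek, lv) & level_set(not_avoid, lv)
    for lv in LEVELS
)
```

With real data the inclusion is never perfect, which is why the comparison is made on odds rather than on exact set equality.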
The first table below shows a cross-tabulation of the ratings for
seeking and avoiding "Realistic" jobs. The table after that shows the same
data arranged in cumulative frequencies, accumulating from the highest
membership level of "Seeking" leftwards and from the lowest level of
"Avoiding" upwards. The diagonal cumulants are the cumulative frequencies of
the intersection of "Seek" and (Not)-"Avoid" for each level-set. I will
test a model that specifies that the odds-ratios for the diagonal and
column cumulants in the second table are constant across all membership
levels and close to 1.

Table of frequencies:

                   S e e k i n g
         |  0/6  1/6  2/6  3/6  4/6  5/6  6/6 | totals
  -------+------------------------------------+-------
     6/6 |    8    3    2    1    1    1    0 |     16
     5/6 |    8    4    3    2    0    0    0 |     17
  A  4/6 |   17   11    7   10    1    0    0 |     46
  v  3/6 |   13   11   22   30    6    0    1 |     83
  o  2/6 |    5    7   12   23    3    2    0 |     52
  i  1/6 |    1    5   10   25   19    9    1 |     70
  d  0/6 |    3    3    2   16   13   13   26 |     76
  -------+------------------------------------+-------
  totals |   55   44   58  107   43   25   28 |    360

Table of cumulative frequencies:

                   S e e k i n g
         |  0/6  1/6  2/6  3/6  4/6  5/6  6/6 | totals
  -------+------------------------------------+-------
     6/6 |  _________________________________ |    360
     5/6 |    |  297 ________________________ |    344
  A  4/6 |    |    |  251 ___________________ |    327
  v  3/6 |    |    |    |  187 ______________ |    281
  o  2/6 |    |    |    |    |   86 _________ |    198
  i  1/6 |    |    |    |    |    |   49 ____ |    146
  d  0/6 |    |    |    |    |    |    |   26 |     76
  -------+------------------------------------+-------
  totals |  360  305  261  203   96   53   28 |

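As a check on the bookkeeping, the cumulative table can be rebuilt from the raw cross-tabulation. The following is a small Python sketch; the function and variable names are mine, and it relies on the rows running from "Avoid" 6/6 at the top to 0/6 at the bottom, so that row index k corresponds to "Not-Avoid" level k/6.

```python
# Cross-tabulation from the table of frequencies.
# Rows: Avoid = 6/6 (top) .. 0/6 (bottom), i.e. Not-Avoid = 0/6 .. 6/6.
# Columns: Seek = 0/6 .. 6/6.
FREQ = [
    [8, 3, 2, 1, 1, 1, 0],
    [8, 4, 3, 2, 0, 0, 0],
    [17, 11, 7, 10, 1, 0, 0],
    [13, 11, 22, 30, 6, 0, 1],
    [5, 7, 12, 23, 3, 2, 0],
    [1, 5, 10, 25, 19, 9, 1],
    [3, 3, 2, 16, 13, 13, 26],
]

def level_set_counts(freq):
    """Counts in the kth level-set of Seek, Not-Avoid, and their intersection."""
    n = len(freq)
    seek = [sum(row[j] for row in freq for j in range(k, n)) for k in range(n)]
    not_avoid = [sum(sum(freq[i]) for i in range(k, n)) for k in range(n)]
    both = [
        sum(freq[i][j] for i in range(k, n) for j in range(k, n))
        for k in range(n)
    ]
    return seek, not_avoid, both

def l_shaped_regions(diagonal):
    """Differences between adjacent diagonal cumulants (upper left first)."""
    return [a - b for a, b in zip(diagonal, diagonal[1:] + [0])]

seek_cum, not_avoid_cum, diag_cum = level_set_counts(FREQ)
regions = l_shaped_regions(diag_cum)
```

Here `seek_cum` corresponds to the bottom "totals" row of the cumulative table, `not_avoid_cum` to its right-hand "totals" column read from the bottom up, and `diag_cum` to the diagonal read from the upper left.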
If we denote the cumulative logit of the diagonal cell(k,k) by Lkk and
the corresponding kth marginal cumulative logit for "Seek" (the column
cumulants in the second table) by Lk+, our model is
    Lkk = Lk+ + b,
where b is to be estimated. This model simply says the ratio of the odds of
being in the kth level-set of the intersection to the odds of being in the
kth level-set for "Seek" is exp(b) regardless of membership level: hence, a
'constant odds-ratio' model. Since the diagonal cumulants partition the
whole table, we may take the differences between adjacent ones to obtain
expected frequencies for each of the L-shaped regions in the second table,
and compare those with the observed frequencies using a Chi-square
statistic.
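To make the mechanics of a constant odds-ratio model concrete, here is a rough Python sketch of one plausible reading: treat the "Seek" marginal cumulants as fixed, shift each cumulative logit by b, and difference the resulting diagonal cumulants to get expected frequencies for the L-shaped regions. The helper names are hypothetical, and this plug-in calculation is an assumption on my part, not necessarily the fitting procedure used for the reported estimates, so its output should not be expected to match the figures in this post to the decimal.

```python
import math

N = 360
# Seek cumulants for levels 1/6 .. 6/6 (level 0/6 contains everyone,
# so its logit is infinite and it is left out).
SEEK_CUMULANTS = [305, 261, 203, 96, 53, 28]

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def expected_diagonal_cumulants(seek_cumulants, n, b):
    """Apply Lkk = Lk+ + b and convert back to expected counts."""
    return [n * inv_logit(logit(c / n) + b) for c in seek_cumulants]

def expected_regions(seek_cumulants, n, b):
    """Expected counts for the L-shaped regions, upper left first."""
    diag = [float(n)] + expected_diagonal_cumulants(seek_cumulants, n, b)
    return [a - c for a, c in zip(diag, diag[1:] + [0.0])]

# Example call with an arbitrary illustrative value of b.
example = expected_regions(SEEK_CUMULANTS, N, -0.1)
```

Setting b = 0 returns the "Seek" cumulants unchanged, which is the case of exact set inclusion at every level.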
Taking differences between adjacent diagonal cumulative frequencies to
get the observed frequencies for the diagonal gives us {26, 23, 37, 101,
64, 46, 63}, ordered from the lower right to the upper left. The maximum
likelihood estimate of b is -0.1325, so the corresponding odds ratio is
exp(b) = 0.8759, which is indeed fairly close to 1. The expected frequencies
from this model are {25.7, 22.7, 39.4, 102.3, 59.2, 47.2, 63.5}, and
comparing observed and expected frequencies results in Chi-square = 0.691
(5 df), indicative of a very good fit. Hence we are justified in summarizing
this table by saying that the odds of being in the set of those who seek and
don't avoid "Realistic" jobs are 0.8759 times the odds of being in the set
of those who seek "Realistic" jobs, regardless of membership level; i.e.,
those who seek generally do not avoid this characteristic.
And this wasn't a 'freak': the same kind of model fit the data for our five
other job characteristics very well, and the whole shebang was replicated by
a second panel of student data.
At the risk of going on for too long, I'd like to point out two things
about this example in relation to the fuzzy vs prob discussion. First,
where probability (statistics, really) does the job here is in modeling the
responses; what fuzzy set theory (and level-sets) provide is a way of
operationalizing an hypothesis about those responses and partitioning the
table so that an appropriate statistical technique may be applied. They are
complementary roles, and both are crucial. While I certainly won't claim
that this is the only way these data could be analyzed, I will claim that
it accurately tests our hypothesis and results in an elegant
(one-parameter!) model with very good fit.
Second, note that numerical values for the membership levels are not
needed in this model. The only strong measurement property I have used here
is to equate the kth row with the kth column in terms of membership level,
in order to generate the frequencies for the "kth" level of the
intersection. I have found that it often is not necessary to specify
numerical grades of membership. What is essential, on the other hand, is to
test whatever conjoint measurement model is implied by one's hypotheses.
--- Checkpoint. How wrong am I so far?
--Mike
Dr. Michael Smithson email: Michael.Smithson@jcu.edu.au
Behavioural Sciences phone: (61-77) 814-150
James Cook University fax: (61-77) 795-435
Queensland Australia 4811
"All you need in life is ignorance and confidence, then success is sure."
--Mark Twain