# Adding Information To Available Data (Re: fuzzy string matching)

Bruno DiStefano (bruno@ecf.toronto.edu)
Thu, 3 Jun 1999 22:00:56 +0200 (MET DST)

In article <xaZ43.353$Pl2.14@news4.mco>, Earl Cox <ecox@metus.com> wrote:
>While Lotfi has indicated, in many of his talks, that fuzzy logic is a
>methodology for adding the calculus of imprecision to any discipline, I
>still believe that we must look at the informational gain in fuzzy systems
>rather than simply some method of generating values in the [0,1] interval.
>
I agree with you. In a course on "fuzzy logic in real-time
embedded systems" that I teach from time to time, I express
this by saying that "fuzzification is that phase in which
we process the incoming data together with prior heuristic
knowledge to obtain richer data".
I emphasise that for each crisp value of each input
variable we get a vector of fuzzy inputs. In my view,
this is a sign of increased overall information
content. The prior knowledge to which I am referring
is akin to the knowledge of the context when evaluating
ambiguous sentences (I like the example of the BAKERs in
the army). In my course I repeat the point about the
use of prior heuristic knowledge when speaking of
defuzzification (going from a vector of fuzzy outputs
to a single crisp output).
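
A minimal sketch of the two phases described above, offered only as an
illustration. The triangular membership functions, the temperature
breakpoints, the fan-speed values and every name in the code are
assumptions made for this example; none of them come from the course
or from this thread. The breakpoints play the role of the prior
heuristic knowledge: one crisp reading goes in, a vector of fuzzy
inputs comes out.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for a temperature reading in degrees Celsius.
# The (left, peak, right) breakpoints encode the prior heuristic knowledge.
TEMPERATURE_SETS = {
    "cold": (-10.0, 0.0, 15.0),
    "warm": (5.0, 20.0, 30.0),
    "hot": (25.0, 40.0, 55.0),
}

# Hypothetical representative fan speed (percent) for each label,
# used only by the defuzzification step below.
FAN_SPEED = {"cold": 0.0, "warm": 40.0, "hot": 100.0}


def fuzzify(reading):
    """Map one crisp reading to a vector of membership degrees."""
    return {
        label: triangular(reading, *breakpoints)
        for label, breakpoints in TEMPERATURE_SETS.items()
    }


def defuzzify(memberships):
    """Collapse a fuzzy vector to one crisp value (weighted average).

    Returns None when no set is active, since there is nothing to average.
    """
    total = sum(memberships.values())
    if total == 0.0:
        return None
    weighted = sum(memberships[label] * FAN_SPEED[label] for label in memberships)
    return weighted / total


if __name__ == "__main__":
    vector = fuzzify(10.0)
    print(vector)
    print(defuzzify(vector))
```

For a reading of 10.0 this sketch gives a membership of about 0.33 in
both "cold" and "warm" and 0 in "hot", and the defuzzified fan speed is
20. The weighted average is only one of several possible
defuzzification methods; it was chosen here because it is short.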

All of the above makes sense for real-time embedded
systems that take readings of physical variables
(e.g. temperature, pressure, humidity, etc).
However, I think that the same arguments can be made
when dealing with other sets of data and with other
sets of rules.

>As I said in a previous post, it is not enough to simply develop a "fuzzy"
>measure of some system state. We need to know how the approximate reasoning
>or implication mechanism uses fuzzy logic to increase knowledge about the
>model or system state.
>
I agree.

> In my opinion, the fuzzy string matching machine
>tells us something about the error space between two strings but tells us
>nothing about the relationship between the two strings in a way that would
>be meaningful in a generalized machine reasoning context (although,
>obviously, it would be very helpful in a specific application concerned with
>error detection, such as spell checkers). This confusion of state
>measurement with knowledge is a common problem when we deal with ideas
>related to fuzzy logic.
>
I am not sure that it is a common problem, but it looks
like an easy mistake to make. There is a tiny step, an invisible
boundary, that allows for an incorrect translation of a concept
from one domain to another. Moreover, all reasoning about
"errors" and "error correction" is subject to a potential
trap: "the presumption of knowing the correct input".
[NOTE: people dealing with physical readings, e.g. oscilloscope
readings of small signals, often spend days chasing something
that does not exist, a signal that they assume is there
because they have seen something that makes them believe
that an event has happened].

> Confusing numerical with contextual, semantic, or
>labeled ambiguity is another case in point. If we ask, "How many bakers are
>in the army?" -- do we mean people with the name Baker or people with the
>MOS of Baker (MOS = military occupational specialty, a job type)? Many posts
>considering the infrastructure necessary to resolve these kinds of
>ambiguity.
>
I agree.

>In any case, in the cemetery vs cemetary case, we might ask "to what degree
>is X like Y" and in this one instance we receive a pretty good answer. But
>is the answer generalizable to a fuzzy string matching function? If we pick
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No way! At least, it cannot be done if we are considering one string
by itself. It could be approximated if we create a "crude" concept
of "context": we process an entire text, identify the universe
of the words present in the text, and say that they form the
"numeric context" of the text. Then we can attempt to identify
"suggested spellings" for incorrect words. However, this is
primitive and error prone (semantically speaking).
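
A rough sketch of the "numeric context" idea just described, under
stated assumptions. The thread does not name a string metric, so the
use of Levenshtein edit distance, the distance threshold of 2, the
ranking by word frequency and all function names are choices made for
this illustration only.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def build_context(text):
    """Count the words of a text: the crude "numeric context"."""
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


def suggest(word, context, max_distance=2):
    """Suggest spellings drawn only from words already seen in the text.

    Candidates are ordered by distance, then by how often they occur.
    """
    word = word.lower()
    scored = [
        (edit_distance(word, known), -count, known)
        for known, count in context.items()
        if known != word
    ]
    return [known for distance, _, known in sorted(scored) if distance <= max_distance]


if __name__ == "__main__":
    context = build_context("The cemetery gate was closed. The cemetery wall is old.")
    print(suggest("cemetary", context))
    for pair in [("hat", "cat"), ("house", "mouse"), ("street", "strut")]:
        print(pair, edit_distance(*pair))
```

In this sketch "cemetary" is one edit away from "cemetery", which is
the only word of the sample text within the threshold. The same
function also puts "hat" one edit from "cat" and "street" two edits
from "strut", which shows the limitation discussed in this thread: the
number measures lexical proximity and says nothing about meaning.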

>words at random from a dictionary and then evaluate them using the same
>function, what is the outcome? Even when we hit close strings, comparing
>"hat" and "cat" or "house" and "mouse" or "Street" and "strut" all we know
>is that we have two strings that are in general (or remote) proximity to
>each other in terms of the underlying lexicography. For a spell checker this
>is a good thing to know.
>
I think that it is not enough, that it should be augmented by
something like the above concept of "context".

> But if you are writing a general intelligent
>business system, with few specific and narrow exceptions, this kind of
>"fuzzy" metric is meaningless.
>
>Where is the fuzzy logic in this?
>
I agree.

>Naturally, this is my own opinion. The fact that I am infallible in my
>pronouncements about fuzzy logic, should not deter anyone from expressing
>their own conflicting opinions, however wrong they might be! ;-)
>
>Earl
>
The interesting (in an academic sense) problem is:
1) did the numeric aspect ever cover the semantics in an unambiguous way?
2) if "YES" to (1), when and why were the two disconnected?
3) is there a language dependency?
4) are there languages that are closer to being crisp
(nearly boolean), with less ambiguity?

I have the impression that languages with declensions
(e.g. Latin, Ancient Greek, German, Polish, etc) are
less prone to ambiguity than languages without
declensions (e.g. English, Italian, etc.).

The not-so-academic result of these considerations
is that some applications may be easier to
implement for one natural language than for
another. It already happens with speech synthesis
and with speech recognition (an easy task in Italian
and a very difficult task in English).

Regards

Bruno Di Stefano

-- Bruno Di Stefano -- Private: au843@torfree.net IEEE:b.distefano@ieee.org
Courses: stefano@ecf.toronto.edu http://www.ecf.toronto.edu/~stefano
Research: bruno@ecf.toronto.edu http://www.ecf.toronto.edu/~bruno
Consulting: alawnicz@sympatico.ca http://www3.sympatico.ca/alawnicz/nuptek.htm
-----------------------------------------------------------------------------
