Re: Data Metrics

Scott Ferson (scott@ramas.com)
Sun, 26 Jul 1998 16:20:58 +0200 (MET DST)

I think you'll find the text "Numerical Taxonomy" by P.H.A.
Sneath and R.R. Sokal, published by Freeman, to be very useful.
Numerical taxonomists, especially the so-called "pheneticists",
have long been concerned with metrics and similarity measures
for use in characterizing the relationships among different
biological species.

The characteristics they measure on specimens are sometimes
real-valued but often not, and include counts, rank and nominal
data. They've thought extensively about the problem of how to
combine these different kinds of data into a single measure.
Their ideas of "overall similarity" might be perfect for your
application. (They've been less than perfect for taxonomy
itself where relatedness has more to do with phylogenetic
descent than with overall similarity.)
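
As a rough illustration of combining real-valued, count, rank and nominal
characters into one "overall similarity", here is a minimal sketch in the
spirit of Gower's general similarity coefficient. It is not taken from the
book, and the function name, the kinds/ranges arguments and the sample
specimens are all hypothetical. Each character contributes a score between
0 and 1, and the scores are averaged over the characters recorded for both
specimens.

```python
def overall_similarity(a, b, kinds, ranges):
    """Average per-character similarity of two specimens (sketch only).

    kinds[i]  -- "numeric" (measurements, counts, ranks) or "nominal"
    ranges[i] -- observed range of character i across all specimens
                 (ignored for nominal characters)
    """
    scores = []
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # character missing for one specimen: leave it out
        if kind == "nominal":
            scores.append(1.0 if x == y else 0.0)  # simple match / mismatch
        else:
            # range-scaled absolute difference, turned into a similarity
            scores.append(1.0 - abs(x - y) / rng if rng else 1.0)
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical specimens: (length, segment count, colour)
kinds = ("numeric", "numeric", "nominal")
ranges = (8.0, 5, None)
spec_a = (10.0, 4, "red")
spec_b = (12.0, 4, "blue")
similarity = overall_similarity(spec_a, spec_b, kinds, ranges)
```

Equal weighting of characters is an assumption made here for simplicity;
other weightings are possible.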

Contact me directly for a more basic introduction.

Scott Ferson <scott@ramas.com>
Applied Biomathematics, 516-751-4350, fax -3435

Camilo wrote:

> Hi Will,
>
> I have been working on a general theory that would include all kinds of
> data types. Could you tell me about a book or papers in which these topics
> are studied at length?
>
> 1._ "distance metrics" and "similarity measures" in multivariate statistics
> 2._ Euclidean distance (also known as geometric or L-2 distance),
> 3._ Manhattan distance (also known as city-block, absolute or L-1
> distance; the maximum or L-infinity distance is a separate metric), and
> 4._ Mahalanobis distance.
> 5._ Purely nominal data is sometimes compared using string distances like
> the Levenshtein distance.
>
> Will Dwinnell <76743.1740@CompuServe.COM> wrote in message ...
>
> >This is a well-explored subject for purely numeric data. For
> >mixed numeric/nominal or purely nominal data, there is less
> >information. With numeric data, look for "distance metrics" and
> >"similarity measures" in multivariate statistics. Some common
> >examples (which may aid in a keyword search) are Euclidean
> >distance (also known as geometric or L-2 distance), Manhattan
> >distance (also known as city-block, absolute or L-1 distance),
> >maximum (Chebyshev or L-infinity) distance and Mahalanobis
> >distance. Purely nominal
> >data is sometimes compared using string distances like the
> >Levenshtein distance.
> >
> >--
> >Will Dwinnell
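
For reference, the distances named in the quoted message can be sketched in
a few lines of plain Python. This is a minimal sketch and not a tuned
implementation: the function names are hypothetical, the vectors are assumed
to be equal-length numeric sequences, and the Mahalanobis function assumes
the caller supplies an already inverted covariance matrix. Euclidean is the
L-2 case, Manhattan the L-1 case, and the maximum (Chebyshev) distance the
L-infinity case.

```python
import math


def minkowski(x, y, p):
    """L-p distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """L-infinity (maximum) distance: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, inv_cov):
    """sqrt(d' * inv_cov * d), where d = x - y and inv_cov is the
    inverse covariance matrix given as a list of rows."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j]
                         for i in range(n) for j in range(n)))


def levenshtein(s, t):
    """Edit distance: fewest single-character insertions, deletions
    and substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]
```

With an identity matrix for inv_cov, the Mahalanobis distance reduces to the
Euclidean distance.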

############################################################################
This message was posted through the fuzzy mailing list.