Adaptive and/or scientific methods (was: More glottochronology (was: historical linguistics))

Cameron Laird (claird@Starbase.NeoSoft.COM)
22 Feb 1995 19:20:49 -0600


This post covers a lot of territory. In general, I think NetNews articles
are best when they say one thing well, but I've got a chunk of meat this
time that is best savored whole. Enjoy the ride.

In article <3i0p3d$hke@Starbase.NeoSoft.COM>, I announced:
> .
> .
> .
>For a more extended, but related, example of statistical
>reasoning in philology, see "An Algorithm for Reconstructing
>Language Families ...", which frequent s.l contributor
>Jacques Guy has kindly made available as
>
> ftp://ftp.neosoft.com/pub/users/claird/sci.anthropology/texts/jg_salish.zip
>
>Dr. Guy provides there "a measure of similarity directly
>computable from wordlists, allowing to bypass the process
>of cognate recognition", a result of his investigation
>into "the statistical properties of language families."
.
.
.
Some readers are having trouble with the "directly computable" part.
I'll present evidence and related themes to illustrate its
plausibility.

Dr. Guy's paper (which we're still updating sporadically to correct
mechanical errors) itself demonstrates that, quite often,

... complex, seemingly almost intractable
problems of automatic text analysis seem
to be reducible to much simpler models.
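To make the plausibility argument concrete, here is a minimal sketch of what a "directly computable" similarity between wordlists might look like. It is NOT Dr. Guy's measure, which I haven't spelled out here. It simply assumes two wordlists keyed by the same glosses and scores each pair of words by normalized Levenshtein (edit) distance on their spellings, with no cognate judgments anywhere. The function names and the tiny English/German/French lists are illustrative assumptions, and spelling is a crude stand-in for phonetic transcription.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = curr
    return prev[-1]


def wordlist_similarity(list1: Dict[str, str], list2: Dict[str, str]) -> float:
    """Hypothetical measure: mean of (1 - normalized edit distance) over shared glosses.

    No cognate recognition is involved; only the raw word forms are compared.
    """
    shared = [gloss for gloss in list1 if gloss in list2]
    if not shared:
        return 0.0
    total = 0.0
    for gloss in shared:
        w1, w2 = list1[gloss], list2[gloss]
        total += 1 - levenshtein(w1, w2) / max(len(w1), len(w2), 1)
    return total / len(shared)


# Toy wordlists (ordinary spellings, keyed by English gloss); illustrative only.
english = {"hand": "hand", "water": "water", "night": "night", "fish": "fish"}
german = {"hand": "hand", "water": "wasser", "night": "nacht", "fish": "fisch"}
french = {"hand": "main", "water": "eau", "night": "nuit", "fish": "poisson"}

print("en-de", round(wordlist_similarity(english, german), 3))
print("en-fr", round(wordlist_similarity(english, french), 3))
```

On these four glosses the English-German score comes out well above the English-French one, which is the kind of signal a cognate-free measure has to deliver. A serious measure would need phonetic transcriptions, longer lists, and some correction for chance resemblance.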

Certainly neural network researchers are familiar with the idea
that implementing an automated procedure, even one that is ignorant
of classical knowledge about a subject, can yield useful results.
The 10 February 1995 issue of *Science* marks some sort of watershed
in the acceptance of such "non-parametric" or "adaptive"
approaches, for it includes at least four (!) articles on
adaptive techniques. "Galaxies, Human Eyes, and Artificial
Neural Networks" reports on progress in automating the classification
of catalogued galaxies along the Revised Hubble T-type and other
scales. "Navigating Complex Labyrinths ..." exploits the
far-from-equilibrium Belousov-Zhabotinsky reaction to solve maze-escape
problems in geometric optimization that are combinatorially difficult
with classical methods. Of interest to the largest
net.population is "Gauging Similarity with n-Grams: Language-Independent
Categorization of Text", which makes progress on the information
retrieval problem of blind clustering (and, implicitly, on sorting,
categorization, and retrieval) using a very low-powered syntactic
theory, with no semantics.
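The n-gram idea is simple enough to sketch in a few lines. What follows is a simplified illustration in the spirit of that article, not its actual procedure: it assumes overlapping character trigrams (n = 3 is my choice here), raw counts, and plain cosine similarity, and it omits whatever weighting or normalization the article adds. Nothing in it knows about words, grammar, or meaning, which is the point.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; no dictionary, grammar, or semantics."""
    # Collapse whitespace and pad with spaces so word boundaries yield n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * q[gram] for gram, count in p.items())  # missing grams count as 0
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Illustrative documents only.
docs = {
    "a": "the cat sat on the mat",
    "b": "the cat sat on a mat",
    "c": "quantum chromodynamics",
}
profiles = {name: ngram_profile(text) for name, text in docs.items()}
for x in profiles:
    for y in profiles:
        if x < y:
            print(x, y, round(cosine(profiles[x], profiles[y]), 3))
```

Blind clustering then amounts to grouping documents whose pairwise scores exceed some threshold. Because the profiles are built from raw characters, the same code applies unchanged to text in other languages.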

Within AI, my own background and prejudices are toward structure-
or information-rich approaches. Moreover, the small amount of
formalized analysis available so far (see, for example,

http://e-math.ams.org/web/publications/bull/199501/199501017.html

on computational complexity of calculus problems) argues against
the superiority of adaptive methods even in cases where "common
sense" suggests they're warranted. For today, I'll suggest
that the growing empirical evidence for patterns of emergent
information is worth studying, and that we all need to be comfortable
with the possibility that good engineering can involve fuzziness
and other renunciations of strong semantic content. If I
read him correctly, Jorn Barger is constructing at

http://www.mcs.net/~jorn/home.html

one vision of where these themes can lead.

I've pointed follow-ups to sci.systems and comp.ai.

-- 

Cameron Laird http://starbase.neosoft.com/~claird/home.html claird@Neosoft.com +1 713 267 7966 claird@litwin.com +1 713 996 8546
