**Subject:** BISC: Abstract version of the Soccer Problem

**From:** Michelle T. Lin (*michlin@eecs.berkeley.edu*)

**Date:** Tue Aug 22 2000 - 16:37:03 MET DST

*********************************************************************

Berkeley Initiative in Soft Computing (BISC)

*********************************************************************

To: The BISC Group

From: L. A. Zadeh <zadeh@cs.berkeley.edu>

Subject: Abstract version of the Soccer Problem

In a message to the BISC Group dated July 18, 2000, I posed a

problem which I characterized as a challenge to data miners -- a

problem labeled The Soccer Problem, or SP for short.

In an abstract formulation, what is seen more clearly is that

the soccer problem is an instance of a class of problems which involve

what may be called exploratory hypothesis testing (EH testing). As

described in the following, the abstract problem is a simplified

version of the soccer problem but it preserves its main features. Here

is the problem.

Assume that the data consist of a collection, C, of N

sequences of length L of symbols drawn from a finite alphabet, A, of

size K. Each sequence is tagged with 0 or 1. What we have, then, is a

function or a relation, R, from C to {0,1}.

The question is: What is R? On the face of it, this appears to

be a standard problem in pattern recognition, neurocomputing, and

machine learning. The difficulty is that the number of given

sequences, N, is much too small in relation to L and K, making

standard techniques inapplicable.
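
A rough way to see why N is too small: there are K^L possible sequences of length L over an alphabet of size K, so even modest L and K leave almost all of the sequence space unobserved. The sketch below only illustrates that arithmetic; the values of N, K, and L are hypothetical and do not come from the soccer data.

```python
def coverage(n, k, length):
    """Fraction of the k**length possible sequences that n distinct samples could cover."""
    return n / (k ** length)

# Hypothetical sizes, chosen only for illustration.
K, L, N = 4, 8, 50
total_sequences = K ** L          # 4**8 = 65536 possible sequences
print(total_sequences, coverage(N, K, L))
```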

To deal with insufficiency of data, we formulate a testable

hypothesis, H, and proceed to test it. For convenience, the approach

will be referred to as Exploratory Hypothesis testing or EH-testing,

for short.

The hypothesis is the following. Consider a subset of A, A*,

and let r(s) be the relative count of symbols in a sequence, s, which

belong to A*. For example, if A={a,b,c,d}, A*={a,b} and s=baacbdac,

then r=5/8. In this way, each sequence, s, in C is associated with an

ordered pair (r(s),v(s)), where v(s) is 0 or 1. The hypothesis is:

Given the data: (r(s),v(s)), s in C, as a function of A*,

there exists A* such that for most sequences, s, the larger the value

of r(s) the higher the probability (relative frequency) that v(s)=1.

Thus, if such A* does not exist, the hypothesis is wrong.
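
The hypothesis can be explored mechanically along the following lines. This is a minimal sketch, not a procedure stated in the message: the message does not say how "the larger r(s), the higher the probability that v(s)=1" should be scored, so the concordance measure below (the fraction of pairs in which a sequence tagged 1 has a larger r than a sequence tagged 0) is an assumption, as are the function names and the toy data.

```python
from itertools import combinations

def relative_count(s, a_star):
    """r(s): fraction of symbols in s that belong to the crisp subset A*."""
    return sum(ch in a_star for ch in s) / len(s)

def concordance(data, a_star):
    """Assumed score: share of (tag-1, tag-0) pairs where the tag-1 sequence
    has the larger r(s); ties count as one half. Returns None if a tag is absent."""
    ones = [relative_count(s, a_star) for s, v in data if v == 1]
    zeros = [relative_count(s, a_star) for s, v in data if v == 0]
    if not ones or not zeros:
        return None
    wins = sum((r1 > r0) + 0.5 * (r1 == r0) for r1 in ones for r0 in zeros)
    return wins / (len(ones) * len(zeros))

def best_subset(data, alphabet):
    """Search every nonempty proper subset A* of the alphabet for the top score."""
    symbols = sorted(alphabet)
    best_set, best_score = None, 0.0
    for size in range(1, len(symbols)):
        for combo in combinations(symbols, size):
            score = concordance(data, set(combo))
            if score is not None and score > best_score:
                best_set, best_score = set(combo), score
    return best_set, best_score

# The example from the text: A* = {a, b}, s = baacbdac.
print(relative_count("baacbdac", {"a", "b"}))  # 0.625, i.e. 5/8

# Hypothetical tagged sequences (s, v(s)), for illustration only.
toy_data = [("aaab", 1), ("abab", 1), ("cdcd", 0), ("ccda", 0)]
print(best_subset(toy_data, {"a", "b", "c", "d"}))
```

A score near 1 for some A* would be consistent with the hypothesis, and a score near 0.5 for every A* would count against it. Because the search runs over on the order of 2^K subsets while N is small, a high score can also arise by chance, so any A* found this way would still need to be checked on data not used in the search.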

In summary, we have reduced the original data-mining problem

to that of exploratory hypothesis testing. It should be noted that a

significant difference between the soccer problem and its abstract

version, as formulated in the foregoing discussion, is that in the

soccer problem the set A* is fuzzy rather than crisp.
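
One way to carry the relative count over to a fuzzy A* is sketched below. The message does not define r(s) for the fuzzy case, so this rests on an assumption: each symbol is given a grade of membership in A* between 0 and 1, and r(s) is taken to be the average grade over the symbols of s. The function name and the grades are hypothetical.

```python
def fuzzy_relative_count(s, membership):
    """Assumed fuzzy r(s): mean grade of membership in A* over the symbols of s.
    Symbols missing from the membership table are given grade 0."""
    return sum(membership.get(ch, 0.0) for ch in s) / len(s)

# With grades restricted to 0 and 1 this reduces to the crisp count (5/8 here).
crisp = {"a": 1.0, "b": 1.0}
print(fuzzy_relative_count("baacbdac", crisp))   # 0.625

# Hypothetical grades: b belongs to A* only to degree 0.5.
graded = {"a": 1.0, "b": 0.5}
print(fuzzy_relative_count("baacbdac", graded))  # 0.5
```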

Lotfi A. Zadeh

August 21 2000

Remark: Note that in the soccer problem and its abstract version the

hypothesis is fuzzy. What this suggests is that in most data-mining

problems the hypothesis must be fuzzy in order to be realistic.

To post your comments to the BISC Group, please send them to

me (zadeh@cs.berkeley.edu) with cc to Michael Berthold

(berthold@cs.berkeley.edu)

--------------------------------------------------------------------

If you ever want to remove yourself from this mailing list,

you can send mail to <Majordomo@EECS.Berkeley.EDU> with the following

command in the body of your email message:

unsubscribe bisc-group

or from another account,

unsubscribe bisc-group <your_email_address>

############################################################################

This message was posted through the fuzzy mailing list.

(1) To subscribe to this mailing list, send a message body of

"SUB FUZZY-MAIL myFirstName mySurname" to listproc@dbai.tuwien.ac.at

(2) To unsubscribe from this mailing list, send a message body of

"UNSUB FUZZY-MAIL" or "UNSUB FUZZY-MAIL yoursubscription@email.address.com"

to listproc@dbai.tuwien.ac.at

(3) To reach the human who maintains the list, send mail to

fuzzy-owner@dbai.tuwien.ac.at

(4) For WWW access and other information on Fuzzy Sets and Logic, see

http://www.dbai.tuwien.ac.at/ftp/mlowner/fuzzy-mail.info

(5) WWW archive: http://www.dbai.tuwien.ac.at/marchives/fuzzy-mail/index.html
