Re: Imbalanced Classes

Jonathan G Campbell (jg.campbell@ulst.ac.uk)
Sun, 14 Jun 1998 19:35:54 +0200 (MET DST)

Martijn van Otterlo wrote:
>
> Hi there,
>
> I'm working on a classification problem, but my main problem (up till
> now) is that I have 2 classes, one with about 15000 samples and the
> other with about 15 samples. Classification will put all patterns in the
> 'big' class and so does not make many mistakes. But I do not want that.
> Does anyone have any experience with this kind of problem?
>

Could I assume that the 15,000 samples refer to some `normal' class, and
that the 15 refer to some `abnormal' class? Could it also be that the 15
samples poorly represent the possible diversity of `abnormal' patterns?

If you attempt to cast such a problem into a Bayesian formulation, you
will find that you are hopelessly deficient in:

- an estimate of the prior probability of the abnormal class,

- an estimate of the density p(features | abnormal).
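
To make the two deficiencies concrete, here is the two-class Bayes rule
written out. The symbols are my own shorthand, not anything from the
original question: x is the feature vector, and P(.) and p(.) are the
class priors and class-conditional densities.

```latex
P(\text{abnormal} \mid x) =
  \frac{p(x \mid \text{abnormal})\, P(\text{abnormal})}
       {p(x \mid \text{normal})\, P(\text{normal})
        + p(x \mid \text{abnormal})\, P(\text{abnormal})}
```

Both abnormal-class terms have to be estimated from 15 samples. If one
assumed (and it is only an assumption) that the sample counts reflect the
true class frequencies, the prior would be about 15/15015, roughly 0.001,
which is why a classifier that minimizes overall error puts nearly
everything in the big class.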

And, respecting the newsgroup we are in, I think similar problems may
attend any `fuzzy' formulation.

Could you set up the problem as a hypothesis test, the (simple) null
hypothesis -- normal, versus the (possibly composite) alternative
hypothesis -- abnormal?

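Look in a statistics or detection theory textbook for the Neyman-Pearson
criterion, which fixes the false-alarm rate under the null hypothesis and
then maximizes the probability of detection. Or see e.g. C.W. Therrien,
1989, Decision, Estimation, and Classification, Wiley, chapter 3.2.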
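
As a rough illustration of the hypothesis-test idea, the sketch below
models only the `normal' class and sets a threshold that gives a chosen
false-alarm rate on it, in the spirit of Neyman-Pearson. Everything
specific here is an assumption for the sake of the example: the Gaussian
model for the normal class, the squared Mahalanobis distance as test
statistic, the synthetic data, the 1% false-alarm rate, and the function
names. None of it comes from the original poster's problem.

```python
import numpy as np

def fit_normal_model(X_normal):
    """Estimate the mean and inverse covariance of the 'normal' class only."""
    mean = X_normal.mean(axis=0)
    cov = np.cov(X_normal, rowvar=False)
    return mean, np.linalg.inv(cov)

def anomaly_score(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from the normal-class mean."""
    d = X - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def threshold_for_false_alarm_rate(scores_normal, alpha):
    """Threshold exceeded by roughly a fraction alpha of the normal-class scores."""
    return np.quantile(scores_normal, 1.0 - alpha)

# Synthetic stand-in data (hypothetical): 15000 normal and 15 abnormal samples.
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(15000, 2))
X_abnormal = rng.normal(4.0, 1.0, size=(15, 2))

mean, cov_inv = fit_normal_model(X_normal)
scores_normal = anomaly_score(X_normal, mean, cov_inv)
tau = threshold_for_false_alarm_rate(scores_normal, alpha=0.01)

# Rates measured on the same data used for fitting, so they are optimistic.
false_alarm_rate = np.mean(scores_normal > tau)
detection_rate = np.mean(anomaly_score(X_abnormal, mean, cov_inv) > tau)
print(f"threshold={tau:.2f}  false alarms={false_alarm_rate:.3f}  "
      f"detections={detection_rate:.2f}")
```

Note that the 15 abnormal samples are never used for fitting, only for
checking the detection rate afterwards, so their poor coverage of the
abnormal class does not distort the decision rule itself.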

I can elaborate, but I'd need to know more about your problem domain.

Best regards,

Jon Campbell

-- 
Jonathan G Campbell Univ. Ulster Magee College Derry BT48 7JL N. Ireland 
+44 1504 375367 JG.Campbell@ulst.ac.uk  http://www.infm.ulst.ac.uk/~jgc/

This message was posted through the fuzzy mailing list.