Re: Imbalanced Classes

Daniel Fischer (fischerd@rd.hydro.on.ca)
Sun, 14 Jun 1998 22:04:29 +0200 (MET DST)

Martijn van Otterlo wrote:
>
> Hi there,
>
> I'm working on a classification problem, but my main problem (up till
> now) is that I have 2 classes, one with about 15000 samples and the
> other with about 15 samples. Classification will put all patterns in the
> 'big' class and so does not make many mistakes. But I do not want that.
> Does anyone have any experience with this kind of problem?
>
> Thanx,
>
> Martijn van Otterlo,
> Student Computer Science, University of Twente, The Netherlands.

I would look into k-means clustering and modify it to prevent the class
with many samples from grabbing both cluster centers and being split in
two. I think I would compute the average sample of the large class (its
center of gravity), initialize cluster 1 in the k-means algorithm with
that value, and modify k-means to punish the other cluster whenever its
center moves closer to cluster 1 than the average radius of the large
class, that sort of thing...
You and I could do the clustering by looking at the points, and there
isn't anything that enough "if ... then ..." statements cannot fix :-)
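
A rough sketch of what I mean, in Python with NumPy. Several things here
are my own assumptions and not a tested recipe: the function name
constrained_two_means is made up, cluster 1 is held fixed at the large
class's mean instead of being updated, cluster 2 starts at the sample
farthest from cluster 1, and the "punishment" is simply pushing cluster 2
back out to the average radius whenever it drifts inside it.

```python
import numpy as np


def constrained_two_means(X, big_mask, n_iter=20):
    """Two-cluster k-means variant (illustrative sketch only).

    X        : (n, d) array with all samples
    big_mask : boolean array, True for samples of the large class
    Returns (c1, c2, radius, labels), where labels is 1 for cluster 2.
    """
    X = np.asarray(X, dtype=float)
    big = X[big_mask]

    # Cluster 1: center of gravity of the large class (held fixed, assumption).
    c1 = big.mean(axis=0)
    # Average radius of the large class around its center.
    radius = np.linalg.norm(big - c1, axis=1).mean()

    # Cluster 2: start at the sample farthest from cluster 1 (assumption).
    c2 = X[np.argmax(np.linalg.norm(X - c1, axis=1))].copy()

    for _ in range(n_iter):
        d1 = np.linalg.norm(X - c1, axis=1)
        d2 = np.linalg.norm(X - c2, axis=1)
        in2 = d2 < d1                      # samples currently closer to cluster 2
        if not in2.any():
            break
        new_c2 = X[in2].mean(axis=0)       # ordinary k-means update for cluster 2

        # "Punish" cluster 2: if it moved inside the average radius of the
        # large class, project it back onto that radius.
        offset = new_c2 - c1
        dist = np.linalg.norm(offset)
        if dist < radius:
            if dist == 0.0:
                break                      # no direction to push in; keep old c2
            new_c2 = c1 + offset * (radius / dist)

        if np.allclose(new_c2, c2):
            break
        c2 = new_c2

    d1 = np.linalg.norm(X - c1, axis=1)
    d2 = np.linalg.norm(X - c2, axis=1)
    labels = (d2 < d1).astype(int)
    return c1, c2, radius, labels
```

You would call it as constrained_two_means(X, big_mask), with X holding
all samples and big_mask marking the ones from the large class. Whether a
hard projection or a softer penalty works better is something I would
have to try on the real data.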

Please let me know what you come up with.

Daniel Fischer

############################################################################
This message was posted through the fuzzy mailing list.