Re: Question: Locating Trends in Data

WSiler (wsiler@aol.com)
Wed, 26 Aug 1998 20:57:06 +0200 (MET DST)

>Now what I'd eventually like to have is, for a stream of
>volume data, an output that says samples 1 to 100 are
>LOUD, samples 101 to 200 are VERY LOUD and
>samples 200 to 1000 are SOFT. Of course, not every
>sample in the first range is actually in the category
>LOUD, but the trend in 1-100 is loudness. The
>samples never stray from LOUD enough or for a long
>enough amount of time.
>Perhaps an enhancement would be to also get output
>like: samples 100-107 go from SOFT to LOUD (in other
>words, recognize states of transition).
>
>So how do I locate the places where one trend ends
>and another emerges? Would it be some function of
>the avg. and std. deviation? Ideally, this would be
>done with a variable tolerance for change, so that I can
>only look for large changes if I want. I'm thinking of
>some mechanism where I scan through it and
>somehow notice that I am definitely in another
>region--then I backtrack until I find the first sample that
>fits better in the new category than the old one.
>That's how the dividing line will be drawn. But that's as
>far as I get.

A useful technique in this kind of application is that of the first order lag,
or an exponentially-mapped moving average. I'll repeat a previous posting, for
equally-spaced samples:

MAVGT(i) = (MAVGT(i-1) * T + x(i)) / (T + 1)

where T is a selected number of samples. The moving average is simple to
compute, and does two things for you. First, it smooths the data. Second, it
tends to represent the data T units ago, if the change is linear or close to it
over T samples. Selection of the proper T is important, of course. Several
moving averages with different Ts can be maintained for different purposes.
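
For anyone who wants to try the formula above, here is a minimal Python sketch
of it. The function name moving_average, the optional initial argument, and
the choice to seed the average with the first sample are my assumptions for
illustration; the formula itself does not say how to start.

```python
def moving_average(samples, T, initial=None):
    """Yield MAVGT(i) = (MAVGT(i-1) * T + x(i)) / (T + 1) for each sample.

    T is the selected number of samples. A larger T gives a smoother,
    slower-moving average.
    """
    avg = initial
    for x in samples:
        if avg is None:
            # Assumption: with no starting value, seed with the first sample.
            avg = float(x)
        else:
            avg = (avg * T + x) / (T + 1)
        yield avg


if __name__ == "__main__":
    data = [0, 4, 4]
    # Two averages with different Ts over the same data.
    print(list(moving_average(data, T=1)))
    print(list(moving_average(data, T=10)))
```

With T=1 each new sample counts as much as the whole history, so the average
follows the data closely; with T=10 it moves only about a tenth of the way
toward each new sample.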

What we do, of course, is to compare the current data, or the current data
smoothed with a very small T, to a value which represents the smoothed values
some time back. If a big enough change is detected, we have a candidate for a
new label, and note the time. We can then check how many times in the next
(short) period of time we meet this criterion for a new value; if we meet
the criterion, say, three times out of four, then our new label can be
considered valid, and we have recorded the time at which the new label was
valid enough to be considered. We can also reinitialize our long-term moving
average now to the new situation.
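
The procedure just described might be sketched in Python roughly as follows.
This is only one possible reading, not our production code. The function name
find_label_changes and all parameter names and defaults (long_T, short_T,
threshold, window, needed) are hypothetical. I have also assumed that the
long-term average is held fixed while a candidate is being checked, and that
the check covers the `window` samples following the candidate.

```python
def find_label_changes(samples, long_T=20, short_T=1, threshold=10.0,
                       window=4, needed=3):
    """Return the sample indices at which a confirmed change was first noted."""
    changes = []
    long_avg = short_avg = None
    candidate = None      # index at which a possible new label was noted
    checked = hits = 0    # confirmation counters for the pending candidate

    for i, x in enumerate(samples):
        if long_avg is None:
            # Assumption: seed both averages with the first sample.
            long_avg = short_avg = float(x)
            continue

        # Current data smoothed with a very small T.
        short_avg = (short_avg * short_T + x) / (short_T + 1)
        exceeded = abs(short_avg - long_avg) > threshold

        if candidate is None:
            if exceeded:
                # Big enough change: note the time, start confirming.
                candidate, checked, hits = i, 0, 0
            else:
                # No change pending: keep the long-term average current.
                long_avg = (long_avg * long_T + x) / (long_T + 1)
        else:
            checked += 1
            hits += int(exceeded)
            if checked == window:
                if hits >= needed:
                    changes.append(candidate)
                    # Reinitialize the long-term average to the new situation.
                    long_avg = short_avg
                candidate = None

    return changes


if __name__ == "__main__":
    step = [0] * 10 + [100] * 10            # sustained jump at index 10
    spike = [0] * 10 + [100] + [0] * 10     # one outlier at index 10
    print(find_label_changes(step))
    print(find_label_changes(spike))
```

The threshold plays the role of the variable tolerance for change asked about
in the question: raise it to report only large changes. Mapping each confirmed
segment to a label such as LOUD or SOFT is left out of the sketch.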

There are all kinds of variations on this theme, of course, but the above is
one approach which works for us in a hospital intensive care unit.

Hope this helps - William Siler
