BISC Seminar Announcement, Oct 30th, 4-5pm, 310 Soda Hall

Frank Hoffmann (fhoffman@cs.berkeley.edu)
Mon, 27 Oct 1997 04:47:30 +0100 (MET)

Dear BISC group,

First of all, I apologize that you recently received
each BISC seminar announcement twice. I will look into
this problem with our mail administrator this week.

Frank Hoffmann

***************************************************************
B I S C   S e m i n a r   A n n o u n c e m e n t
***************************************************************

Untangling Text Data Mining

Marti Hearst

UC Berkeley SIMS
hearst@info.sims.berkeley.edu

October 30th, 1997
4-5pm
310 Soda Hall


Abstract

The focus of my research is on Information Access from text
collections. However, there has recently been a lot of talk about
something that sounds related: the nascent field of Text Data Mining
(TDM). Text data mining has the peculiar distinction of having a name
and a fair amount of hype but as yet almost no practitioners. I
suspect this happened because people assume TDM is a natural extension
of the slightly less nascent field of Data Mining (DM). It doesn't
help that there is also general disagreement about what constitutes
data mining. I define it as: the (semi)automated discovery of trends
and patterns across very large datasets for the purposes of decision
making.

In this talk I will attempt to impose some order on the confusion. I
will discuss the relationships between TDM and DM, TDM and Information
Access, and TDM and computational linguistics. I will do this by
distinguishing among goal types (e.g., querying vs. summarizing,
prediction vs. description, and end-use vs. pre-processing),
techniques employed (e.g., clustering, summarization, categorization,
visualization), and data type used (no text, pure text, text plus
structure). I will then briefly describe a project that I plan to
begin soon which can be thought of as a hybrid of information access
and one take on text data mining.

**********************************************************************
Please direct questions regarding the contents of the talk
and requests for papers to the speaker.
**********************************************************************

---------------------------------------------------------------------------
Frank Hoffmann                               UC Berkeley
Computer Science Division                    Department of EECS
Email: fhoffman@cs.berkeley.edu              phone: 1-510-642-8282
URL: http://http.cs.berkeley.edu/~fhoffman   fax: 1-510-642-5775
---------------------------------------------------------------------------