BISC Seminar Announcement, Oct 30th, 4-5pm, 310 Soda Hall

Frank Hoffmann
Mon, 27 Oct 1997 04:47:30 +0100 (MET)

Dear BISC group,

First of all, I apologize that you recently received each BISC
seminar announcement twice. I will investigate and resolve this
problem with our mail administrator this week.

Frank Hoffmann

B I S C   S e m i n a r   A n n o u n c e m e n t

Untangling Text Data Mining

Marti Hearst

UC Berkeley SIMS

October 30th, 1997
310 Soda Hall


The focus of my research is on Information Access from text
collections. However, there has recently been a lot of talk about
something that sounds related: the nascent field of Text Data Mining
(TDM). Text data mining has the peculiar distinction of having a name
and a fair amount of hype but as yet almost no practitioners. I
suspect this happened because people assume TDM is a natural extension
of the slightly less nascent field of Data Mining (DM). It doesn't
help that there is also general disagreement about what constitutes
data mining. I define it as: the (semi)automated discovery of trends
and patterns across very large datasets for the purposes of decision
making.

In this talk I will attempt to impose some order on the confusion. I
will discuss the relationships between TDM and DM, TDM and Information
Access, and TDM and computational linguistics. I will do this by
distinguishing among goal types (e.g., querying vs. summarizing,
prediction vs. description, and end-use vs. pre-processing),
techniques employed (e.g., clustering, summarization, categorization,
visualization), and data types used (no text, pure text, text plus
structure). I will then briefly describe a project that I plan to
begin soon which can be thought of as a hybrid of information access
and one take on text data mining.

Please direct questions about the content of the talk
and requests for papers to the speaker.

Frank Hoffmann, UC Berkeley
Computer Science Division, Department of EECS
Phone: 1-510-642-8282
Fax: 1-510-642-5775