Talk Announcement: Mini-Symposium "Data Extraction and Data Mappings"
Mini-Symposium "Data Extraction and Data Mappings"
Date: Monday, October 24th 2005, 14:00 - 16:00
Location: Favoritenstraße 11, ground floor, red area, Zemanek Hörsaal
David W Embley:
Title: "Semantic Understanding: An Approach Based on Information Extraction"
Abstract: Information is ubiquitous, and we're flooded with more than we
can process. Somehow, we must rely less on visual processing,
point-and-click navigation, and manual decision making and more on computer
sifting and organization of information and automated negotiation and
decision making. A resolution of these problems requires software agents
with semantic understanding---a grand challenge of our time. More
particularly, we must solve problems of information extraction, semantic
annotation, question answering, service request satisfaction, automated
interoperability, integration, and knowledge sharing. This talk addresses
aspects of these problems and suggests the use of data-extraction
ontologies as an approach that may help lead to semantic understanding.
Title: Composition of Mappings Given by Embedded Dependencies
Abstract: Composition of mappings between schemas is essential to support
schema evolution, data exchange, data integration, and other data
management tasks. In many applications, mappings are given by embedded
dependencies. In this paper, we study the issues involved in composing such
mappings.
Our algorithms and results extend those of Fagin et al. [FKPT04] who
studied composition of mappings given by several kinds of constraints. In
particular, they proved that full source-to-target tuple-generating
dependencies (tgds) are closed under composition, but embedded
source-to-target tgds are not. They introduced a class of second-order
constraints, SO tgds, that is closed under composition and has desirable
properties for data exchange.
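
The non-closure statement above can be illustrated with the employee/manager
example commonly used in the data-exchange literature. This is a sketch only:
the relation names Emp, Mgr1, Mgr and SelfMgr are illustrative assumptions
and do not come from this announcement. The first mapping uses an embedded
source-to-target tgd, the second uses two full tgds, and their composition
needs a function symbol f, which is what an SO tgd provides.

```latex
% First mapping (embedded s-t tgd): every employee has some manager.
\Sigma_{12}:\quad \forall e\,\bigl(\mathit{Emp}(e) \rightarrow \exists m\;\mathit{Mgr1}(e,m)\bigr)

% Second mapping (full tgds): copy managers, and record self-managers.
\Sigma_{23}:\quad \forall e\,\forall m\,\bigl(\mathit{Mgr1}(e,m) \rightarrow \mathit{Mgr}(e,m)\bigr),
\qquad \forall e\,\bigl(\mathit{Mgr1}(e,e) \rightarrow \mathit{SelfMgr}(e)\bigr)

% Composition, written as a second-order tgd with a function symbol f:
\Sigma_{13}:\quad \exists f\,\Bigl(
  \forall e\,\bigl(\mathit{Emp}(e) \rightarrow \mathit{Mgr}(e,f(e))\bigr)
  \;\wedge\;
  \forall e\,\bigl(\mathit{Emp}(e) \wedge e = f(e) \rightarrow \mathit{SelfMgr}(e)\bigr)
\Bigr)
```

The equality e = f(e) in the second conjunct ties the two conclusions to the
same choice of manager, which is why a first-order existential quantifier per
dependency does not suffice here.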
We study constraints that need not be source-to-target and we concentrate
on obtaining (first-order) embedded dependencies. As part of this study, we
also consider full dependencies and second-order constraints that arise from
Skolemizing embedded dependencies.
For each of the three classes of mappings that we study, we provide (a) an
algorithm that attempts to compute the composition and (b) sufficient
conditions on the input mappings that guarantee that the algorithm will
succeed.
In addition, we give several negative results. In particular, we show that
full dependencies are not closed under composition, and that second-order
dependencies that are not limited to be source-to-target are not closed
under restricted composition. Furthermore, we show that determining whether
the composition can be given by these kinds of dependencies is undecidable.
AllRight Group: Wolfgang Holzinger, Bernhard Krüpl
Title: "Project AllRight: a practical implementation of Web Information
Extraction, work in progress"
In this talk we present our work in progress in creating a Web information
extraction platform (project AllRight). AllRight tries to automatically
locate and extract data about products of a given domain described by a
product ontology. In the talk, we will concentrate on two specific subtasks
of the project, namely the Information Retrieval (IR) and Information
Extraction (IE) stages:
- In the retrieval stage we proceed in two steps: we first gather a
starting set of pages by querying an internet search engine with keywords
derived from the domain knowledge. Then we try to expand this
starting set by using a web crawler that searches the neighborhood of those
initially found pages for similar content. We will present the algorithms
and heuristics used in this approach.
- In the extraction stage, we follow an unorthodox approach: rather than
analysing the HTML source code of a web page, we use a standard web browser
to render the page and take advantage of the spatial information of text
items as displayed on screen. We will give insights into this process and
explain why we believe it to be superior to traditional HTML-based approaches.
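
The idea of using on-screen positions rather than HTML structure can be
sketched as follows. This is a minimal illustration, not the AllRight
implementation: it assumes the rendered text items are already available as
bounding boxes (the TextBox type, the 5-pixel tolerance and the label set are
hypothetical), groups them into visual rows, and pairs each known attribute
label with the text immediately to its right.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """A text item as rendered on screen (hypothetical input format)."""
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width in pixels
    h: float  # height in pixels


def group_into_rows(boxes, tolerance=5.0):
    """Group boxes into visual rows by comparing top edges.

    A box joins the current row if its top edge is within `tolerance`
    pixels of the first box in that row; otherwise it starts a new row.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if rows and abs(rows[-1][0].y - box.y) <= tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within each row, order items left to right.
    return [sorted(row, key=lambda b: b.x) for row in rows]


def label_value_pairs(boxes, labels, tolerance=5.0):
    """Pair each known label with its right-hand neighbour on the same row."""
    pairs = {}
    for row in group_into_rows(boxes, tolerance):
        for left, right in zip(row, row[1:]):
            key = left.text.strip().rstrip(":").lower()
            if key in labels:
                pairs[key] = right.text.strip()
    return pairs
```

Called with boxes for "Price:" and "199 EUR" at nearly the same vertical
position, label_value_pairs(boxes, {"price"}) pairs the two regardless of
whether the page lays them out with a table, nested divs, or CSS positioning,
which is the kind of robustness the rendering-based approach aims for.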