
Web data extraction and integration

Number and Type: 181130 VU WS 2005/06

Lecturer: Robert Baumgartner
Short Description: Approaches to web data extraction and integration
Preliminary Meeting: Thursday 6th of October, 10:00 (s.t.), seminar room 184/2
Registration: until 2nd of October via e-mail (limited number of participants)
Language: Slides in English; the lecture language depends on whether non-German-speaking students from the Computational Logic program attend
Timetable: roughly every other Thursday, 10:00-13:00 (starting 20th of October), seminar room 184/2 (partly held in blocks)
Procedure: Lecture combined with exercises and group work; exercise evaluation at 10:00, lecture at 11:00 (on the first lecture day, the lecture starts at 10:00)
Keywords: XML family, XML Schema, XPath, XSLT, XQuery, (HTML) data extraction and wrapper generation, definition and areas of IE, differentiation from IR, Lixto project: Visual Wrapper and Transformation Server, application generation with Lixto, other wrapper generation languages and tools, wrapper learning and automatic data extraction, data aggregation and syndication, portal integration, e-biz frameworks, PDF data extraction
Fields of Study: This VU is a compulsory or compulsory elective course in several bachelor's and master's programs. It can also be taken as part of the KfK Semantic Web Advanced Topics and is part of the European Master's Program in Computational Logic.
Related Lectures: Proseminar Web Information Extraction (Herzog, Gatterbauer)

Attention - Please note: For organizational reasons, the session planned for October 20 has been moved to November 3. Consequently, the second session has been moved to November 17, and November 24 has been added for the third session.

Structure of the lecture and slides
Prelim. (6.10.): Preliminary Meeting
1st (3.11.): Motivation for IE, XML and XML Schema (until 12:30)
2nd (17.11.): XML Navigation, Query and Transformation Languages (+Ex.)
3rd (24.11.): XML Query Languages, techniques in IE, approaches to wrapper generation (+Ex.)
4th (1.12.): Lixto Visual Wrapper and Elog (+Ex.)
5th (15.12.): Lixto Transformation Server (+Ex.)
6th (12.1.): Three sample projects: on inductive wrapper generation, automatic data extraction, and PDF data extraction (+Ex.)
7th (26.1.): Talks of student groups (10:00-12:30, 13:30-16:00; five groups each) 1|2|3|4|5|6|7|8|9|10

Group Distribution | Group Talk Topics & Timetable

Robert Baumgartner, last modified on 3/2/2006