Web Data Extraction and Integration

Lecture Overview
Number and Type: 181.130 VU WS 2011/12
Lecturer: Robert Baumgartner (exercises together with tutor Alexander Fischl)
Selected Keywords: Information extraction, approaches, tools and methods for wrapper generation, web querying, data integration, XML
Preliminary Meeting: Friday 7th of October, 16:00 (s.t.), EI 2 Pichelmayer HS
Registration: Until 6th of October via TISS (limited number of participants). Please deregister in TISS if you decide not to take the course. ECML students who cannot yet register: please send me a message so that a place can be reserved for you.
Language: Slides in English; the lecture language depends on whether non-German-speaking students attend
Timetable: Selected Fridays 16:00-19:00 (see below for details; two exercise slots)
Procedure: Lecture coupled with exercises and group work
Topics:
  • Information Extraction: Setting, History, IE vs. IR
  • Structured Data Extraction and Wrapping
  • XML Transformation and Query Languages, DOM
  • Web Wrapper Languages
  • Wrapper Generation Tools
  • Wrappers for Mashups, SOA and BI
  • Inductive Wrapper Generation
  • Automatic Data Extraction / Web Data Mining
  • Supervised Wrapper Generation
  • Deep Web Navigation Approaches
  • Data Extraction from PDF documents
  • Mediation and Integration Approaches
  • Web Data Cleaning
  • Lixto Visual Wrapper and Transformation Server
Fields of Study: This VU is part of the curricula of several master's programs and of the European Master's Program in Computational Logic.


Structure of the Lecture and Slides
Nr. | Session | Topics / Slides | Date | Lecture Time | Lecture Location | Grp A* Exercises | Grp B* Exercises
1st | Preliminary Meeting + Session 1 | Preliminary Meeting and Motivation/History Information Extraction (6 in 1) | 7.10. | 16:00-17:30 | EI 2 Pichelmayer HS | - | -
2nd | Session 2 | XPath and XSLT (6 in 1, Resources, 1st Exercises, Reference Solutions) | 21.10. | 16:00-18:00 | EI 2 Pichelmayer HS | - | -
3rd | Session 3 + Exercise Evaluation | DOM and Approaches to Wrapper Generation (6 in 1, Resources, 2nd Exercises) | 4.11. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
4th | Session 4 + Exercise Evaluation | Tools for Web Information Extraction (6 in 1, 3rd Exercises) | 11.11. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
5th | Session 5 + Exercise Evaluation | Lixto Visual Developer (6 in 1, 4th Exercises, Group Project Topics) | 18.11. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
6th | Session 6 + Exercise Evaluation | Web Content Mining and Inductive Approaches (6 in 1, 5th Exercises) | 2.12. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
7th | Session 7 + Exercise Evaluation | Web Data Integration and Transformation (6 in 1, 6th Exercises) | 16.12. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
8th | Session 8 + Exercise Evaluation | Enabling Web Accessibility | 13.1. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
9th/A | Group Presentations (Group Project Topics) | Groups A* (A1, A2, A3, A4, A5, A6) | 20.1. | - | EI 2 Pichelmayer HS | 16:00-19:00 | -
9th/B | Group Presentations | Groups B* (B1, B2, B3, B4, B5) | 27.1. | - | EI 2 Pichelmayer HS | - | 16:00-19:00