Web Data Extraction and Integration

Lecture Overview
Number and Type: 181.130 VU WS 2013/14
Lecturer: Robert Baumgartner (exercises together with tutor Alexander Fischl)
Selected Keywords: Information extraction, approaches, tools and methods for wrapper generation, web querying, data integration, XML
Preliminary Meeting: Friday 4th of October, 16:00 (s.t.), EI 2 Pichelmayer HS
Registration: Until 3rd of October via TISS (limited number of participants). Please de-register in TISS if you decide not to take the course. ECML students who cannot yet register: please send me a message so that a place can be reserved for you.
Moreover, please register for one exercise group/unit via TUWEL.
Language: Slides in English; the lecture language depends on whether non-German-speaking students join
Timetable: Selected Fridays 16:00-19:00 (A groups 16-17, lecture 17-18, B groups 18-19). Planned dates: 4.10., 18.10., 25.10., 8.11., 22.11., 6.12., 13.12., 10.1., 24.1./31.1. (Backup: 29.11.).
Procedure: Lecture coupled with exercises and group work
Topics:
  • Information Extraction: Setting, History, IE vs. IR
  • Structured Data Extraction and Wrapping
  • XML Transformation and Query Languages, DOM
  • Web Wrapper Languages
  • Wrapper Generation Tools
  • Wrappers for Mashups, SOA and BI
  • Inductive Wrapper Generation
  • Automatic Data Extraction / Web Data Mining
  • Supervised Wrapper Generation
  • Deep Web Navigation Approaches
  • Data Extraction from PDF documents
  • Mediation and Integration Approaches
  • Web Data Cleaning
  • Lixto Visual Wrapper and Transformation Server
Fields of Study: This VU is part of the curriculum of several master's programs and of the European Master's Program in Computational Logic.


Structure of the Lecture and Slides
Session | Topics / Slides | Date | Lecture Time | Lecture Location | Grp A* Exercises | Grp B* Exercises
--- | --- | --- | --- | --- | --- | ---
1 | Preliminary Meeting and Motivation/History Information Extraction (6 in 1) | 4.10. | 16:00-17:15 | EI 2 Pichelmayer HS | - | -
2 | XPath and XSLT (6 in 1, Resources, 1st Exercises) | 18.10. | 16:00-18:00 | EI 2 Pichelmayer HS | - | -
3 | DOM and approaches to wrapper generation (6 in 1, Resources, 2nd Exercises) | 25.10. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
4 | Tools for Web Information Extraction (6 in 1, 3rd Exercises) | 8.11. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
5 | Visual and interactive wrapper generation: Lixto Visual Developer (6 in 1, 4th Exercises, Group Project Topics) | 22.11. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
6 | Spatial Extraction, Inductive and Automatic Extraction (6 in 1, 5th Exercises) | 6.12. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
7 | Automatic Data Extraction and Web Data Integration (6 in 1, 6th Exercises) | 13.12. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
8 | TamCrow Project / Enabling Web Accessibility (Group Projects Agenda) | 10.1. | 17:00-18:00 | EI 2 Pichelmayer HS | 16:00-17:00 | 18:00-19:00
9/A | Group Presentations (Group Project Topics, Group Projects Agenda, Group Project Papers, Group Project Presentations) | 24.1. | - | EI 2 Pichelmayer HS | 16:00-19:00 | -
9/B | Group Presentations (see 9/A) | 31.1. | - | EI 2 Pichelmayer HS | - | 16:00-19:00