Applied Web Data Extraction and Integration

Lecture Overview
Number and Type: 181.189 VU SS 2012
Lecturer: Robert Baumgartner (exercises together with tutor Alexander Fischl)
Selected Keywords: Overview of tools and methods for web data extraction and integration, Web Process Automation, Web Data for BI, Web Data Cleansing, Web Testing
Preliminary Meeting: Friday 9th of March, 16:00 (s.t.), EI 4 Reithoffer HS
Registration: Until 9th of March via TISS (limited number of participants). Please de-register in TISS if you decide not to take the course. EMCL students who cannot register yet, please send me a message so that I can reserve a place for you.
Language: Slides in English; the lecture language depends on whether non-German-speaking students attend
Schedule: Friday 16:00-18:15 (Lecture 16:00, Exercises 17:00)
Timetable: 09/03, 16/03, 30/03, 20/04, 27/04, 11/05, 25/05, 01/06, 22/06, (Backup: 29/06)
Procedure: Lecture coupled with exercises and group work
Topics:
  • Web Data Extraction Frameworks and Scenarios: Commercial, Academic and Open Source
  • Data Integration and Mapping
  • Creation of more complex sample scenarios in some of the extraction/integration frameworks
  • Functional Web 2.0 Application Testing
  • Web Process Automation and SOA
  • Web ETL Connectors: Web Data for Business Intelligence
  • Sample Scenarios in vertical domains
  • Web Data Cleansing and Free Text Extraction
  • PDF Data Extraction
  • Elog Extraction Language
Fields of Study: This VU is a component of the curricula of several master's programs and is also part of the European Master's Program in Computational Logic.


Structure of the Lecture and Slides
Session | Topics / Slides | Date | Lecture Time | Lecture Location | Exercises
1 | Preliminary Meeting and Overview | 9.3. | 16:00-17:00 | EI 4 Reithoffer HS | -
2 | Wrap the Web (6 in 1; 1st Exercises) | 16.3. | 16:00-18:00 | EI 4 Reithoffer HS | -
3 | Wrapper Languages (6 in 1; 2nd Exercises) | 30.3. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:15
4 | Data Cleansing (6 in 1; 3rd Exercises) | 20.4. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:15
5 | Functional Web Application Testing (6 in 1; 4th Exercises; Group Project Topics) | 27.4. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:15
6 | Competitive Intelligence and Data Mining (6 in 1; 5th Exercises) | 11.5. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:15
7 | Web Process Integration and Web Archiving (6 in 1; 6th Exercises) | 25.5. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:15
8 | T. Hassan: PDF Data Extraction and Document Understanding (6 in 1; Group Projects Agenda for June 22) | 1.6. | 17:15-18:15 | EI 4 Reithoffer HS | 16:00-17:15
9 | Group Project Presentations (Updated Agenda: G1, G2, G3, G4, G5, G6, G7, G8, G9, G10) | 22.6. | 16:00-20:00 | EI 4 Reithoffer HS | -