Applied Web Data Extraction and Integration

Lecture Overview
Number and Type: 181.189 VU SS 2014
Lecturers: Robert Baumgartner and Ruslan Fayzrakhmanov (exercises together with tutor Alexander Fischl)
Links: TISS | TUWEL
Selected Keywords: Overview of tools and methods for web data extraction and integration, Web Process Automation, Web Data for BI, Web Data Cleansing, Web Testing
Preliminary Meeting: Friday 7th of March, 16:00 (s.t.), EI 4 Reithoffer HS
Registration: Until 7th of March via TISS (limited number of participants). Please de-register in TISS if you decide not to take the course. In addition, please register for one exercise group/unit via TUWEL. ECML students who cannot yet register: please send me a message so that a place can be reserved for you.
Language: Slides in English; the lecture language depends on whether non-German-speaking students attend
Schedule: Friday 16:00-18:15 (Lecture 16:00, Exercises 17:00)
Timetable: 7.3., 14.3., 28.3., 11.4., 9.5., 23.5., 6.6., 13.6., 27.6.
Procedure: Lecture coupled with exercises and group work
Topics:
  • Web Data Extraction Frameworks and Scenarios: Commercial, Academic and Open Source
  • Data Integration and Mapping
  • Creation of more complex sample scenarios in some of the extraction/integration frameworks
  • Functional Web 2.0 Application Testing
  • Web Process Automation and SOA
  • Web ETL Connectors: Web Data for Business Intelligence
  • Sample Scenarios in vertical domains
  • Web Data Cleansing and Free Text Extraction
  • PDF Data Extraction
  • Elog Extraction Language
Fields of Study: This VU is part of the curriculum of several master's programmes and of the European Master's Program in Computational Logic.


Structure of the Lecture and Slides
Session | Topics / Slides | Date | Lecture Time | Lecture Location | Exercises
1 | Preliminary Meeting and Overview | 7.3. | 16:00-17:00 | EI 4 Reithoffer HS | -
2 | Wrap the Web (6 in 1, 1st Exercises) | 14.3. | 16:00-18:00 | EI 4 Reithoffer HS | -
3 | Web Data Cleansing (6 in 1, 2nd Exercises) | 28.3. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
4 | Entity Recognition and Opinion Mining (6 in 1, 3rd Exercises) | 11.4. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
5 | Competitive Intelligence and Data Mining (6 in 1, 4th Exercises, Group Project Topics) | 9.5. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
6 | Functional Web Application Testing (6 in 1, 5th Exercises) | 23.5. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
7 | Data Extraction for Web Accessibility (6 in 1, 6th Exercises) | 6.6. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
8 | Data Extraction from PDF, Web Data Integration, Web Archiving (6 in 1, Group Projects Agenda) | 13.6. | 16:00-17:00 | EI 4 Reithoffer HS | 17:00-18:00
9 | Group Project Presentations (Group Projects Download) | 27.6. | | EI 4 Reithoffer HS | 16:00-19:00