..::181.081 SS 2006::..

ProSeminar Web Information Extraction

Academic Research and Writing





Paper List


Wrapper Learning

Zhai, Y., Liu, B.:
Extracting web data using instance-based learning (2005)

Knoblock, C.A., Lerman, K., Minton, S., Muslea, I.:
Accurately and reliably extracting data from the web: a machine learning approach (2003)

Muslea, I., Minton, S., Knoblock, C.:
STALKER: Learning extraction rules for semistructured, web-based information sources (1998)

Freitag, D., Kushmerick, N.:
Boosted wrapper induction (2000)

Chang, C-H., Lui, S-C.:
IEPAD: Information Extraction based on pattern discovery (2001)

Liu, B., Zhai, Y.:
NET: a system for extracting web data from flat and nested data records (2006)

Zhao, H., Meng, W., Wu, Z., Raghavan, V., Yu, C.:
Fully automatic wrapper generation for search engines

Information extraction from printed documents and PDF files

M. Aiello, C. Monz, L. Todoran, M. Worring (2002):
Document understanding for a broad class of documents

O. Altamura, F. Esposito, D. Malerba (2001):
Transforming paper documents into XML format with WISDOM++

S. Klink, T. Kieninger (2001):
Rule-based Document Structure Understanding with a Fuzzy Combination of Layout and Textual Features

D. Russ, K. Summers (1994):
Geometric Algorithms and Experiments for Automated Document Structuring

W. Lovegrove, D. Brailsford (1995):
Document analysis of PDF files: methods, results and implications

Information extraction from tables

T. Kieninger (1998):
Table Structure Recognition Based On Robust Block Segmentation

Web Information Extraction

Ntoulas, A., Zerfos, P., Cho, J.:
Downloading Hidden Web Content

Named Entity Recognition

Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y.:
Named entity recognition from diverse text types

Zhou, G., Su, J.:
Named entity recognition using an HMM-based chunk tagger

Klein, D., Smarr, J., Nguyen, H., Manning, C.:
Named entity recognition with character-level models

Osenova, P., Kolkovska, S.:
Combining the named entity recognition task and NP chunking strategy for robust pre-processing

Soderland, S.:
Learning to extract text-based information from the World Wide Web (1997)
