181.081 SS 2006
ProSeminar Web Information Extraction
Academic Research and Writing
Paper List
Wrapper Learning
Extracting web data using instance-based learning (2005)
pdf
Knoblock, C.A., Lerman, K., Minton, S., Muslea, I.:
Accurately and reliably extracting data from the web: a machine learning approach (2003)
pdf
Muslea, I., Minton, S., Knoblock, C.:
STALKER: Learning extraction rules for semistructured, web-based information sources (1998)
pdf
Freitag, D., Kushmerick, N.:
Boosted wrapper induction (2000)
pdf
Chang, C-H., Lui, S-C.:
IEPAD: Information Extraction based on pattern discovery (2001)
pdf
Liu, B., Zhai, Y.:
NET: a system for extracting web data from flat and nested data records (2006)
pdf
Zhao, H., Meng, W., Wu, Z., Raghavan, V., Yu, C.:
Fully automatic wrapper generation for search engines
pdf
Information extraction from printed documents and PDF files
M. Aiello, C. Monz, L. Todoran, M. Worring (2002):
Document understanding for a broad class of documents
pdf
O. Altamura, F. Esposito, D. Malerba (2001):
Transforming paper documents into XML format with WISDOM++
pdf
S. Klink, T. Kieninger (2001):
Rule-based Document Structure Understanding with a Fuzzy Combination of Layout and Textual Features
pdf
D. Russ, K. Summers (1994):
Geometric Algorithms and Experiments for Automated Document Structuring
ps.gz
pdf (converted)
W. Lovegrove, D. Brailsford (1995):
Document analysis of PDF files: methods, results and implications
pdf
Information extraction from tables
T. Kieninger (1998):
Table Structure Recognition Based On Robust Block Segmentation
ps.gz
pdf (converted)
Web Information Extraction
Ntoulas, Zerfos, Cho:
Downloading Hidden Web Content
pdf
Named Entity Recognition
Named entity recognition from diverse text types
pdf
Zhou, G., Su, J.:
Named entity recognition using an HMM-based chunk tagger
pdf
Klein, D., Smarr, J., Nguyen, H., Manning, C.:
Named entity recognition with character-level models
pdf
Osenova, P., Kolkovska, S.:
Combining the named entity recognition task and NP chunking strategy for robust pre-processing
pdf
Soderland, S.:
Learning to extract text-based information from the world wide web (1997)
pdf