Wrapper Induction by XPath Alignment

Joachim Nielandt; Robin de Mol; Antoon Bronselaer; Guy de Tré

Topics: Fuzzy Information Retrieval and Data Mining; Information Extraction; Interactive and Online Data Mining; Mining Text and Semi-Structured Data; Web Mining

In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR, pages 492-500, 2014, Rome, Italy

Authors: Joachim Nielandt; Robin de Mol; Antoon Bronselaer; Guy de Tré

Affiliation: Ghent University, Belgium

Keyword(s): Wrapper Induction, XPath, Alignment, Data Extraction, DOM.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Computational Intelligence; Fuzzy Information Retrieval and Data Mining; Fuzzy Systems; Information Extraction; Interactive and Online Data Mining; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems; Web Mining

Abstract: Extracting information from large quantities of semi-structured documents is an important topic that is receiving a lot of attention. Methods that accurately define where the data can be found are pivotal in constructing a robust solution, one that tolerates imperfections and structural changes in the source material. In this paper we investigate a wrapper induction method that revolves around aligning XPath elements (steps), allowing users to generalise upon the training examples they give to the data extraction system. The alignment is based on a modification of the well-known Levenshtein edit distance. Once the training example XPaths have been aligned with each other, they are merged into a single path that generalises the examples as precisely as possible, so that it can be used to accurately fetch the required data from the given source material.
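
To make the idea in the abstract concrete, the following is a minimal Python sketch of aligning two example XPaths step by step and merging them into one generalised path. It is not the authors' algorithm: the abstract does not specify how the Levenshtein distance is modified, so the cost scheme here (a cheaper substitution when two steps share a tag and differ only in their predicate) is an assumption made for illustration. The function names (`split_steps`, `tag_of`, `align`, `merge`, `generalise`) and the choice to render an unmatched step as a descendant axis (`//`) are likewise hypothetical.

```python
import re

GAP = None  # sentinel: no step on this side of the alignment


def split_steps(xpath):
    """Split an absolute XPath like /html/body/div[2] into its steps."""
    return [s for s in xpath.split("/") if s]


def tag_of(step):
    """Drop a trailing predicate: 'tr[2]' -> 'tr'."""
    return re.sub(r"\[.*\]$", "", step)


def sub_cost(a, b):
    """Assumed cost scheme (integers keep the comparisons exact)."""
    if a == b:
        return 0
    if tag_of(a) == tag_of(b):
        return 1  # same tag, different predicate: cheap substitution
    return 2      # different tags


GAP_COST = 2


def align(a, b):
    """Levenshtein-style alignment of two step lists with a backtrace."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        d[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
                d[i - 1][j] + GAP_COST,
                d[i][j - 1] + GAP_COST,
            )
    # Walk back from the bottom-right cell to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + GAP_COST:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def merge(pairs):
    """Merge aligned steps into one generalised XPath string."""
    out, pending_gap = [], False
    for x, y in pairs:
        if x is GAP or y is GAP:
            pending_gap = True  # unmatched step: widen to a descendant axis
            continue
        if x == y:
            step = x
        elif tag_of(x) == tag_of(y):
            step = tag_of(x)    # drop the differing predicate
        else:
            step = "*"          # tags differ: wildcard
        out.append(("//" if pending_gap else "/") + step)
        pending_gap = False
    if pending_gap:
        out.append("//*")       # trailing unmatched steps
    return "".join(out)


def generalise(xpath_a, xpath_b):
    return merge(align(split_steps(xpath_a), split_steps(xpath_b)))


print(generalise("/html/body/div[1]/table/tr[2]/td[1]",
                 "/html/body/div[1]/table/tr[5]/td[1]"))
# -> /html/body/div[1]/table/tr/td[1]
print(generalise("/html/body/div/p", "/html/body/p"))
# -> /html/body//p
```

In this sketch, two examples pointing at different table rows generalise to a path that selects the same cell in every row, while examples of different depth fall back to a descendant axis. A production wrapper would also need to handle more than two examples and richer predicates, which this sketch leaves out.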

CC BY-NC-ND 4.0

Paper citation in several formats:

Nielandt, J., de Mol, R., Bronselaer, A. and de Tré, G. (2014). Wrapper Induction by XPath Alignment. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR; ISBN 978-989-758-048-2; ISSN 2184-3228, SciTePress, pages 492-500. DOI: 10.5220/0005124504920500

@conference{kdir14,
author={Joachim Nielandt and Robin {de Mol} and Antoon Bronselaer and Guy {de Tré}},
title={Wrapper Induction by XPath Alignment},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR},
year={2014},
pages={492--500},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005124504920500},
isbn={978-989-758-048-2},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (IC3K 2014) - KDIR
TI - Wrapper Induction by XPath Alignment
SN - 978-989-758-048-2
IS - 2184-3228
AU - Nielandt, J.
AU - de Mol, R.
AU - Bronselaer, A.
AU - de Tré, G.
PY - 2014
SP - 492
EP - 500
DO - 10.5220/0005124504920500
PB - SciTePress