Coreference detection of low quality objects
J Nielandt, A Bronselaer, G De Tré - … , IPMU 2012, Catania, Italy, July 9-13 …, 2012 - Springer
Advances on Computational Intelligence: 14th International Conference on …, 2012 • Springer
Abstract
Record linkage is a widely studied problem that aims to identify coreferent (i.e. duplicate) data in a structured data source. As indicated by Winkler, a solution to the record linkage problem is only possible if the error rate is sufficiently low. In other words, in order to successfully de-duplicate a database, the objects in the database must be of sufficient quality. However, this assumption does not always hold. This paper investigates how merging low quality objects into one high quality object can improve the process of record linkage. The general idea is illustrated in the context of string comparison, where strings of low quality (i.e. with a high typographical error rate) are merged into a string of high quality by using an n-dimensional Levenshtein distance matrix to compute the optimal alignment between the dirty strings. Results are presented and possible refinements are proposed.
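The merging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the cost scheme or how the merged string is read off the alignment, so the unit sum-of-pairs column cost, the majority-vote consensus, and the names `align`, `merge`, `column_cost` and `GAP` are all assumptions made for illustration. The sketch fills an n-dimensional table (one axis per dirty string) in the manner of the Levenshtein recurrence and traces back one optimal alignment.

```python
from collections import Counter
from itertools import product

GAP = None  # hypothetical gap marker; None cannot collide with a real character


def column_cost(column):
    """Assumed cost: 1 for every pair of symbols in the column that differ."""
    return sum(
        1
        for i in range(len(column))
        for j in range(i + 1, len(column))
        if column[i] != column[j]
    )


def align(strings):
    """Fill an n-dimensional edit-distance table and trace back one optimal alignment."""
    n = len(strings)
    lengths = tuple(len(s) for s in strings)
    # A move advances a non-empty subset of the strings by one character.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    origin = (0,) * n
    cost = {origin: 0}
    back = {}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is reached.
    for cell in product(*(range(length + 1) for length in lengths)):
        if cell == origin:
            continue
        best = None
        for move in moves:
            prev = tuple(c - m for c, m in zip(cell, move))
            if min(prev) < 0:
                continue
            column = tuple(
                strings[k][prev[k]] if move[k] else GAP for k in range(n)
            )
            candidate = cost[prev] + column_cost(column)
            if best is None or candidate < best[0]:
                best = (candidate, prev, column)
        cost[cell] = best[0]
        back[cell] = (best[1], best[2])
    # Walk back from the far corner to the origin, collecting columns.
    columns = []
    cell = lengths
    while cell != origin:
        cell, column = back[cell]
        columns.append(column)
    columns.reverse()
    return cost[lengths], columns


def merge(strings):
    """Assumed consensus rule: keep the most frequent symbol of each column."""
    _, columns = align(strings)
    merged = []
    for column in columns:
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol is not GAP:
            merged.append(symbol)
    return "".join(merged)


if __name__ == "__main__":
    print(merge(["record", "recard", "recorb"]))
```

With two strings the table reduces to the ordinary Levenshtein matrix. The number of cells grows as the product of the string lengths and the number of moves as 2^n - 1, so this brute-force form is only practical for a handful of short strings.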