Finding more bilingual webpages with high credibility via link analysis

C Zhang, X Yao, C Kit - Proceedings of the Sixth Workshop on …, 2013 - aclanthology.org
Proceedings of the Sixth Workshop on Building and Using Comparable …, 2013 - aclanthology.org
Abstract
This paper presents an efficient approach to finding more bilingual webpage pairs with high credibility via link analysis, using little prior knowledge or heuristics. It extends a previous algorithm that takes the number of bilingual URL pairs that a key (i.e., a URL pairing pattern) can match as the objective function in searching for the best set of keys, that is, the set yielding the greatest number of webpage pairs within targeted bilingual websites. Enhanced algorithms are proposed to match more bilingual webpages, guided by a credibility derived from statistical analysis of the link relationships among the available seed websites. With about 12,800 seed websites as the test set, the enhanced algorithms improve precision over the baseline by more than 5 percentage points, from 94.06% to 99.40%, and hence find over 20% more true bilingual URL pairs, illustrating that significantly more bilingual webpages with high credibility can be mined with the help of link analysis.
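
To make the notion of a key concrete, the following is a minimal Python sketch of the objective described in the abstract: scoring each candidate key by the number of URL pairs it matches within a site. The abstract does not specify how a key is represented, so the representation here (a pair of substrings, where replacing the first with the second in one URL yields another URL of the same site) is an assumption, as are the function names `count_key_matches` and `best_key` and the example URLs. It is not the paper's algorithm and omits the credibility-based enhancements.

```python
from collections import Counter


def count_key_matches(urls, keys):
    """Count, for each hypothetical key (src, tgt), the URL pairs it matches.

    Assumption: a key matches the pair (u, v) when replacing one occurrence
    of `src` in u with `tgt` yields v, and both URLs belong to the site.
    """
    url_set = set(urls)
    counts = Counter()
    for src, tgt in keys:
        pairs = set()
        for u in url_set:
            start = u.find(src)
            while start != -1:
                # Swap this single occurrence of src for tgt.
                v = u[:start] + tgt + u[start + len(src):]
                if v != u and v in url_set:
                    pairs.add((u, v))
                start = u.find(src, start + 1)
        counts[(src, tgt)] = len(pairs)
    return counts


def best_key(urls, keys):
    """Return the candidate key that matches the most URL pairs."""
    counts = count_key_matches(urls, keys)
    return max(counts, key=counts.get)


# Illustrative URLs and candidate keys (hypothetical, not from the paper).
site_urls = [
    "http://example.org/en/a.html",
    "http://example.org/zh/a.html",
    "http://example.org/en/b.html",
    "http://example.org/zh/b.html",
    "http://example.org/about_e.html",
]
candidate_keys = [("/en/", "/zh/"), ("_e.", "_c.")]
print(best_key(site_urls, candidate_keys))
```

In this toy site the key `("/en/", "/zh/")` pairs two pages, while `("_e.", "_c.")` pairs none because no counterpart URL exists, so the first key is selected.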