Big data versus the crowd: Looking for relationships in all the right places

C Zhang, F Niu, C Ré, J Shavlik - Proceedings of the 50th Annual Meeting of the Association for …, 2012 - aclanthology.org
Abstract
Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data: distant supervision and crowdsourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowdsourced labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.
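The distant-supervision heuristic the abstract contrasts with crowdsourcing can be sketched roughly as follows: any sentence that mentions both entities of a known knowledge-base fact is taken as a (noisy) positive training example for that relation. This is a minimal illustrative sketch; the toy knowledge base, sentences, and function names here are assumptions for exposition, not the paper's actual pipeline or data.

```python
# Toy knowledge base of (entity1, entity2) -> relation facts.
# Illustrative data only, not from the paper.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

sentences = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last year.",  # noisy match: entities co-occur, relation not expressed
    "Steve Jobs co-founded Apple in 1976.",
]

def distant_labels(sentences, kb):
    """Pair each sentence with every KB fact whose two entities it mentions.

    This is the core distant-supervision assumption: entity co-occurrence
    is treated as evidence of the KB relation, which yields cheap but
    potentially lower-quality labels.
    """
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

labeled = distant_labels(sentences, KB)
for sent, e1, e2, rel in labeled:
    print(f"{rel}: {sent}")
```

Note that the second sentence is labeled `born_in` even though it does not express that relation — this is exactly the label noise that makes the corpus-scaling question in the paper interesting.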