Description:
Natural Language Processing (NLP) is a sub-field of Artificial Intelligence and Linguistics that studies problems in the automatic generation and understanding of natural language. It involves identifying and exploiting linguistic rules and variation with code to translate unstructured language data into information with a schema. Empirical methods in NLP employ machine learning techniques to automatically extract linguistic knowledge from big textual data instead of hard-coding the necessary knowledge. Such intelligent machines require input data to be prepared in such a way that the computer can more easily find patterns and draw inferences. This is made feasible by adding relevant metadata to a dataset. Any metadata tag used to mark up elements of the dataset is called an annotation over the input. For the algorithms to learn efficiently and effectively, the annotation done on the data must be accurate and relevant to the task the machine is being asked to perform. In other words, supervised machine learning methods intrinsically cannot handle inaccurate and noisy annotations, and the performance of the learners is highly correlated with the quality of the input data labels. Hence, the annotations have to be prepared by experts. However, collecting labels for a large dataset is impractical for a small group of qualified experts, or when experts are unavailable. This is especially crucial for recent deep learning methods, whose algorithms are hungry for large amounts of supervised data. Crowdsourcing has emerged as a new paradigm for obtaining labels for training machine learning models inexpensively and at high data volumes. The rationale behind this concept is to harness the “wisdom of the crowd”, where groups of people pool their abilities to show collective intelligence.
Although crowdsourcing is cheap and fast, collecting high-quality data from a non-expert crowd requires careful attention to task quality control management. The quality control process ...
Publisher:
University of Trento
Contributors:
Moschitti, Alessandro
Year of Publication:
2017-04-28
Document Type:
Doctoral Thesis ; NonPeerReviewed ; [Doctoral and postdoctoral thesis]
Subjects:
INF/01 INFORMATICA
Relations:
http://eprints-phd.biblio.unitn.it/2073/1/PhD-Thesis.pdf ; http://eprints-phd.biblio.unitn.it/2073/2/Disclaimer_Abadi.pdf ; Abad, Azad (2017) Controlling the effect of crowd noisy annotations in NLP Tasks. PhD thesis, University of Trento.
Content Provider:
Università degli Studi di Trento: Unitn-eprints.PhD
Further name:
University of Trento: Unitn-eprints.PhD