Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

Silviu Paun, Dirk Hovy (Editors)

Anthology ID:
Hong Kong
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP
Silviu Paun | Dirk Hovy

pdf bib
Dependency Tree Annotation with Mechanical TurkMechanical Turk
Stephen Tratz

Crowdsourcing is frequently employed to quickly and inexpensively obtain valuable linguistic annotations but is rarely used for parsing, likely due to the perceived difficulty of the task and the limited training of the available workers. This paper presents what is, to the best of our knowledge, the first published use of Mechanical Turk (or similar platform) to crowdsource parse trees. We pay Turkers to construct unlabeled dependency trees for 500 English sentences using an interactive graphical dependency tree editor, collecting 10 annotations per sentence. Despite not requiring any training, several of the more prolific workers meet or exceed 90 % attachment agreement with the Penn Treebank (PTB) portion of our data, and, furthermore, for 72 % of these PTB sentences, at least one Turker produces a perfect parse. Thus, we find that, supported with a simple graphical interface, people with presumably no prior experience can achieve surprisingly high degrees of accuracy on this task. To facilitate research into aggregation techniques for complex crowdsourced annotations, we publicly release our annotated corpus.

pdf bib
Crowd-sourcing annotation of complex NLU tasks : A case study of argumentative content annotationNLU tasks: A case study of argumentative content annotation
Tamar Lavee | Lili Kotlerman | Matan Orbach | Yonatan Bilu | Michal Jacovi | Ranit Aharonov | Noam Slonim

Recent advancements in machine reading and listening comprehension involve the annotation of long texts. Such tasks are typically time consuming, making crowd-annotations an attractive solution, yet their complexity often makes such a solution unfeasible. In particular, a major concern is that crowd annotators may be tempted to skim through long texts, and answer questions without reading thoroughly. We present a case study of adapting this type of task to the crowd. The task is to identify claims in a several minute long debate speech. We show that sentence-by-sentence annotation does not scale and that labeling only a subset of sentences is insufficient. Instead, we propose a scheme for effectively performing the full, complex task with crowd annotators, allowing the collection of large scale annotated datasets. We believe that the encountered challenges and pitfalls, as well as lessons learned, are relevant in general when collecting data for large scale natural language understanding (NLU) tasks.

pdf bib
Computer Assisted Annotation of Tension Development in TED Talks through CrowdsourcingTED Talks through Crowdsourcing
Seungwon Yoon | Wonsuk Yang | Jong Park

We propose a method of machine-assisted annotation for the identification of tension development, annotating whether the tension is increasing, decreasing, or staying unchanged. We use a neural network based prediction model, whose predicted results are given to the annotators as initial values for the options that they are asked to choose. By presenting such initial values to the annotators, the annotation task becomes an evaluation task where the annotators inspect whether or not the predicted results are correct. To demonstrate the effectiveness of our method, we performed the annotation task in both in-house and crowdsourced environments. For the crowdsourced environment, we compared the annotation results with and without our method of machine-assisted annotation. We find that the results with our method showed a higher agreement to the gold standard than those without, though our method had little effect at reducing the time for annotation. Our codes for the experiment are made publicly available.

pdf bib
CoSSAT : Code-Switched Speech Annotation ToolCoSSAT: Code-Switched Speech Annotation Tool
Sanket Shah | Pratik Joshi | Sebastin Santy | Sunayana Sitaram

Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems are challenging due to the lack of annotated speech and text data. We present a speech annotation interface CoSSAT, which helps annotators transcribe code-switched speech faster, more easily and more accurately than a traditional interface, by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.