Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Jill Burstein, Andrea Horbach, Ekaterina Kochmar, Ronja Laarmann-Quante, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch (Editors)

Association for Computational Linguistics

Text Simplification by Tagging
Kostiantyn Omelianchuk | Vipul Raheja | Oleksandr Skurzhanskyi

Edit-based approaches have recently shown promising results on multiple monolingual sequence transduction tasks. In contrast to conventional sequence-to-sequence (Seq2Seq) models, which learn to generate text from scratch as they are trained on parallel corpora, these methods have proven to be much more effective, since they learn to make fast and accurate transformations while leveraging powerful pre-trained language models. Inspired by these ideas, we present TST, a simple and efficient Text Simplification system based on sequence Tagging, leveraging pre-trained Transformer-based encoders. Our system applies simple data augmentations and tweaks to training and inference on top of a pre-existing system, which makes it less reliant on large amounts of parallel training data, provides more control over the outputs, and enables faster inference. Our best model achieves near state-of-the-art performance on benchmark test datasets for the task. Since it is fully non-autoregressive, its inference is over 11 times faster than that of the current state-of-the-art text simplification system.
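
To illustrate what simplification "by tagging" means, here is a minimal sketch in which a model (not shown) assigns one edit tag to every source token, and the output is produced by applying those tags rather than by generating text word by word. The tag inventory (KEEP, DELETE, REPLACE_<word>, APPEND_<word>), the function name, and the example sentence are assumptions for illustration, not TST's actual tag set or code.

```python
# Hypothetical sketch: apply one edit tag per source token.
# Tag names (KEEP, DELETE, REPLACE_<w>, APPEND_<w>) are illustrative assumptions.

def apply_tags(tokens: list[str], tags: list[str]) -> list[str]:
    """Apply one edit tag per token and return the edited token sequence."""
    if len(tokens) != len(tags):
        raise ValueError("exactly one tag per token is required")
    out: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "KEEP":
            out.append(token)
        elif tag == "DELETE":
            continue  # drop the token
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])  # substitute a new word
        elif tag.startswith("APPEND_"):
            # keep the token, then insert a new word after it
            out.extend([token, tag[len("APPEND_"):]])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


tokens = ["The", "committee", "subsequently", "approved", "the", "proposal"]
tags = ["KEEP", "KEEP", "REPLACE_later", "KEEP", "KEEP", "KEEP"]
print(" ".join(apply_tags(tokens, tags)))
# prints: The committee later approved the proposal
```

Because every tag can be predicted in a single forward pass over the input, no token-by-token decoding loop is needed, which is the intuition behind the speed advantage of non-autoregressive tagging.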

Broad Linguistic Complexity Analysis for Greek Readability Classification
Savvas Chatzipanagiotidis | Maria Giagkou | Detmar Meurers

This paper explores the linguistic complexity of Greek textbooks as a readability classification task. We analyze textbook corpora for different school subjects and textbooks for Greek as a Second Language, covering a very wide spectrum of school age groups and proficiency levels. A broad range of quantifiable linguistic complexity features (lexical, morphological and syntactic) are extracted and calculated. Conducting experiments with different feature subsets, we show that the different linguistic dimensions contribute orthogonal information, each contributing towards the highest result achieved using all linguistic feature subsets. A readability classifier trained on this basis reaches a classification accuracy of 88.16% for the Greek as a Second Language corpus. To investigate the generalizability of the classification models, we also perform cross-corpus evaluations. We show that the model trained on the most varied text collection (for Greek as a school subject) generalizes best. In addition to advancing the state of the art for Greek readability analysis, the paper also contributes insights on the role of different feature sets and training setups for generalizable readability classification.

Parsing Argumentative Structure in English-as-Foreign-Language Essays
Jan Wira Gotama Putra | Simone Teufel | Takenobu Tokunaga

This paper presents a study on parsing the argumentative structure in English-as-foreign-language (EFL) essays, which are inherently noisy. The parsing process consists of two steps: linking related sentences and then labelling their relations. We experiment with several deep learning architectures to address each task independently. In the sentence linking task, a biaffine model performed best. In the relation labelling task, a fine-tuned BERT model performed best. Two sentence encoders are employed, and we observed that non-fine-tuned models generally performed better when using Sentence-BERT as opposed to the BERT encoder. We trained our models using two types of parallel texts (original noisy EFL essays and versions improved by annotators) and then evaluated them on the original essays. The experiment shows that an end-to-end in-domain system achieved an accuracy of .341. On the other hand, the cross-domain system achieved 94% of the performance of the in-domain system. This signals that well-written texts can also be useful for training argument mining systems for noisy texts.
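
The two-step setup (link, then label) can be illustrated with a small sketch. It assumes the linking model outputs a score for every ordered sentence pair, as a biaffine scorer would, and decodes links greedily; the scores, the convention that a sentence pointing to itself is the root, the function names, and the label names are all hypothetical, and the paper's actual decoding may differ.

```python
# Hypothetical sketch of link-then-label decoding; all values are made up.

def decode_links(scores: list[list[float]]) -> list[int]:
    """Greedy linking: for each sentence i, pick the target j with the highest score.

    A best score on the diagonal (j == i) is read here as "sentence i has no
    target", i.e. it is the root. Greedy argmax does not enforce a tree.
    """
    return [max(range(len(row)), key=lambda j: row[j]) for row in scores]


def build_structure(links: list[int], labels: list[str]) -> list[tuple[int, int, str]]:
    """Combine links with per-sentence relation labels into (source, target, label) edges."""
    return [(i, j, labels[i]) for i, j in enumerate(links) if i != j]


# scores[i][j]: hypothetical score that sentence i points to sentence j
scores = [
    [0.9, 0.1, 0.0],  # sentence 0 scores highest on itself -> root
    [0.8, 0.1, 0.1],  # sentence 1 -> sentence 0
    [0.2, 0.7, 0.1],  # sentence 2 -> sentence 1
]
labels = ["root", "support", "attack"]  # second-step output (illustrative)

links = decode_links(scores)
print(build_structure(links, labels))
# prints: [(1, 0, 'support'), (2, 1, 'attack')]
```

Keeping the two steps separate, as in the abstract, means a different model can be swapped in for either the linking scores or the relation labels without touching the other.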

Training and Domain Adaptation for Supervised Text Segmentation
Goran Glavaš | Ananya Ganesh | Swapna Somasundaran

Unlike traditional unsupervised text segmentation methods, recent supervised segmentation models rely on Wikipedia as the source of large-scale segmentation supervision. These models have, however, predominantly been evaluated on the in-domain (Wikipedia-based) test sets, preventing conclusions about their general segmentation efficacy. In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain. To this end, we first introduce K12Seg, a new dataset for evaluation of supervised segmentation, created from educational reading material for grade-1 to college-level students. We then benchmark a hierarchical text segmentation model (HITS), based on RoBERTa, in both in-domain and domain-transfer segmentation experiments. While HITS produces state-of-the-art in-domain performance (on three Wikipedia-based test sets), we show that, subject to the standard full-blown fine-tuning, it is susceptible to domain overfitting. We identify adapter-based fine-tuning as a remedy that substantially improves transfer performance.

C-Test Collector: A Proficiency Testing Application to Collect Training Data for C-Tests
Christian Haring | Rene Lehmann | Andrea Horbach | Torsten Zesch

We present the C-Test Collector, a web-based tool that allows language learners to test their proficiency level using c-tests. Our tool collects anonymized data on test performance, which allows teachers to gain insights into common error patterns. At the same time, it allows NLP researchers to collect the training data needed to generate c-test variants at a desired difficulty level.
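
For readers unfamiliar with the format: a c-test is a gap-filling test in which, under the classic "rule of two", the second half of every second word is deleted. The sketch below applies one common variant of that rule to a plain string. The function name, starting at the second word, removing the larger half of odd-length words, skipping one-letter words, and whitespace tokenization are all assumptions, and the C-Test Collector may generate gaps differently (for example, by leaving the first and last sentences intact).

```python
# Hypothetical sketch of c-test gap creation; conventions are assumptions.

def make_c_test(text: str, start: int = 1) -> tuple[str, list[str]]:
    """Damage every second word (from index `start`) and return (gapped text, answers)."""
    words = text.split()  # naive whitespace tokenization; punctuation is not handled
    gapped: list[str] = []
    answers: list[str] = []
    for idx, word in enumerate(words):
        if idx >= start and (idx - start) % 2 == 0 and len(word) > 1:
            keep = len(word) // 2  # for odd lengths, the larger half is removed
            answers.append(word[keep:])
            gapped.append(word[:keep] + "_" * (len(word) - keep))
        else:
            gapped.append(word)
    return " ".join(gapped), answers


print(make_c_test("the cat sat on the warm mat"))
# prints: ('the c__ sat o_ the wa__ mat', ['at', 'n', 'rm'])
```

Varying parameters like these (which words are damaged, how much is removed) is one way c-test variants of different difficulty could be produced, which is the kind of generation the collected performance data is meant to support.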

Sharks are not the threat humans are: Argument Component Segmentation in School Student Essays
Tariq Alhindi | Debanjan Ghosh

Argument mining is often addressed by a pipeline method in which the text is first segmented into argumentative units, followed by an argument component identification task. In this research, we apply token-level classification to identify claim and premise tokens in a new corpus of argumentative essays written by middle school students. To this end, we compare a variety of state-of-the-art models, ranging from discrete-feature models to deep learning architectures (e.g., BiLSTM networks and BERT-based architectures), to identify the argument components. We demonstrate that a BERT-based multi-task learning architecture (i.e., token- and sentence-level classification) adaptively pretrained on a relevant unlabeled dataset obtains the best results.
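
Token-level classification turns segmentation into labelling: each token receives a label, and argument components are recovered by grouping consecutive tokens. A minimal sketch, assuming a BIO scheme with Claim and Premise types (the paper's exact label set may differ), a hypothetical function name, and a made-up example sentence:

```python
# Hypothetical sketch: group BIO token labels into argument component spans.

def bio_to_spans(labels: list[str]) -> list[tuple[str, int, int]]:
    """Return (type, start, end_exclusive) spans from labels like B-Claim, I-Claim, O."""
    spans: list[tuple[str, int, int]] = []
    cur_type, cur_start = None, 0
    for i, label in enumerate(labels):
        # A B- tag, or an I- tag whose type differs from the open span, starts a new span
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != cur_type):
            if cur_type is not None:
                spans.append((cur_type, cur_start, i))
            cur_type, cur_start = label[2:], i
        elif label == "O":
            if cur_type is not None:
                spans.append((cur_type, cur_start, i))
            cur_type = None
        # an I- tag continuing the current type simply extends the open span
    if cur_type is not None:
        spans.append((cur_type, cur_start, len(labels)))
    return spans


tokens = "Homework should be optional because students need time to rest".split()
labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O",
          "B-Premise", "I-Premise", "I-Premise", "I-Premise", "I-Premise"]
for kind, start, end in bio_to_spans(labels):
    print(kind, " ".join(tokens[start:end]))
# prints:
# Claim Homework should be optional
# Premise students need time to rest
```

Treating a stray I- tag as the start of a new span is a lenient design choice; a stricter decoder could instead discard such tokens.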