Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications
Yuen-Hsien Tseng | Hsin-Hsi Chen | Vincent Ng | Mamoru Komachi
Feature Optimization for Predicting Readability of Arabic L1 and L2
Hind Saddiki | Nizar Habash | Violetta Cavalli-Sforza | Muhamed Al Khalil
Advances in automatic readability assessment can impact the way people consume information in a number of domains. Arabic, being a low-resource and morphologically complex language, presents numerous challenges to the task of automatic readability assessment. In this paper, we present the largest and most in-depth computational readability study for Arabic to date. We study a large set of features with varying depths, from shallow words to syntactic trees, for both L1 and L2 readability tasks. Our best L1 readability accuracy result is 94.8% (75% error reduction from a commonly used baseline). The comparable results for L2 are 72.4% (45% error reduction). We also demonstrate the added value of leveraging L1 features for L2 readability prediction.
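The relationship between the reported accuracies and error reductions can be checked with a short calculation. This is a minimal sketch that assumes "error reduction" means relative reduction in error rate, (baseline error - new error) / baseline error; the abstract does not state the baseline accuracies, so the values computed here are only what that definition would imply, and the function name is hypothetical.

```python
def implied_baseline_accuracy(accuracy_pct: float, error_reduction_pct: float) -> float:
    """Back-solve the baseline accuracy implied by a relative error reduction.

    Assumption (not stated in the abstract):
    error_reduction = (baseline_error - new_error) / baseline_error.
    """
    new_error = 100.0 - accuracy_pct
    baseline_error = new_error / (1.0 - error_reduction_pct / 100.0)
    return 100.0 - baseline_error


# Figures quoted in the abstract: (accuracy %, error reduction %).
l1_baseline = implied_baseline_accuracy(94.8, 75.0)
l2_baseline = implied_baseline_accuracy(72.4, 45.0)
print(f"implied L1 baseline accuracy: {l1_baseline:.1f}%")
print(f"implied L2 baseline accuracy: {l2_baseline:.1f}%")
```

Because the published percentages are rounded, the implied baselines are approximate.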
A Tutorial Markov Analysis of Effective Human Tutorial Sessions
Nabin Maharjan | Vasile Rus
This paper investigates what differentiates effective tutorial sessions from less effective sessions. Towards this end, we characterize and explore human tutors’ actions in tutorial dialogue sessions by mapping the tutor-tutee interactions, which are streams of dialogue utterances, into streams of actions, based on the language-as-action theory. Next, we use human expert judgment measures, evidence of learning (EL) and evidence of soundness (ES), to identify effective and ineffective sessions. We perform sub-sequence pattern mining to identify sub-sequences of dialogue modes that discriminate good sessions from bad sessions. Finally, we use the results of the sub-sequence analysis to generate a tutorial Markov process for effective tutorial sessions.
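To make the idea of a Markov process over dialogue modes concrete, the sketch below estimates first-order transition probabilities from sequences of mode labels by simple counting. It is an illustration only: the mode labels and the function name are hypothetical, and the paper derives its process from the sub-sequence mining results, which this sketch does not reproduce.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def estimate_transitions(sessions: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood first-order transition probabilities between labels."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        # Count each adjacent (current, next) pair within a session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical dialogue-mode sequences, not taken from the paper's data.
sessions = [
    ["Opening", "Scaffolding", "Fading", "Closing"],
    ["Opening", "Scaffolding", "Scaffolding", "Closing"],
]
probs = estimate_transitions(sessions)
for state, row in probs.items():
    print(state, row)
```

Each row of the result sums to 1, so it can be read directly as the outgoing transition distribution of that mode.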
Thank Goodness! A Way to Measure Style in Student Essays
Sandeep Mathias | Pushpak Bhattacharyya
Essays have two major components for scoring: content and style. In this paper, we describe a property of the essay, called goodness, and use it to predict the score given for the style of student essays. We compare our approach with baseline approaches, such as language modeling, and with a state-of-the-art deep learning system. We show that, despite being quite intuitive, our approach is very powerful in predicting the style of the essays.
Overview of NLPTEA-2018 Share Task Chinese Grammatical Error Diagnosis
Gaoqi Rao | Qi Gong | Baolin Zhang | Endong Xun
This paper presents the NLPTEA 2018 shared task for Chinese Grammatical Error Diagnosis (CGED), which seeks to identify grammatical error types, their range of occurrence and recommended corrections within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 20 teams registered for this shared task, 13 teams developed systems and submitted a total of 32 runs. Progress in system performance was evident, reaching an F1 of 36.12% at the position level and 25.27% at the correction level. All data sets with gold standards and scoring scripts are made publicly available to researchers.
A Hybrid System for Chinese Grammatical Error Diagnosis and Correction
Chen Li | Junpei Zhou | Zuyi Bao | Hengyou Liu | Guangwei Xu | Linlin Li
This paper introduces the DM_NLP team’s system for the NLPTEA 2018 shared task on Chinese Grammatical Error Diagnosis (CGED), which can be used to detect and correct grammatical errors in texts written by Chinese as a Foreign Language (CFL) learners. This task aims not only to detect four types of grammatical errors, namely redundant words (R), missing words (M), bad word selection (S) and disordered words (W), but also to recommend corrections for errors of the M and S types. We proposed a hybrid system comprising four models for this task, with two stages: the detection stage and the correction stage. In the detection stage, we first used a BiLSTM-CRF model to tag potential errors by sequence labeling, along with some handcrafted features. Then we designed three Grammatical Error Correction (GEC) models to generate corrections, which could help to tune the detection result. In the correction stage, candidates were generated by the three GEC models and then merged to output the final corrections for M and S types. Our system reached the highest precision in the correction subtask, which was the most challenging part of this shared task, and ranked in the top 3 on F1 score for position detection of errors.
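Framing error detection as sequence labeling means assigning one tag to every character of the input. The sketch below shows one way to turn annotated error spans into such tags. It assumes a BIO tagging scheme and 1-based inclusive character offsets; the abstract confirms neither, and the function name and toy input are hypothetical.

```python
from typing import List, Tuple

# Error types named in the abstract: R (redundant), M (missing),
# S (bad word selection), W (disordered words).
ERROR_TYPES = {"R", "M", "S", "W"}


def spans_to_bio(sentence: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) error spans into one BIO tag per character.

    Assumption: start and end are 1-based and inclusive.
    """
    tags = ["O"] * len(sentence)
    for start, end, err in spans:
        if err not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {err}")
        if not 1 <= start <= end <= len(sentence):
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start - 1] = f"B-{err}"      # first character of the span
        for i in range(start, end):       # remaining characters of the span
            tags[i] = f"I-{err}"
    return tags


# Toy placeholder input, not real learner data.
print(spans_to_bio("abcdef", [(3, 4, "S")]))
```

A tagger such as a BiLSTM-CRF would then be trained to predict these per-character tags from the sentence.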
Ling@CASS Solution to the NLP-TEA CGED Shared Task 2018
Qinan Hu | Yongwei Zhang | Fang Liu | Yueguo Gu
In this study, we employ sequence-to-sequence learning to model the task of grammar error correction. The system takes potentially erroneous sentences as inputs, and outputs correct sentences. To break through the bottleneck of the very limited size of manually labeled data, we adopt a semi-supervised approach. Specifically, we adapt correct sentences written by native Chinese speakers to generate pseudo grammatical errors of the kind made by learners of Chinese as a second language. We use the pseudo data to pre-train the model, and the CGED data to fine-tune it. Being aware of the significance of precision in a grammar error correction system in real scenarios, we use ensembles to boost precision. When using inputs as simple as Chinese characters, the ensembled system achieves a precision of 86.56% in the detection of erroneous sentences, and a precision of 51.53% in the correction of errors of the Selection and Missing types.
Chinese Grammatical Error Diagnosis Based on Policy Gradient LSTM Model
Changliang Li | Ji Qi
Chinese Grammatical Error Diagnosis (CGED) is a natural language processing task for the NLPTEA2018 workshop held during ACL2018. The goal of this task is to use a model to diagnose Chinese sentences containing four kinds of grammatical errors and to locate those errors. A Chinese grammatical error diagnosis system is an important tool, which can help Chinese learners automatically diagnose grammatical errors in many scenarios. However, owing to characteristics of the Chinese language and the limitations of the datasets, the traditional model faces the problems of extreme imbalance between positive and negative samples and of vanishing gradients. In this paper, we propose a sequence labeling method based on the Policy Gradient LSTM model and apply it to this task to solve the above problems. The results show that our model can achieve higher precision scores at a lower false positive rate (FPR) and that it is convenient to optimize the model online.
Joint learning of frequency and word embeddings for multilingual readability assessment
Dieu-Thu Le | Cam-Tu Nguyen | Xiaoliang Wang
This paper describes two models that employ word frequency embeddings to deal with the problem of readability assessment in multiple languages. The task is to determine the difficulty level of a given document, i.e., how hard it is for a reader to fully comprehend the text. The proposed models show how frequency information can be integrated to improve readability assessment. Experimental results on both English and Chinese datasets show that the proposed models notably improve on those using only traditional word embeddings.
MULLE: A grammar-based Latin language learning tool to supplement the classroom setting
Herbert Lange | Peter Ljunglöf
MULLE is a tool for language learning that focuses on teaching Latin as a foreign language. It is designed for easy integration into the traditional classroom setting and syllabus, which makes it distinct from other language learning tools that provide a standalone learning experience. It uses grammar-based lessons and embraces methods of gamification to improve learner motivation. The main type of exercise provided by our application is translation practice, but it is also possible to shift the focus to vocabulary or morphology training.
Textual Features Indicative of Writing Proficiency in Elementary School Spanish Documents
Gemma Bel-Enguix | Diana Dueñas Chávez | Arturo Curiel Díaz
Childhood acquisition of written language is not straightforward. Writing skills evolve differently depending on external factors, such as the conditions in which children practice their productions and the quality of their instructors’ guidance. This can be challenging in low-income areas, where schools may struggle to ensure ideal acquisition conditions. Developing computational tools to support the learning process may counterbalance negative environmental influences; however, little work exists on the use of information technologies to improve childhood literacy. This work centers around the computational study of Spanish word and syllable structure in documents written by 2nd and 3rd year elementary school students. The studied texts were compared against a corpus of short stories aimed at the same age group, so as to observe whether the children tend to produce written patterns similar to the ones they are expected to interpret at their literacy level. The obtained results show some significant differences between the two kinds of texts, pointing towards possible strategies for the implementation of new education software in support of written language acquisition.
A Short Answer Grading System in Chinese by Support Vector Approach
Shih-Hung Wu | Wen-Feng Shih
In this paper, we report a short answer grading system in Chinese. We build a system based on standard machine learning approaches and test it with corpora translated from two publicly available English corpora. The experimental results on the two translated corpora are similar to those obtained in English.
From Fidelity to Fluency: Natural Language Processing for Translator Training
Oi Yee Kwong
This study explores the use of natural language processing techniques to enhance bilingual lexical access beyond simple equivalents, to enable translators to navigate along a wider cross-lingual lexical space and more examples showing different translation strategies, which is essential for them to learn to produce not only faithful but also fluent translations.
Countering Position Bias in Instructor Interventions in MOOC Discussion Forums
Muthu Kumar Chandrasekaran | Min-Yen Kan
We systematically confirm that instructors are strongly influenced by the user interface presentation of Massive Open Online Course (MOOC) discussion forums. In a large scale dataset, we conclusively show that instructor interventions exhibit strong position bias, as measured by the position where the thread appeared on the user interface at the time of intervention. We measure and remove this bias, enabling unbiased statistical modelling and evaluation. We show that our de-biased classifier improves intervention prediction over the state of the art on courses with a sufficient number of interventions by 8.2% in F1 and 24.4% in recall on average.
Learning to Automatically Generate Fill-In-The-Blank Quizzes
Edison Marrese-Taylor | Ai Nakajima | Yutaka Matsuo | Ono Yuichi
In this paper we formalize the problem of automatic fill-in-the-blank question generation using two standard NLP machine learning schemes, proposing concrete deep learning models for each. We present an empirical study based on data obtained from a language learning platform showing that both of our proposed settings offer promising results.
Multilingual Short Text Responses Clustering for Mobile Educational Activities: a Preliminary Exploration
Yuen-Hsien Tseng | Lung-Hao Lee | Yu-Ta Chien | Chun-Yen Chang | Tsung-Yen Li
Text clustering is a powerful technique to detect topics from document corpora, so as to provide information browsing, analysis, and organization. On the other hand, the Instant Response System (IRS) has been widely used in recent years to enhance student engagement in class and thus improve their learning effectiveness. However, the lack of functions to process short text responses from the IRS prevents the further application of the IRS in classes. Therefore, this study aims to propose a proper short text clustering module for the IRS, and to demonstrate our implemented techniques through real-world examples, so as to provide experiences and insights for further study. In particular, we have compared three clustering methods, and the results show that theoretically better methods need not lead to better results, as there are various factors that may affect the final performance.
Chinese Grammatical Error Diagnosis Based on CRF and LSTM-CRF model
Yujie Zhou | Yinan Shao | Yong Zhou
When learning Chinese as a foreign language, learners may make grammatical errors due to negative transfer from their native languages. However, few grammar checking applications have been developed to support these learners. The goal of this paper is to develop a tool to automatically diagnose four types of grammatical errors, namely redundant words (R), missing words (M), bad word selection (S) and disordered words (W), in Chinese sentences written by such learners. In this paper, a conventional linear CRF model with specific feature engineering and an LSTM-CRF model are used to solve the CGED (Chinese Grammatical Error Diagnosis) task. We make some improvements to both models, and the submitted results perform better on false positive rate and accuracy than the average of all runs from CGED2018 at all three evaluation levels.
Detecting Simultaneously Chinese Grammar Errors Based on a BiLSTM-CRF Model
Yajun Liu | Hongying Zan | Mengjie Zhong | Hongchao Ma
In the process of learning and using Chinese, many learners of Chinese as a foreign language (CFL) may make grammar errors due to negative transfer from their native languages. This paper introduces our system, which can simultaneously diagnose four types of grammatical errors, namely redundant (R), missing (M), selection (S) and disorder (W), in the NLPTEA-5 shared task. We proposed a Bidirectional LSTM CRF neural network (BiLSTM-CRF) that combines BiLSTM and CRF without handcrafted features for Chinese Grammatical Error Diagnosis (CGED). Evaluation includes three levels: the detection level, the identification level and the position level. At the detection level and identification level, our system achieved the third-highest recall scores and good F1 values.
A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection
Yiyi Wang | Chilin Shih
This paper presents a method that combines a Conditional Random Fields (CRF) model with a post-processing layer using Google n-gram statistical information, tailored to detect word selection and word order errors made by learners of Chinese as a Foreign Language (CFL). We describe the architecture of the model and its performance in the shared task of the ACL 2018 Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). This hybrid approach yields comparably high false positive rate (FPR = 0.1274) and precision (Pd = 0.7519; Pi = 0.6311), but low recall (Rd = 0.3035; Ri = 0.1696) in grammatical error detection and identification tasks. Additional statistical information and linguistic rules can be added to enhance the model performance in the future.
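The trade-off between the reported precision and recall can be summarized with the F1 score, the harmonic mean of the two. The sketch below applies the standard formula to the figures quoted in the abstract; the abstract itself does not report F1, so the computed values are derived here rather than taken from the paper, and the function name is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract.
detection_f1 = f1_score(0.7519, 0.3035)        # Pd, Rd
identification_f1 = f1_score(0.6311, 0.1696)   # Pi, Ri
print(f"detection-level F1: {detection_f1:.4f}")
print(f"identification-level F1: {identification_f1:.4f}")
```

Because the harmonic mean is pulled towards the smaller value, the low recall dominates both scores despite the high precision.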
CYUT-III Team Chinese Grammatical Error Diagnosis System Report in NLPTEA-2018 CGED Shared Task
Shih-Hung Wu | Jun-Wei Wang | Liang-Pu Chen | Ping-Che Yang
This paper reports how we built a Chinese Grammatical Error Diagnosis system for the NLPTEA-2018 CGED shared task. In 2018, we submitted three runs using three different approaches. The first is a pattern-based approach using frequent error pattern matching. The second is a sequential labelling approach using conditional random fields (CRF). The third is a rewriting approach using a sequence-to-sequence (seq2seq) model. The three approaches have different properties that aim to optimize different performance metrics, and the formal run results show the differences we expected.