Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

Srinivas Bangalore, Jennifer Chu-Carroll, Yunyao Li (Editors)


Anthology ID:
N18-3
Month:
June
Year:
2018
Address:
New Orleans - Louisiana
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/N18-3
DOI:
10.18653/v1/N18-3
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/N18-3.pdf

pdf bib
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
Srinivas Bangalore | Jennifer Chu-Carroll | Yunyao Li

pdf bib
Scalable Wide and Deep Learning for Computer Assisted Coding
Marilisa Amoia | Frank Diehl | Jesus Gimenez | Joel Pinto | Raphael Schumann | Fabian Stemmer | Paul Vozila | Yi Zhang

In recent years the use of electronic medical records has accelerated resulting in large volumes of medical data when a patient visits a healthcare facility. As a first step towards reimbursement healthcare institutions need to associate ICD-10 billing codes to these documents. This is done by trained clinical coders who may use a computer assisted solution for shortlisting of codes. In this work, we present our work to build a machine learning based scalable system for predicting ICD-10 codes from electronic medical records. We address data imbalance issues by implementing two system architectures using convolutional neural networks and logistic regression models. We illustrate the pros and cons of those system designs and show that the best performance can be achieved by leveraging the advantages of both using a system combination approach.

pdf bib
A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding
Young-Bum Kim | Dongchan Kim | Joo-Kyung Kim | Ruhi Sarikaya

Intelligent personal digital assistants (IPDAs), a popular real-life application with spoken language understanding capabilities, can cover potentially thousands of overlapping domains for natural language understanding, and the task of finding the best domain to handle an utterance becomes a challenging problem on a large scale. In this paper, we propose a set of efficient and scalable shortlisting-reranking neural models for effective large-scale domain classification for IPDAs. The shortlisting stage focuses on efficiently trimming all domains down to a list of k-best candidate domains, and the reranking stage performs a list-wise reranking of the initial k-best domains with additional contextual information. We show the effectiveness of our approach with extensive experiments on 1,500 IPDA domains.

pdf bib
Data Collection for Dialogue System : A Startup Perspective
Yiping Kang | Yunqi Zhang | Jonathan K. Kummerfeld | Lingjia Tang | Jason Mars

Industrial dialogue systems such as Apple Siri and Google Now rely on large scale diverse and robust training data to enable their sophisticated conversation capability. Crowdsourcing provides a scalable and inexpensive way of data collection but collecting high quality data efficiently requires thoughtful orchestration of the crowdsourcing jobs. Prior study of this topic have focused on tasks only in the academia settings with limited scope or only provide intrinsic dataset analysis, lacking indication on how it affects the trained model performance. In this paper, we present a study of crowdsourcing methods for a user intent classification task in our deployed dialogue system. Our task requires classification of 47 possible user intents and contains many intent pairs with subtle differences. We consider different crowdsourcing job types and job prompts and analyze quantitatively the quality of the collected data and the downstream model performance on a test set of real user queries from production logs. Our observation provides insights into designing efficient crowdsourcing jobs and provide recommendations for future dialogue system data collection process.

pdf bib
Bootstrapping a Neural Conversational Agent with Dialogue Self-Play, Crowdsourcing and On-Line Reinforcement Learning
Pararth Shah | Dilek Hakkani-Tür | Bing Liu | Gokhan Tür

End-to-end neural models show great promise towards building conversational agents that are trained from data and on-line experience using supervised and reinforcement learning. However, these models require a large corpus of dialogues to learn effectively. For goal-oriented dialogues, such datasets are expensive to collect and annotate, since each task involves a separate schema and database of entities. Further, the Wizard-of-Oz approach commonly used for dialogue collection does not provide sufficient coverage of salient dialogue flows, which is critical for guaranteeing an acceptable task completion rate in consumer-facing conversational agents. In this paper, we study a recently proposed approach for building an agent for arbitrary tasks by combining dialogue self-play and crowd-sourcing to generate fully-annotated dialogues with diverse and natural utterances. We discuss the advantages of this approach for industry applications of conversational agents, wherein an agent can be rapidly bootstrapped to deploy in front of users and further optimized via interactive learning from actual users of the system.

pdf bib
Atypical Inputs in Educational Applications
Su-Youn Yoon | Aoife Cahill | Anastassia Loukina | Klaus Zechner | Brian Riordan | Nitin Madnani

In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, there are a small number of responses that have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of what are called non-scorable responses along with effective filtering models based on various NLP and speech processing technologies. We give an overview of two operational automated scoring systems one for essay scoring and one for speech scoring and describe the filtering models they use. Finally, we present an evaluation and analysis of filtering models used for spoken responses in an assessment of language proficiency.

pdf bib
Accelerating NMT Batched Beam Decoding with LMBR Posteriors for DeploymentNMT Batched Beam Decoding with LMBR Posteriors for Deployment
Gonzalo Iglesias | William Tambellini | Adrià De Gispert | Eva Hasler | Bill Byrne

We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed.

pdf bib
From dictations to clinical reports using machine translation
Gregory Finley | Wael Salloum | Najmeh Sadoughi | Erik Edwards | Amanda Robinson | Nico Axtmann | Michael Brenndoerfer | Mark Miller | David Suendermann-Oeft

A typical workflow to document clinical encounters entails dictating a summary, running speech recognition, and post-processing the resulting text into a formatted letter. Post-processing entails a host of transformations including punctuation restoration, truecasing, marking sections and headers, converting dates and numerical expressions, parsing lists, etc. In conventional implementations, most of these tasks are accomplished by individual modules. We introduce a novel holistic approach to post-processing that relies on machine callytranslation. We show how this technique outperforms an alternative conventional systemeven learning to correct speech recognition errors during post-processingwhile being much simpler to maintain.

pdf bib
Benchmarks and models for entity-oriented polarity detection
Lidia Pivovarova | Arto Klami | Roman Yangarber

We address the problem of determining entity-oriented polarity in business news. This can be viewed as classifying the polarity of the sentiment expressed toward a given mention of a company in a news article. We present a complete, end-to-end approach to the problem. We introduce a new dataset of over 17,000 manually labeled documents, which is substantially larger than any currently available resources. We propose a benchmark solution based on convolutional neural networks for classifying entity-oriented polarity. Although our dataset is much larger than those currently available, it is small on the scale of datasets commonly used for training robust neural network models. To compensate for this, we use transfer learningpre-train the model on a much larger dataset, annotated for a related but different classification task, in order to learn a good representation for business text, and then fine-tune it on the smaller polarity dataset.

pdf bib
Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System
Judith Gaspers | Penny Karanasou | Rajen Chatterjee

This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system for a new language for the use case of a large-scale voice-controlled device. The goal is to decrease the cost and time needed to get an annotated corpus for the new language, while still having a large enough coverage of user requests. Different methods of filtering MT data in order to keep utterances that improve NLU performance and language-specific post-processing methods are investigated. These methods are tested in a large-scale NLU task with translating around 10 millions training utterances from English to German. The results show a large improvement for using MT data over a grammar-based and over an in-house data collection baseline, while reducing the manual effort greatly. Both filtering and post-processing approaches improve results further.

pdf bib
Fast and Scalable Expansion of Natural Language Understanding Functionality for Intelligent Agents
Anuj Kumar Goyal | Angeliki Metallinou | Spyros Matsoukas

Fast expansion of natural language functionality of intelligent virtual agents is critical for achieving engaging and informative interactions. However, developing accurate models for new natural language domains is a time and data intensive process. We propose efficient deep neural network architectures that maximally re-use available resources through transfer learning. Our methods are applied for expanding the understanding capabilities of a popular commercial agent and are evaluated on hundreds of new domains, designed by internal or external developers. We demonstrate that our proposed methods significantly increase accuracy in low resource settings and enable rapid development of accurate models with less data.

pdf bib
Bag of Experts Architectures for Model Reuse in Conversational Language Understanding
Rahul Jha | Alex Marin | Suvamsh Shivaprasad | Imed Zitouni

Slot tagging, the task of detecting entities in input user utterances, is a key component of natural language understanding systems for personal digital assistants. Since each new domain requires a different set of slots, the annotation costs for labeling data for training slot tagging models increases rapidly as the number of domains grow. To tackle this, we describe Bag of Experts (BoE) architectures for model reuse for both LSTM and CRF based models. Extensive experimentation over a dataset of 10 domains drawn from data relevant to our commercial personal digital assistant shows that our BoE models outperform the baseline models with a statistically significant average margin of 5.06 % in absolute F1-score when training with 2000 instances per domain, and achieve an even higher improvement of 12.16 % when only 25 % of the training data is used.

pdf bib
Multi-lingual neural title generation for e-Commerce browse pages
Prashant Mathur | Nicola Ueffing | Gregor Leusch

To provide better access of the inventory to buyers and better search engine optimization, e-Commerce websites are automatically generating millions of browse pages. A browse page consists of a set of slot name / value pairs within a given category, grouping multiple items which share some characteristics. These browse pages require a title describing the content of the page. Since the number of browse pages are huge, manual creation of these titles is infeasible. Previous statistical and neural approaches depend heavily on the availability of large amounts of data in a language. In this research, we apply sequence-to-sequence models to generate titles for high-resource as well as low-resource languages by leveraging transfer learning. We train these models on multi-lingual data, thereby creating one joint model which can generate titles in various different languages. Performance of the title generation system is evaluated on three different languages ; English, German, and French, with a particular focus on low-resourced French language.

pdf bib
A Novel Approach to Part Name Discovery in Noisy Text
Nobal Bikram Niraula | Daniel Whyatt | Anne Kao

As a specialized example of information extraction, part name extraction is an area that presents unique challenges. Part names are typically multi-word terms longer than two words. There is little consistency in how terms are described in noisy free text, with variations spawned by typos, ad hoc abbreviations, acronyms, and incomplete names. This makes search and analyses of parts in these data extremely challenging. In this paper, we present our algorithm, PANDA (Part Name Discovery Analytics), based on a unique method that exploits statistical, linguistic and machine learning techniques to discover part names in noisy text such as that in manufacturing quality documentation, supply chain management records, service communication logs, and maintenance reports. Experiments show that PANDA is scalable and outperforms existing techniques significantly.

pdf bib
Document-based Recommender System for Job Postings using Dense Representations
Ahmed Elsafty | Martin Riedl | Chris Biemann

Job boards and professional social networks heavily use recommender systems in order to better support users in exploring job advertisements. Detecting the similarity between job advertisements is important for job recommendation systems as it allows, for example, the application of item-to-item based recommendations. In this work, we research the usage of dense vector representations to enhance a large-scale job recommendation system and to rank German job advertisements regarding their similarity. We follow a two-folded evaluation scheme : (1) we exploit historic user interactions to automatically create a dataset of similar jobs that enables an offline evaluation. (2) In addition, we conduct an online A / B test and evaluate the best performing method on our platform reaching more than 1 million users. We achieve the best results by combining job titles with full-text job descriptions. In particular, this method builds dense document representation using words of the titles to weigh the importance of words of the full-text description. In the online evaluation, this approach allows us to increase the click-through rate on job recommendations for active users by 8.0 %.