Jun Suzuki


Embeddings of Label Components for Sequence Labeling: A Case Study of Fine-grained Named Entity Recognition
Takuma Kato | Kaori Abe | Hiroki Ouchi | Shumpei Miyawaki | Jun Suzuki | Kentaro Inui
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

In general, the labels used in sequence labeling consist of different types of elements. For example, IOB-format entity labels, such as B-Person and I-Person, can be decomposed into span (B and I) and type information (Person). However, while most sequence labeling models do not consider such label components, the shared components across labels, such as Person, can be beneficial for label prediction. In this work, we propose to integrate label component information as embeddings into models. Through experiments on English and Japanese fine-grained named entity recognition, we demonstrate that the proposed method improves performance, especially for instances with low-frequency labels.
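As a minimal sketch of the label decomposition described in the abstract above, the following splits IOB labels into span and type components and composes a label vector from shared component vectors. The function names, the random initialization, and the choice to combine components by summation are illustrative assumptions, not the authors' implementation.

```python
import random

def decompose(label):
    """Split an IOB label such as 'B-Person' into (span, type) components."""
    if label == "O":
        return ("O", None)
    span, _, etype = label.partition("-")
    return (span, etype)

def build_label_embeddings(labels, dim=4, seed=0):
    """Compose each label vector from shared component vectors.

    Assumption: components are combined by element-wise summation and
    initialized randomly; a trained model would learn these vectors.
    """
    rng = random.Random(seed)
    component_vecs = {}

    def vec(component):
        # Create a vector the first time a component is seen, then reuse it.
        if component not in component_vecs:
            component_vecs[component] = [rng.uniform(-1, 1) for _ in range(dim)]
        return component_vecs[component]

    label_vecs = {}
    for label in labels:
        span, etype = decompose(label)
        parts = [vec(("span", span))]
        if etype is not None:
            parts.append(vec(("type", etype)))
        label_vecs[label] = [sum(xs) for xs in zip(*parts)]
    return label_vecs, component_vecs

label_vecs, component_vecs = build_label_embeddings(["B-Person", "I-Person", "O"])
```

Here B-Person and I-Person share the same Person component vector, so whatever is learned for one label is also available to the other, which is the intuition behind the reported benefit for low-frequency labels.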

Tohoku-AIP-NTT at WMT 2020 News Translation Task
Shun Kiyono | Takumi Ito | Ryuto Konno | Makoto Morishita | Jun Suzuki
Proceedings of the Fifth Conference on Machine Translation

In this paper, we describe the submission of Tohoku-AIP-NTT to the WMT’20 news translation task. We participated in this task in two language pairs and four language directions: English↔German and English↔Japanese. Our system consists of techniques such as back-translation and fine-tuning, which are already widely adopted in translation tasks. We attempted to develop new methods for both synthetic data filtering and reranking. However, the methods turned out to be ineffective, and they provided us with no significant improvement over the baseline. We analyze these negative results to provide insights for future studies.

JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita | Jun Suzuki | Masaaki Nagata
Proceedings of the 12th Language Resources and Evaluation Conference

Recent machine translation algorithms mainly rely on parallel corpora. However, since the availability of parallel corpora remains limited, only some resource-rich language pairs can benefit from them. We constructed a parallel corpus for English-Japanese, for which the amount of publicly available parallel corpora is still limited. We constructed the parallel corpus by broadly crawling the web and automatically aligning parallel sentences. Our collected corpus, called JParaCrawl, amassed over 8.7 million sentence pairs. We show how it includes a broader range of domains and how a neural machine translation model trained with it works as a good pre-trained model for fine-tuning to specific domains. The pre-training and fine-tuning approach achieved performance comparable to or better than model training from the initial state, and reduced the training time. Additionally, we trained the model with an in-domain dataset and JParaCrawl to show how we achieved the best performance with them. JParaCrawl and the pre-trained models are freely available online for research purposes.

Efficient Estimation of Influence of a Training Instance
Sosuke Kobayashi | Sho Yokoi | Jun Suzuki | Kentaro Inui
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

Understanding the influence of a training instance on a neural network model leads to improving interpretability. However, it is difficult and inefficient to evaluate the influence, which shows how a model’s prediction would be changed if a training instance were not used. In this paper, we propose an efficient method for estimating the influence. Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance. By switching between dropout masks, we can use sub-networks that learned or did not learn each training instance and estimate its influence. Through experiments with BERT and VGGNet on classification datasets, we demonstrate that the proposed method can capture training influences, enhance the interpretability of error predictions, and cleanse the training dataset for improving generalization.


Transductive Learning of Neural Language Models for Syntactic and Semantic Analysis
Hiroki Ouchi | Jun Suzuki | Kentaro Inui
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In transductive learning, an unlabeled test set is used for model training. Although this setting deviates from the common assumption of a completely unseen test set, it is applicable in many real-world scenarios, wherein the texts to be processed are known in advance. However, despite its practical advantages, transductive learning is underexplored in natural language processing. Here we conduct an empirical study of transductive learning for neural models and demonstrate its utility in syntactic and semantic tasks. Specifically, we fine-tune language models (LMs) on an unlabeled test set to obtain test-set-specific word representations. Through extensive experiments, we demonstrate that despite its simplicity, transductive LM fine-tuning consistently improves state-of-the-art neural models in in-domain and out-of-domain settings.

TEASPN: Framework and Protocol for Integrated Writing Assistance Environments
Masato Hagiwara | Takumi Ito | Tatsuki Kuribayashi | Jun Suzuki | Kentaro Inui
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Language technologies play a key role in assisting people with their writing. Although there has been steady progress in e.g., grammatical error correction (GEC), human writers are yet to benefit from this progress due to the high development cost of integrating with writing software. We propose TEASPN, a protocol and an open-source framework for achieving integrated writing assistance environments. The protocol standardizes the way writing software communicates with servers that implement such technologies, allowing developers and researchers to integrate the latest developments in natural language processing (NLP) with low cost. As a result, users can enjoy the integrated experience in their favorite writing software. The results from experiments with human participants show that users use a wide range of technologies and rate their writing experience favorably, allowing them to write more fluent text.

The Sally Smedley Hyperpartisan News Detector at SemEval-2019 Task 4
Kazuaki Hanawa | Shota Sasaki | Hiroki Ouchi | Jun Suzuki | Kentaro Inui
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system submitted to the formal run of SemEval-2019 Task 4: Hyperpartisan news detection. Our system is based on a linear classifier using several features, i.e., 1) embedding features based on the pre-trained BERT embeddings, 2) article length features, and 3) embedding features of informative phrases extracted from the by-publisher dataset. Our system achieved 80.9% accuracy on the test set for the formal run and placed 3rd out of 42 teams.

Annotating with Pros and Cons of Technologies in Computer Science Papers
Hono Shirai | Naoya Inoue | Jun Suzuki | Kentaro Inui
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

This paper explores a task for extracting a technological expression and its pros/cons from computer science papers. We report ongoing efforts on an annotated corpus of pros/cons and an analysis of the nature of the automatic extraction task. Specifically, we show how to adapt the targeted sentiment analysis task for pros/cons extraction in computer science papers and conduct an annotation study. In order to identify the challenges of the automatic extraction task, we construct a strong baseline model and conduct an error analysis. The experiments show that pros/cons can be consistently annotated by several annotators, and that the task is challenging due to domain-specific knowledge. The annotated dataset is made publicly available for research purposes.

ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper
Hirofumi Inaguma | Shun Kiyono | Nelson Enrique Yalta Soplin | Jun Suzuki | Kevin Duh | Shinji Watanabe
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes the ESPnet submissions to the How2 Speech Translation task at IWSLT2019. This year, we mainly build our systems based on Transformer architectures in all tasks and focus on end-to-end speech translation (E2E-ST). We first compare RNN-based models and Transformer, and then confirm that Transformer models significantly and consistently outperform RNN models in all tasks and corpora. Next, we investigate pre-training of E2E-ST models with the ASR and MT tasks. On top of the pre-training, we further explore knowledge distillation from the NMT model and the deeper speech encoder, and confirm drastic improvements over the baseline model. All of our code is publicly available in ESPnet.

Effective Adversarial Regularization for Neural Machine Translation
Motoki Sato | Jun Suzuki | Shun Kiyono
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

A regularization technique based on adversarial perturbation, which was initially developed in the field of image processing, has been successfully applied to text classification tasks and has yielded attractive improvements. We aim to further leverage this promising methodology into more sophisticated and critical neural models in the natural language processing field, i.e., neural machine translation (NMT) models. However, it is not trivial to apply this methodology to such models. Thus, this paper investigates the effectiveness of several possible configurations of applying the adversarial perturbation and reveals that the adversarial regularization technique can significantly and consistently improve the performance of widely used NMT models, such as LSTM-based and Transformer-based models.


Improving Neural Machine Translation by Incorporating Hierarchical Subword Features
Makoto Morishita | Jun Suzuki | Masaaki Nagata
Proceedings of the 27th International Conference on Computational Linguistics

This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that in the NMT model, the appropriate subword units for the following three modules (layers) can differ: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We find that the subword units based on Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model to enable the incorporation of several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our assumption, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirmed that incorporating hierarchical subword features in the encoder consistently improves BLEU scores on the IWSLT evaluation datasets.


Input-to-Output Gate to Improve RNN Language Models
Sho Takase | Jun Suzuki | Masaaki Nagata
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models. We refer to our proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple structure and thus can be easily combined with any RNN language model. Our experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG consistently boosts the performance of several different types of current topline RNN language models.

Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
Itsumi Saito | Jun Suzuki | Kyosuke Nishida | Kugatsu Sadamitsu | Satoshi Kobashikawa | Ryo Masumura | Yuji Matsumoto | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective in many natural language generation tasks. In general, we have to prepare a large amount of training data to train an encoder-decoder model. Unlike machine translation, there is little training data for text-normalization tasks. In this paper, we propose two methods for generating augmented data. The experimental results with Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve a higher BLEU score than the baselines. We also investigated the oracle performance and revealed that there is sufficient room for improving an encoder-decoder model.

Enumeration of Extractive Oracle Summaries
Tsutomu Hirao | Masaaki Nishino | Jun Suzuki | Masaaki Nagata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

To analyze the limitations and the future directions of the extractive summarization paradigm, this paper proposes an Integer Linear Programming (ILP) formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also propose an algorithm that enumerates all of the oracle summaries for a set of reference summaries to exploit F-measures that evaluate which system summaries contain how many sentences that are extracted as an oracle summary. Our experimental results obtained from Document Understanding Conference (DUC) corpora demonstrated the following: (1) room still exists to improve the performance of extractive summarization; (2) the F-measures derived from the enumerated oracle summaries have significantly stronger correlations with human judgment than those derived from single oracle summaries.

Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization
Jun Suzuki | Masaaki Nagata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper tackles the reduction of redundant repeating generation that is often observed in RNN-based encoder-decoder models. Our basic idea is to jointly estimate the upper-bound frequency of each target vocabulary in the encoder and control the output words based on the estimation in the decoder. Our method shows significant improvement over a strong RNN-based encoder-decoder baseline and achieved its best results on an abstractive summarization benchmark.