Masaaki Nagata


2020

pdf bib
A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERTBERT
Masaaki Nagata | Katsuki Chousa | Masaaki Nishino
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a novel supervised word alignment method based on cross-language span prediction. We first formalize a word alignment problem as a collection of independent predictions from a token in the source sentence to a span in the target sentence. Since this step is equivalent to a SQuAD v2.0 style question answering task, we solve it using the multilingual BERT, which is fine-tuned on manually created gold word alignment data. It is nontrivial to obtain accurate alignment from a set of independently predicted spans. We greatly improved the word alignment accuracy by adding to the question the source token’s context and symmetrizing two directional predictions. In experiments using five word alignment datasets from among Chinese, Japanese, German, Romanian, French, and English, we show that our proposed method significantly outperformed previous supervised and unsupervised word alignment methods without any bitexts for pretraining. For example, we achieved 86.7 F1 score for the Chinese-English data, which is 13.3 points higher than the previous state-of-the-art supervised method.

pdf bib
Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical AbstractsMarkov CRFs for Biomedical Abstracts
Kosuke Yamada | Tsutomu Hirao | Ryohei Sasano | Koichi Takeda | Masaaki Nagata
Findings of the Association for Computational Linguistics: EMNLP 2020

Dividing biomedical abstracts into several segments with rhetorical roles is essential for supporting researchers’ information access in the biomedical domain. Conventional methods have regarded the task as a sequence labeling task based on sequential sentence classification, i.e., they assign a rhetorical label to each sentence by considering the context in the abstract. However, these methods have a critical problem : they are prone to mislabel longer continuous sentences with the same rhetorical label. To tackle the problem, we propose sequential span classification that assigns a rhetorical label, not to a single sentence but to a span that consists of continuous sentences. Accordingly, we introduce Neural Semi-Markov Conditional Random Fields to assign the labels to such spans by considering all possible spans of various lengths. Experimental results obtained from PubMed 20k RCT and NICTA-PIBOSO datasets demonstrate that our proposed method achieved the best micro sentence-F1 score as well as the best micro span-F1 score.

pdf bib
Proceedings of the Fifth Conference on Machine Translation
Loïc Barrault | Ondřej Bojar | Fethi Bougares | Rajen Chatterjee | Marta R. Costa-jussà | Christian Federmann | Mark Fishel | Alexander Fraser | Yvette Graham | Paco Guzman | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Makoto Morishita | Christof Monz | Masaaki Nagata | Toshiaki Nakazawa | Matteo Negri
Proceedings of the Fifth Conference on Machine Translation

pdf bib
JParaCrawl : A Large Scale Web-Based English-Japanese Parallel CorpusJParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita | Jun Suzuki | Masaaki Nagata
Proceedings of the 12th Language Resources and Evaluation Conference

Recent machine translation algorithms mainly rely on parallel corpora. However, since the availability of parallel corpora remains limited, only some resource-rich language pairs can benefit from them. We constructed a parallel corpus for English-Japanese, for which the amount of publicly available parallel corpora is still limited. We constructed the parallel corpus by broadly crawling the web and automatically aligning parallel sentences. Our collected corpus, called JParaCrawl, amassed over 8.7 million sentence pairs. We show how it includes a broader range of domains and how a neural machine translation model trained with it works as a good pre-trained model for fine-tuning specific domains. The pre-training and fine-tuning approaches achieved or surpassed performance comparable to model training from the initial state and reduced the training time. Additionally, we trained the model with an in-domain dataset and JParaCrawl to show how we achieved the best performance with them. JParaCrawl and the pre-trained models are freely available online for research purposes.

pdf bib
A Test Set for Discourse Translation from Japanese to EnglishJapanese to English
Masaaki Nagata | Makoto Morishita
Proceedings of the 12th Language Resources and Evaluation Conference

We made a test set for Japanese-to-English discourse translation to evaluate the power of context-aware machine translation. For each discourse phenomenon, we systematically collected examples where the translation of the second sentence depends on the first sentence. Compared with a previous study on test sets for English-to-French discourse translation (CITATION), we needed different approaches to make the data because Japanese has zero pronouns and represents different senses in different characters. We improved the translation accuracy using context-aware neural machine translation, and the improvement mainly reflects the betterment of the translation of zero pronouns.

2019

pdf bib
Split or Merge : Which is Better for Unsupervised RST Parsing?RST Parsing?
Naoki Kobayashi | Tsutomu Hirao | Kengo Nakamura | Hidetaka Kamigaito | Manabu Okumura | Masaaki Nagata
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Rhetorical Structure Theory (RST) parsing is crucial for many downstream NLP tasks that require a discourse structure for a text. Most of the previous RST parsers have been based on supervised learning approaches. That is, they require an annotated corpus of sufficient size and quality, and heavily rely on the language and domain dependent corpus. In this paper, we present two language-independent unsupervised RST parsing methods based on dynamic programming. The first one builds the optimal tree in terms of a dissimilarity score function that is defined for splitting a text span into smaller ones. The second builds the optimal tree in terms of a similarity score function that is defined for merging two adjacent spans into a large one. Experimental results on English and German RST treebanks showed that our parser based on span merging achieved the best score, around 0.8 F_1 score, which is close to the scores of the previous supervised parsers._1 score, which is close to the scores of the previous supervised parsers.

pdf bib
Generating Natural Anagrams : Towards Language Generation Under Hard Combinatorial Constraints
Masaaki Nishino | Sho Takase | Tsutomu Hirao | Masaaki Nagata
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

An anagram is a sentence or a phrase that is made by permutating the characters of an input sentence or a phrase. For example, Trims cash is an anagram of Christmas. Existing automatic anagram generation methods can find possible combinations of words form an anagram. However, they do not pay much attention to the naturalness of the generated anagrams. In this paper, we show that simple depth-first search can yield natural anagrams when it is combined with modern neural language models. Human evaluation results show that the proposed method can generate significantly more natural anagrams than baseline methods.

pdf bib
Mixed Multi-Head Self-Attention for Neural Machine Translation
Hongyi Cui | Shohei Iida | Po-Hsuan Hung | Takehito Utsuro | Masaaki Nagata
Proceedings of the 3rd Workshop on Neural Generation and Translation

Recently, the Transformer becomes a state-of-the-art architecture in the filed of neural machine translation (NMT). A key point of its high-performance is the multi-head self-attention which is supposed to allow the model to independently attend to information from different representation subspaces. However, there is no explicit mechanism to ensure that different attention heads indeed capture different features, and in practice, redundancy has occurred in multiple heads. In this paper, we argue that using the same global attention in multiple heads limits multi-head self-attention’s capacity for learning distinct features. In order to improve the expressiveness of multi-head self-attention, we propose a novel Mixed Multi-Head Self-Attention (MMA) which models not only global and local attention but also forward and backward attention in different attention heads. This enables the model to learn distinct representations explicitly among multiple heads. In our experiments on both WAT17 English-Japanese as well as IWSLT14 German-English translation task, we show that, without increasing the number of parameters, our models yield consistent and significant improvements (0.9 BLEU scores on average) over the strong Transformer baseline.

pdf bib
NTT’s Machine Translation Systems for WMT19 Robustness TaskNTT’s Machine Translation Systems for WMT19 Robustness Task
Soichiro Murakami | Makoto Morishita | Tsutomu Hirao | Masaaki Nagata
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes NTT’s submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined techniques including utilization of a synthetic corpus, domain adaptation, and a placeholder mechanism, which significantly improved over the previous baseline. Experimental results revealed the placeholder mechanism, which temporarily replaces the non-standard tokens including emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even with noisy texts.

pdf bib
Using Semantic Similarity as Reward for Reinforcement Learning in Sentence Generation
Go Yasui | Yoshimasa Tsuruoka | Masaaki Nagata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Traditional model training for sentence generation employs cross-entropy loss as the loss function. While cross-entropy loss has convenient properties for supervised learning, it is unable to evaluate sentences as a whole, and lacks flexibility. We present the approach of training the generation model using the estimated semantic similarity between the output and reference sentences to alleviate the problems faced by the training with cross-entropy loss. We use the BERT-based scorer fine-tuned to the Semantic Textual Similarity (STS) task for semantic similarity estimation, and train the model with the estimated scores through reinforcement learning (RL). Our experiments show that reinforcement learning with semantic similarity reward improves the BLEU scores from the baseline LSTM NMT model.

2018

pdf bib
Improving Neural Machine Translation by Incorporating Hierarchical Subword Features
Makoto Morishita | Jun Suzuki | Masaaki Nagata
Proceedings of the 27th International Conference on Computational Linguistics

This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that in the NMT model, the appropriate subword units for the following three modules (layers) can differ : (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We find the subword based on Sennrich et al. (2016) has a feature that a large vocabulary is a superset of a small vocabulary and modify the NMT model enables the incorporation of several different subword units in a single embedding layer. We refer these small subword features as hierarchical subword features. To empirically investigate our assumption, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirmed that incorporating hierarchical subword features in the encoder consistently improves BLEU scores on the IWSLT evaluation datasets.

pdf bib
Automatic Pyramid Evaluation Exploiting EDU-based Extractive Reference SummariesEDU-based Extractive Reference Summaries
Tsutomu Hirao | Hidetaka Kamigaito | Masaaki Nagata
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper tackles automation of the pyramid method, a reliable manual evaluation framework. To construct a pyramid, we transform human-made reference summaries into extractive reference summaries that consist of Elementary Discourse Units (EDUs) obtained from source documents and then weight every EDU by counting the number of extractive reference summaries that contain the EDU. A summary is scored by the correspondences between EDUs in the summary and those in the pyramid. Experiments on DUC and TAC data sets show that our methods strongly correlate with various manual evaluations.

pdf bib
Neural Tensor Networks with Diagonal Slice Matrices
Takahiro Ishihara | Katsuhiko Hayashi | Hitoshi Manabe | Masashi Shimbo | Masaaki Nagata
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Although neural tensor networks (NTNs) have been successful in many NLP tasks, they require a large number of parameters to be estimated, which often leads to overfitting and a long training time. We address these issues by applying eigendecomposition to each slice matrix of a tensor to reduce its number of paramters. First, we evaluate our proposed NTN models on knowledge graph completion. Second, we extend the models to recursive NTNs (RNTNs) and evaluate them on logical reasoning tasks. These experiments show that our proposed models learn better and faster than the original (R)NTNs.

2017

pdf bib
Supervised Attention for Sequence-to-Sequence Constituency Parsing
Hidetaka Kamigaito | Katsuhiko Hayashi | Tsutomu Hirao | Hiroya Takamura | Manabu Okumura | Masaaki Nagata
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The sequence-to-sequence (Seq2Seq) model has been successfully applied to machine translation (MT). Recently, MT performances were improved by incorporating supervised attention into the model. In this paper, we introduce supervised attention to constituency parsing that can be regarded as another translation task. Evaluation results on the PTB corpus showed that the bracketing F-measure was improved by supervised attention.

pdf bib
Input-to-Output Gate to Improve RNN Language ModelsRNN Language Models
Sho Takase | Jun Suzuki | Masaaki Nagata
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models. We refer to our proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple structure, and thus, can be easily combined with any RNN language models. Our experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG consistently boosts the performance of several different types of current topline RNN language models.

pdf bib
Oracle Summaries of Compressive Summarization
Tsutomu Hirao | Masaaki Nishino | Masaaki Nagata
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper derives an Integer Linear Programming (ILP) formulation to obtain an oracle summary of the compressive summarization paradigm in terms of ROUGE. The oracle summary is essential to reveal the upper bound performance of the paradigm. Experimental results on the DUC dataset showed that ROUGE scores of compressive oracles are significantly higher than those of extractive oracles and state-of-the-art summarization systems. These results reveal that compressive summarization is a promising paradigm and encourage us to continue with the research to produce informative summaries.

pdf bib
Controlling Target Features in Neural Machine Translation via Prefix Constraints
Shunsuke Takeno | Masaaki Nagata | Kazuhide Yamamoto
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

We propose prefix constraints, a novel method to enforce constraints on target sentences in neural machine translation. It places a sequence of special tokens at the beginning of target sentence (target prefix), while side constraints places a special token at the end of source sentence (source suffix). Prefix constraints can be predicted from source sentence jointly with target sentence, while side constraints (Sennrich et al., 2016) must be provided by the user or predicted by some other methods. In both methods, special tokens are designed to encode arbitrary features on target-side or metatextual information. We show that prefix constraints are more flexible than side constraints and can be used to control the behavior of neural machine translation, in terms of output length, bidirectional decoding, domain adaptation, and unaligned target word generation.prefix constraints, a novel method to enforce constraints on\n target sentences in neural machine translation. It places a sequence of\n special tokens at the beginning of target sentence (target prefix), while\n side constraints places a special token at the end of source sentence\n (source suffix). Prefix constraints can be predicted from source sentence\n jointly with target sentence, while side constraints (Sennrich et al., 2016) must be provided by\n the user or predicted by some other methods. In both methods, special\n tokens are designed to encode arbitrary features on target-side or\n metatextual information. We show that prefix constraints are more flexible\n than side constraints and can be used to control the behavior of neural\n machine translation, in terms of output length, bidirectional decoding,\n domain adaptation, and unaligned target word generation.\n

pdf bib
Hierarchical Word Structure-based Parsing : A Feasibility Study on UD-style Dependency Parsing in JapaneseUD-style Dependency Parsing in Japanese
Takaaki Tanaka | Katsuhiko Hayashi | Masaaki Nagata
Proceedings of the 15th International Conference on Parsing Technologies

In applying word-based dependency parsing such as Universal Dependencies (UD) to Japanese, the uncertainty of word segmentation emerges for defining a word unit of the dependencies. We introduce the following hierarchical word structures to dependency parsing in Japanese : morphological units (a short unit word, SUW) and syntactic units (a long unit word, LUW). An SUW can be used to segment a sentence consistently, while it is too short to represent syntactic construction. An LUW is a unit including functional multiwords and LUW-based analysis facilitates the capturing of syntactic structure and makes parsing results more precise than SUW-based analysis. This paper describes the results of a feasibility study on the ability and the effectiveness of parsing methods based on hierarchical word structure (LUW chunking+parsing) in comparison to single layer word structure (SUW parsing). We also show joint analysis of LUW-chunking and dependency parsing improves the performance of identifying predicate-argument structures, while there is not much difference between overall results of them. not much difference between overall results of them.

pdf bib
Enumeration of Extractive Oracle Summaries
Tsutomu Hirao | Masaaki Nishino | Jun Suzuki | Masaaki Nagata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

To analyze the limitations and the future directions of the extractive summarization paradigm, this paper proposes an Integer Linear Programming (ILP) formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also propose an algorithm that enumerates all of the oracle summaries for a set of reference summaries to exploit F-measures that evaluate which system summaries contain how many sentences that are extracted as an oracle summary. Our experimental results obtained from Document Understanding Conference (DUC) corpora demonstrated the following : (1) room still exists to improve the performance of extractive summarization ; (2) the F-measures derived from the enumerated oracle summaries have significantly stronger correlations with human judgment than those derived from single oracle summaries.

pdf bib
Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization
Jun Suzuki | Masaaki Nagata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper tackles the reduction of redundant repeating generation that is often observed in RNN-based encoder-decoder models. Our basic idea is to jointly estimate the upper-bound frequency of each target vocabulary in the encoder and control the output words based on the estimation in the decoder. Our method shows significant improvement over a strong RNN-based encoder-decoder baseline and achieved its best results on an abstractive summarization benchmark.

pdf bib
K-best Iterative Viterbi ParsingViterbi Parsing
Katsuhiko Hayashi | Masaaki Nagata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents an efficient and optimal parsing algorithm for probabilistic context-free grammars (PCFGs). To achieve faster parsing, our proposal employs a pruning technique to reduce unnecessary edges in the search space. The key is to conduct repetitively Viterbi inside and outside parsing, while gradually expanding the search space to efficiently compute heuristic bounds used for pruning. Our experimental results using the English Penn Treebank corpus show that the proposed algorithm is faster than the standard CKY parsing algorithm. In addition, we also show how to extend this algorithm to extract k-best Viterbi parse trees.