Manabu Okumura


pdf bib
Character-based Thai Word Segmentation with Multiple AttentionsThai Word Segmentation with Multiple Attentions
Thodsaporn Chay-intr | Hidetaka Kamigaito | Manabu Okumura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Character-based word-segmentation models have been extensively applied to agglutinative languages, including Thai, due to their high performance. These models estimate word boundaries from a character sequence. However, a character unit in sequences has no essential meaning, compared with word, subword, and character cluster units. We propose a Thai word-segmentation model that uses various types of information, including words, subwords, and character clusters, from a character sequence. Our model applies multiple attentions to refine segmentation inferences by estimating the significant relationships among characters and various unit types. The experimental results indicate that our model can outperform other state-of-the-art Thai word-segmentation models.

pdf bib
Making Your Tweets More Fancy : Emoji Insertion to Texts
Jingun Kwon | Naoki Kobayashi | Hidetaka Kamigaito | Hiroya Takamura | Manabu Okumura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In the social media, users frequently use small images called emojis in their posts. Although using emojis in texts plays a key role in recent communication systems, less attention has been paid on their positions in the given texts, despite that users carefully choose and put an emoji that matches their post. Exploring positions of emojis in texts will enhance understanding of the relationship between emojis and texts. We extend an emoji label prediction task taking into account the information of emoji positions, by jointly learning the emoji position in a tweet to predict the emoji label. The results demonstrate that the position of emojis in texts is a good clue to boost the performance of emoji label prediction. Human evaluation validates that there exists a suitable emoji position in a tweet, and our proposed task is able to make tweets more fancy and natural. In addition, considering emoji position can further improve the performance for the irony detection task compared to the emoji label prediction. We also report the experimental results for the modified dataset, due to the problem of the original dataset for the first shared task to predict an emoji label in SemEval2018.

pdf bib
Generic Mechanism for Reducing Repetitions in Encoder-Decoder Models
Ying Zhang | Hidetaka Kamigaito | Tatsuya Aoki | Hiroya Takamura | Manabu Okumura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Encoder-decoder models have been commonly used for many tasks such as machine translation and response generation. As previous research reported, these models suffer from generating redundant repetition. In this research, we propose a new mechanism for encoder-decoder models that estimates the semantic difference of a source sentence before and after being fed into the encoder-decoder model to capture the consistency between two sides. This mechanism helps reduce repeatedly generated tokens for a variety of tasks. Evaluation results on publicly available machine translation and response generation datasets demonstrate the effectiveness of our proposal.

pdf bib
Generating Weather Comments from Meteorological Simulations
Soichiro Murakami | Sora Tanaka | Masatsugu Hangyo | Hidetaka Kamigaito | Kotaro Funakoshi | Hiroya Takamura | Manabu Okumura
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The task of generating weather-forecast comments from meteorological simulations has the following requirements : (i) the changes in numerical values for various physical quantities need to be considered, (ii) the weather comments should be dependent on delivery time and area information, and (iii) the comments should provide useful information for users. To meet these requirements, we propose a data-to-text model that incorporates three types of encoders for numerical forecast maps, observation data, and meta-data. We also introduce weather labels representing weather information, such as sunny and rain, for our model to explicitly describe useful information. We conducted automatic and human evaluations. The results indicate that our model performed best against baselines in terms of informativeness. We make our code and data publicly available.

pdf bib
One-class Text Classification with Multi-modal Deep Support Vector Data Description
Chenlong Hu | Yukun Feng | Hidetaka Kamigaito | Hiroya Takamura | Manabu Okumura
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This work presents multi-modal deep SVDD (mSVDD) for one-class text classification. By extending the uni-modal SVDD to a multiple modal one, we build mSVDD with multiple hyperspheres, that enable us to build a much better description for target one-class data. Additionally, the end-to-end architecture of mSVDD can jointly handle neural feature learning and one-class text learning. We also introduce a mechanism for incorporating negative supervision in the absence of real negative data, which can be beneficial to the mSVDD model. We conduct experiments on Reuters and 20 Newsgroup datasets, and the experimental results demonstrate that mSVDD outperforms uni-modal SVDD and mSVDD can get further improvements when negative supervision is incorporated.

pdf bib
A New Surprise Measure for Extracting Interesting Relationships between Persons
Hidetaka Kamigaito | Jingun Kwon | Young-In Song | Manabu Okumura
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

One way to enhance user engagement in search engines is to suggest interesting facts to the user. Although relationships between persons are important as a target for text mining, there are few effective approaches for extracting the interesting relationships between persons. We therefore propose a method for extracting interesting relationships between persons from natural language texts by focusing on their surprisingness. Our method first extracts all personal relationships from dependency trees for the texts and then calculates surprise scores for distributed representations of the extracted relationships in an unsupervised manner. The unique point of our method is that it does not require any labeled dataset with annotation for the surprising personal relationships. The results of the human evaluation show that the proposed method could extract more interesting relationships between persons from Japanese Wikipedia articles than a popularity-based baseline method. We demonstrate our proposed method as a chrome plugin on google search.

pdf bib
A Case Study of In-House Competition for Ranking Constructive Comments in a News Service
Hayato Kobayashi | Hiroaki Taguchi | Yoshimune Tabuchi | Chahine Koleejan | Ken Kobayashi | Soichiro Fujita | Kazuma Murao | Takeshi Masuyama | Taichi Yatsuka | Manabu Okumura | Satoshi Sekine
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

Ranking the user comments posted on a news article is important for online news services because comment visibility directly affects the user experience. Research on ranking comments with different metrics to measure the comment quality has shown constructiveness used in argument analysis is promising from a practical standpoint. In this paper, we report a case study in which this constructiveness is examined in the real world. Specifically, we examine an in-house competition to improve the performance of ranking constructive comments and demonstrate the effectiveness of the best obtained model for a commercial service.


pdf bib
Diverse and Non-redundant Answer Set Extraction on Community QA based on DPPsQA based on DPPs
Shogo Fujita | Tomohide Shibata | Manabu Okumura
Proceedings of the 28th International Conference on Computational Linguistics

In community-based question answering (CQA) platforms, it takes time for a user to get useful information from among many answers. Although one solution is an answer ranking method, the user still needs to read through the top-ranked answers carefully. This paper proposes a new task of selecting a diverse and non-redundant answer set rather than ranking the answers. Our method is based on determinantal point processes (DPPs), and it calculates the answer importance and similarity between answers by using BERT. We built a dataset focusing on a Japanese CQA site, and the experiments on this dataset demonstrated that the proposed method outperformed several baseline methods.


pdf bib
Split or Merge : Which is Better for Unsupervised RST Parsing?RST Parsing?
Naoki Kobayashi | Tsutomu Hirao | Kengo Nakamura | Hidetaka Kamigaito | Manabu Okumura | Masaaki Nagata
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Rhetorical Structure Theory (RST) parsing is crucial for many downstream NLP tasks that require a discourse structure for a text. Most of the previous RST parsers have been based on supervised learning approaches. That is, they require an annotated corpus of sufficient size and quality, and heavily rely on the language and domain dependent corpus. In this paper, we present two language-independent unsupervised RST parsing methods based on dynamic programming. The first one builds the optimal tree in terms of a dissimilarity score function that is defined for splitting a text span into smaller ones. The second builds the optimal tree in terms of a similarity score function that is defined for merging two adjacent spans into a large one. Experimental results on English and German RST treebanks showed that our parser based on span merging achieved the best score, around 0.8 F_1 score, which is close to the scores of the previous supervised parsers._1 score, which is close to the scores of the previous supervised parsers.

pdf bib
A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation
Yuta Hitomi | Yuya Taguchi | Hideaki Tamori | Ko Kikuta | Jiro Nishitoba | Naoaki Okazaki | Kentaro Inui | Manabu Okumura
Proceedings of the 12th International Conference on Natural Language Generation

Browsing news articles on multiple devices is now possible. The lengths of news article headlines have precise upper bounds, dictated by the size of the display of the relevant device or interface. Therefore, controlling the length of headlines is essential when applying the task of headline generation to news production. However, because there is no corpus of headlines of multiple lengths for a given article, previous research on controlling output length in headline generation has not discussed whether the system outputs could be adequately evaluated without multiple references of different lengths. In this paper, we introduce two corpora, which are Japanese News Corpus (JNC) and JApanese MUlti-Length Headline Corpus (JAMUL), to confirm the validity of previous evaluation settings. The JNC provides common supervision data for headline generation. The JAMUL is a large-scale evaluation dataset for headlines of three different lengths composed by professional editors. We report new findings on these corpora ; for example, although the longest length reference summary can appropriately evaluate the existing methods controlling output length, this evaluation setting has several problems.

pdf bib
Global Optimization under Length Constraint for Neural Text Summarization
Takuya Makino | Tomoya Iwakura | Hiroya Takamura | Manabu Okumura
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a global optimization method under length constraint (GOLC) for neural text summarization models. GOLC increases the probabilities of generating summaries that have high evaluation scores, ROUGE in this paper, within a desired length. We compared GOLC with two optimization methods, a maximum log-likelihood and a minimum risk training, on CNN / Daily Mail and a Japanese single document summarization data set of The Mainichi Shimbun Newspapers. The experimental results show that a state-of-the-art neural summarization model optimized with GOLC generates fewer overlength summaries while maintaining the fastest processing speed ; only 6.70 % overlength summaries on CNN / Daily and 7.8 % on long summary of Mainichi, compared to the approximately 20 % to 50 % on CNN / Daily Mail and 10 % to 30 % on Mainichi with the other optimization methods. We also demonstrate the importance of the generation of in-length summaries for post-editing with the dataset Mainich that is created with strict length constraints. The ex- perimental results show approximately 30 % to 40 % improved post-editing time by use of in-length summaries.

pdf bib
A Simple and Effective Method for Injecting Word-Level Information into Character-Aware Neural Language Models
Yukun Feng | Hidetaka Kamigaito | Hiroya Takamura | Manabu Okumura
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We propose a simple and effective method to inject word-level information into character-aware neural language models. Unlike previous approaches which usually inject word-level information at the input of a long short-term memory (LSTM) network, we inject it into the softmax function. The resultant model can be seen as a combination of character-aware language model and simple word-level language model. Our injection method can also be used together with previous methods. Through the experiments on 14 typologically diverse languages, we empirically show that our injection method, when used together with the previous methods, works better than the previous methods, including a gating mechanism, averaging, and concatenation of word vectors. We also provide a comprehensive comparison of these injection methods.


pdf bib
Neural Machine Translation Incorporating Named Entity
Arata Ugawa | Akihiro Tamura | Takashi Ninomiya | Hiroya Takamura | Manabu Okumura
Proceedings of the 27th International Conference on Computational Linguistics

This study proposes a new neural machine translation (NMT) model based on the encoder-decoder model that incorporates named entity (NE) tags of source-language sentences. Conventional NMT models have two problems enumerated as follows : (i) they tend to have difficulty in translating words with multiple meanings because of the high ambiguity, and (ii) these models’abilitytotranslatecompoundwordsseemschallengingbecausetheencoderreceivesaword, a part of the compound word, at each time step. To alleviate these problems, the encoder of the proposed model encodes the input word on the basis of its NE tag at each time step, which could reduce the ambiguity of the input word. Furthermore, the encoder introduces a chunk-level LSTM layer over a word-level LSTM layer and hierarchically encodes a source-language sentence to capture a compound NE as a chunk on the basis of the NE tags. We evaluate the proposed model on an English-to-Japanese translation task with the ASPEC, and English-to-Bulgarian and English-to-Romanian translation tasks with the Europarl corpus. The evaluation results show that the proposed model achieves up to 3.11 point improvement in BLEU.

pdf bib
Stylistically User-Specific Generation
Abdurrisyad Fikri | Hiroya Takamura | Manabu Okumura
Proceedings of the 11th International Conference on Natural Language Generation

Recent neural models for response generation show good results in terms of general responses. In real conversations, however, depending on the speaker / responder, similar utterances should require different responses. In this study, we attempt to consider individual user’s information in adjusting the notable sequence-to-sequence (seq2seq) model for more diverse, user-specific responses. We assume that we need user-specific features to adjust the response and we argue that some selected representative words from the users are suitable for this task. Furthermore, we prove that even for unseen or unknown users, our model can provide more diverse and interesting responses, while maintaining correlation with input utterances. Experimental results with human evaluation show that our model can generate more interesting responses than the popular seq2seqmodel and achieve higher relevance with input utterances than our baseline.


pdf bib
Supervised Attention for Sequence-to-Sequence Constituency Parsing
Hidetaka Kamigaito | Katsuhiko Hayashi | Tsutomu Hirao | Hiroya Takamura | Manabu Okumura | Masaaki Nagata
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The sequence-to-sequence (Seq2Seq) model has been successfully applied to machine translation (MT). Recently, MT performances were improved by incorporating supervised attention into the model. In this paper, we introduce supervised attention to constituency parsing that can be regarded as another translation task. Evaluation results on the PTB corpus showed that the bracketing F-measure was improved by supervised attention.

pdf bib
Japanese Sentence Compression with a Large Training DatasetJapanese Sentence Compression with a Large Training Dataset
Shun Hasegawa | Yuta Kikuchi | Hiroya Takamura | Manabu Okumura
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In English, high-quality sentence compression models by deleting words have been trained on automatically created large training datasets. We work on Japanese sentence compression by a similar approach. To create a large Japanese training dataset, a method of creating English training dataset is modified based on the characteristics of the Japanese language. The created dataset is used to train Japanese sentence compression models based on the recurrent neural network.

pdf bib
Distinguishing Japanese Non-standard Usages from Standard OnesJapanese Non-standard Usages from Standard Ones
Tatsuya Aoki | Ryohei Sasano | Hiroya Takamura | Manabu Okumura
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We focus on non-standard usages of common words on social media. In the context of social media, words sometimes have other usages that are totally different from their original. In this study, we attempt to distinguish non-standard usages on social media from standard ones in an unsupervised manner. Our basic idea is that non-standardness can be measured by the inconsistency between the expected meaning of the target word and the given context. For this purpose, we use context embeddings derived from word embeddings. Our experimental results show that the model leveraging the context embedding outperforms other methods and provide us with findings, for example, on how to construct context embeddings and which corpus to use.