Alessandro Moschitti


2021

pdf bib
Joint Models for Answer Verification in Question Answering Systems
Zeyu Zhang | Thuy Vu | Alessandro Moschitti
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper studies joint models for selecting correct answer sentences among the top k provided by answer sentence selection (AS2) modules, which are core components of retrieval-based Question Answering (QA) systems. Our work shows that a critical step to effectively exploiting an answer set regards modeling the interrelated information between pair of answers. For this purpose, we build a three-way multi-classifier, which decides if an answer supports, refutes, or is neutral with respect to another one. More specifically, our neural architecture integrates a state-of-the-art AS2 module with the multi-classifier, and a joint layer connecting all components. We tested our models on WikiQA, TREC-QA, and a real-world dataset. The results show that our models obtain the new state of the art in AS2.

pdf bib
Modeling Context in Answer Sentence Selection Systems on a Latency Budget
Rujun Han | Luca Soldaini | Alessandro Moschitti
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Answer Sentence Selection (AS2) is an efficient approach for the design of open-domain Question Answering (QA) systems. In order to achieve low latency, traditional AS2 models score question-answer pairs individually, ignoring any information from the document each potential answer was extracted from. In contrast, more computationally expensive models designed for machine reading comprehension tasks typically receive one or more passages as input, which often results in better accuracy. In this work, we present an approach to efficiently incorporate contextual information in AS2 models. For each answer candidate, we first use unsupervised similarity techniques to extract relevant sentences from its source document, which we then feed into an efficient transformer architecture fine-tuned for AS2. Our best approach, which leverages a multi-way attention architecture to efficiently encode context, improves 6 % to 11 % over non-contextual state of the art in AS2 with minimal impact on system latency. All experiments in this work were conducted in English.

pdf bib
CDA : a Cost Efficient Content-based Multilingual Web Document AlignerCDA: a Cost Efficient Content-based Multilingual Web Document Aligner
Thuy Vu | Alessandro Moschitti
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We introduce a Content-based Document Alignment approach (CDA), an efficient method to align multilingual web documents based on content in creating parallel training data for machine translation (MT) systems operating at the industrial level. CDA works in two steps : (i) projecting documents of a web domain to a shared multilingual space ; then (ii) aligning them based on the similarity of their representations in such space. We leverage lexical translation models to build vector representations using TFIDF. CDA achieves performance comparable with state-of-the-art systems in the WMT-16 Bilingual Document Alignment Shared Task benchmark while operating in multilingual space. Besides, we created two web-scale datasets to examine the robustness of CDA in an industrial setting involving up to 28 languages and millions of documents. The experiments show that CDA is robust, cost-effective, and is significantly superior in (i) processing large and noisy web data and (ii) scaling to new and low-resourced languages.

pdf bib
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Siddhant Garg | Alessandro Moschitti
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper we propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them. This is based on an interesting new finding : the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text. This enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold. Specifically, we learn Transformer-based question models by distilling Transformer-based answering models. Our experiments on three popular QA datasets and one industrial QA benchmark demonstrate the ability of our question models to approximate the Precision / Recall curves of the target QA system well. These question models, when used as filters, can effectively trade off lower computation cost of QA systems for lower Recall, e.g., reducing computation by ~60 %, while only losing ~3-4 % of Recall.

2020

pdf bib
The Cascade Transformer : an Application for Efficient Answer Sentence Selection
Luca Soldaini | Alessandro Moschitti
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt transformer-based models into a cascade of rankers. Each ranker is used to prune a subset of candidates in a batch, thus dramatically increasing throughput at inference time. Partial encodings from the transformer model are shared among rerankers, providing further speed-up. When compared to a state-of-the-art transformer model, our approach reduces computation by 37 % with almost no impact on accuracy, as measured on two English Question Answering datasets.

pdf bib
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP
Oren Sar Shalom | Alexander Panchenko | Cicero dos Santos | Varvara Logacheva | Alessandro Moschitti | Ido Dagan
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP

2019

pdf bib
A Study of Latent Structured Prediction Approaches to Passage Reranking
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The structured output framework provides a helpful tool for learning to rank problems. In this paper, we propose a structured output approach which regards rankings as latent variables. Our approach addresses the complex optimization of Mean Average Precision (MAP) ranking metric. We provide an inference procedure to find the max-violating ranking based on the decomposition of the corresponding loss. The results of our experiments on WikiQA and TREC13 datasets show that our reranking based on structured prediction is a promising research direction.

2018

pdf bib
Learning to Progressively Recognize New Named Entities with Sequence to Sequence Models
Lingzhen Chen | Alessandro Moschitti
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we propose to use a sequence to sequence model for Named Entity Recognition (NER) and we explore the effectiveness of such model in a progressive NER setting a Transfer Learning (TL) setting. We train an initial model on source data and transfer it to a model that can recognize new NE categories in the target data during a subsequent step, when the source data is no longer available. Our solution consists in : (i) to reshape and re-parametrize the output layer of the first learned model to enable the recognition of new NEs ; (ii) to leave the rest of the architecture unchanged, such that it is initialized with parameters transferred from the initial model ; and (iii) to fine tune the network on the target data. Most importantly, we design a new NER approach based on sequence to sequence (Seq2Seq) models, which can intuitively work better in our progressive setting. We compare our approach with a Bidirectional LSTM, which is a strong neural NER model. Our experiments show that the Seq2Seq model performs very well on the standard NER setting and it is more robust in the progressive setting. Our approach can recognize previously unseen NE categories while preserving the knowledge of the seen data.

pdf bib
Semantic Linking in Convolutional Neural Networks for Answer Sentence Selection
Massimo Nicosia | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

State-of-the-art networks that model relations between two pieces of text often use complex architectures and attention. In this paper, instead of focusing on architecture engineering, we take advantage of small amounts of labelled data that model semantic phenomena in text to encode matching features directly in the word representations. This greatly boosts the accuracy of our reference network, while keeping the model simple and fast to train. Our approach also beats a tree kernel model that uses similar input encodings, and neural models which use advanced attention and compare-aggregate mechanisms.

pdf bib
Cross-Pair Text Representations for Answer Sentence Selection
Kateryna Tymoshenko | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

High-level semantics tasks, e.g., paraphrasing, textual entailment or question answering, involve modeling of text pairs. Before the emergence of neural networks, this has been mostly performed using intra-pair features, which incorporate similarity scores or rewrite rules computed between the members within the same pair. In this paper, we compute scalar products between vectors representing similarity between members of different pairs, in place of simply using a single vector for each pair. This allows us to obtain a representation specific to any pair of pairs, which delivers the state of the art in answer sentence selection. Most importantly, our approach can outperform much more complex algorithms based on neural networks.

pdf bib
Supervised Clustering of Questions into Intents for Dialog System Applications
Iryna Haponchyk | Antonio Uva | Seunghak Yu | Olga Uryupina | Alessandro Moschitti
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Modern automated dialog systems require complex dialog managers able to deal with user intent triggered by high-level semantic questions. In this paper, we propose a model for automatically clustering questions into user intents to help the design tasks. Since questions are short texts, uncovering their semantics to group them together can be very challenging. We approach the problem by using powerful semantic classifiers from question duplicate / matching research along with a novel idea of supervised clustering methods based on structured output. We test our approach on two intent clustering corpora, showing an impressive improvement over previous methods for two languages / domains.

pdf bib
Automatic Stance Detection Using End-to-End Memory Networks
Mitra Mohtarami | Ramy Baly | James Glass | Preslav Nakov | Lluís Màrquez | Alessandro Moschitti
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present an effective end-to-end memory network model that jointly (i) predicts whether a given document can be considered as relevant evidence for a given claim, and (ii) extracts snippets of evidence that can be used to reason about the factuality of the target claim. Our model combines the advantages of convolutional and recurrent neural networks as part of a memory network. We further introduce a similarity matrix at the inference level of the memory network in order to extract snippets of evidence for input claims more accurately. Our experiments on a public benchmark dataset, FakeNewsChallenge, demonstrate the effectiveness of our approach.

pdf bib
Injecting Relational Structural Representation in Neural Networks for Question Similarity
Antonio Uva | Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Effectively using full syntactic parsing information in Neural Networks (NNs) for solving relational tasks, e.g., question similarity, is still an open problem. In this paper, we propose to inject structural representations in NNs by (i) learning a model with Tree Kernels (TKs) on relatively few pairs of questions (few thousands) as gold standard (GS) training data is typically scarce, (ii) predicting labels on a very large corpus of question pairs, and (iii) pre-training NNs on such large corpus. The results on Quora and SemEval question similarity datasets show that NNs using our approach can learn more accurate models, especially after fine tuning on GS.

pdf bib
A Flexible, Efficient and Accurate Framework for Community Question Answering Pipelines
Salvatore Romeo | Giovanni Da San Martino | Alberto Barrón-Cedeño | Alessandro Moschitti
Proceedings of ACL 2018, System Demonstrations

Although deep neural networks have been proving to be excellent tools to deliver state-of-the-art results, when data is scarce and the tackled tasks involve complex semantic inference, deep linguistic processing and traditional structure-based approaches, such as tree kernel methods, are an alternative solution. Community Question Answering is a research area that benefits from deep linguistic analysis to improve the experience of the community of forum users. In this paper, we present a UIMA framework to distribute the computation of cQA tasks over computer clusters such that traditional systems can scale to large datasets and deliver fast processing.

2017

pdf bib
Collaborative Partitioning for Coreference Resolution
Olga Uryupina | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper presents a collaborative partitioning algorithma novel ensemble-based approach to coreference resolution. Starting from the all-singleton partition, we search for a solution close to the ensemble’s outputs in terms of a task-specific similarity measure. Our approach assumes a loose integration of individual components of the ensemble and can therefore combine arbitrary coreference resolvers, regardless of their models. Our experiments on the CoNLL dataset show that collaborative partitioning yields results superior to those attained by the individual components, for ensembles of both strong and weak systems. Moreover, by applying the collaborative partitioning algorithm on top of three state-of-the-art resolvers, we obtain the best coreference performance reported so far in the literature (MELA v08 score of 64.47).

pdf bib
Learning Contextual Embeddings for Structural Semantic Similarity using Categorical Information
Massimo Nicosia | Alessandro Moschitti
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Tree kernels (TKs) and neural networks are two effective approaches for automatic feature engineering. In this paper, we combine them by modeling context word similarity in semantic TKs. This way, the latter can operate subtree matching by applying neural-based similarity on tree lexical nodes. We study how to learn representations for the words in context such that TKs can exploit more focused information. We found that neural embeddings produced by current methods do not provide a suitable contextual similarity. Thus, we define a new approach based on a Siamese Network, which produces word representations while learning a binary text similarity. We set the latter considering examples in the same category as similar. The experiments on question and sentiment classification show that our semantic TK highly improves previous results.

pdf bib
Self-Crowdsourcing Training for Relation Extraction
Azad Abad | Moin Nabi | Alessandro Moschitti
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we introduce a self-training strategy for crowdsourcing. The training examples are automatically selected to train the crowd workers. Our experimental results show an impact of 5 % Improvement in terms of F1 for relation extraction task, compared to the method based on distant supervision.

pdf bib
SemEval-2017 Task 3 : Community Question AnsweringSemEval-2017 Task 3: Community Question Answering
Preslav Nakov | Doris Hoogeveen | Lluís Màrquez | Alessandro Moschitti | Hamdy Mubarak | Timothy Baldwin | Karin Verspoor
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe SemEval2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016 : (A) QuestionComment Similarity, (B) QuestionQuestion Similarity, (C) QuestionExternal Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks AD. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks AC.

pdf bib
KeLP at SemEval-2017 Task 3 : Learning Pairwise Patterns in Community Question AnsweringKeLP at SemEval-2017 Task 3: Learning Pairwise Patterns in Community Question Answering
Simone Filice | Giovanni Da San Martino | Alessandro Moschitti
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the KeLP system participating in the SemEval-2017 community Question Answering (cQA) task. The system is a refinement of the kernel-based sentence pair modeling we proposed for the previous year challenge. It is implemented within the Kernel-based Learning Platform called KeLP, from which we inherit the team’s name. Our primary submission ranked first in subtask A, and third in subtasks B and C, being the only systems appearing in the top-3 ranking for all the English subtasks. This shows that the proposed framework, which has minor variations among the three subtasks, is extremely flexible and effective in tackling learning tasks defined on sentence pairs.

pdf bib
Ranking Kernels for Structures and Embeddings : A Hybrid Preference and Classification Model
Kateryna Tymoshenko | Daniele Bonadiman | Alessandro Moschitti
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Recent work has shown that Tree Kernels (TKs) and Convolutional Neural Networks (CNNs) obtain the state of the art in answer sentence reranking. Additionally, their combination used in Support Vector Machines (SVMs) is promising as it can exploit both the syntactic patterns captured by TKs and the embeddings learned by CNNs. However, the embeddings are constructed according to a classification function, which is not directly exploitable in the preference ranking algorithm of SVMs. In this work, we propose a new hybrid approach combining preference ranking applied to TKs and pointwise ranking applied to CNNs. We show that our approach produces better results on two well-known and rather different datasets : WikiQA for answer sentence selection and SemEval cQA for comment selection in Community Question Answering.

pdf bib
A Practical Perspective on Latent Structured Prediction for Coreference Resolution
Iryna Haponchyk | Alessandro Moschitti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Latent structured prediction theory proposes powerful methods such as Latent Structural SVM (LSSVM), which can potentially be very appealing for coreference resolution (CR). In contrast, only small work is available, mainly targeting the latent structured perceptron (LSP). In this paper, we carried out a practical study comparing for the first time online learning with LSSVM. We analyze the intricacies that may have made initial attempts to use LSSVM fail, i.e., a huge training time and much lower accuracy produced by Kruskal’s spanning tree algorithm. In this respect, we also propose a new effective feature selection approach for improving system efficiency. The results show that LSP, if correctly parameterized, produces the same performance as LSSVM, being much more efficient.

pdf bib
Effective shared representations with Multitask Learning for Community Question Answering
Daniele Bonadiman | Antonio Uva | Alessandro Moschitti
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

An important asset of using Deep Neural Networks (DNNs) for text applications is their ability to automatically engineering features. Unfortunately, DNNs usually require a lot of training data, especially for highly semantic tasks such as community Question Answering (cQA). In this paper, we tackle the problem of data scarcity by learning the target DNN together with two auxiliary tasks in a multitask learning setting. We exploit the strong semantic connection between selection of comments relevant to (i) new questions and (ii) forum questions. This enables a global representation for comments, new and previous questions. The experiments of our model on a SemEval challenge dataset for cQA show a 20 % of relative improvement over standard DNNs.