Anupam Jamatia


pdf bib
NIT_Agartala_NLP_Team at SemEval-2019 Task 6 : An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media CorporaNIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck | Amitava Das
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on ‘Identifying and Categorizing Offensive Language in Social Media’ by the ‘NIT_Agartala_NLP_Team’. A Twitter annotated dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B, and C, respectively. The paper highlights the overall low predictive nature of various linguistic features and surface level count features, as well as the limitations of a traditional machine learning approach when compared to a Deep Learning counterpart.

pdf bib
Validation of Facts Against Textual Sources
Vamsi Krishna Pendyala | Simran Sinha | Satya Prakash | Shriya Reddy | Anupam Jamatia
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In today’s digital world of information, a fact verification system to disprove assertions made in speech, print media or online content is the need of the hour. We propose a system which would verify a claim against a source and classify the claim to be true, false, out-of-context or an inappropriate claim with respect to the textual source provided to the system. A true label is used if the claim is true, false if it is false, if the claim has no relation with the source then it is classified as out-of-context and if the claim can not be verified at all then it is classified as inappropriate. This would help us to verify a claim or a fact as well as know about the source or our knowledge base against which we are trying to verify our facts. We used a two-step approach to achieve our goal. At first, we retrieved evidence related to the claims from the textual source using the Term Frequency-Inverse Document Frequency(TF-IDF) vectors. Later we classified the claim-evidence pairs as true, false, inappropriate and out of context using a modified version of textual entailment module. Textual entailment module calculates the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information using Bi-LSTM network to assess the veracity of the claim. The accuracy of the best performing system is 64.49 %

pdf bib
Studying Generalisability across Abusive Language Detection Datasets
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of this, there exists a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.