Anil Kumar Singh

Also published as: Anil Kumar Singh, Anil kumar Singh


2020

NLPRL System for Very Low Resource Supervised Machine Translation
Rupjyoti Baruah | Rajesh Kumar Mundotiya | Amit Kumar | Anil kumar Singh
Proceedings of the Fifth Conference on Machine Translation

This paper describes the results of the system that we used for the WMT20 very low resource (VLR) supervised MT shared task. For our experiments, we use a byte-level version of BPE, which requires a base vocabulary of only 256 symbols. BPE-based models are a kind of sub-word model. Such models try to address the out-of-vocabulary (OOV) word problem by segmenting words so that the segments correspond to morphological units. They are also reported to work across different languages, especially similar languages, due to their sub-word nature. Based on the cased BLEU score, our NLPRL systems ranked ninth for HSB to GER and tenth for GER to HSB translation.
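As an illustration of why a byte-level BPE needs only 256 base symbols, the following is a minimal sketch and not the NLPRL system itself. It assumes UTF-8 input and a simple greedy merge loop; the function names (`to_byte_tokens`, `learn_merges`) are hypothetical and chosen only for this example.

```python
from collections import Counter


def to_byte_tokens(word: str) -> list[int]:
    """Encode a word as UTF-8 bytes, so every base token id lies in range(256)."""
    return list(word.encode("utf-8"))


def most_frequent_pair(sequences):
    """Count adjacent token pairs over all sequences; return the most common one."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(words, num_merges):
    """Greedy BPE training: ids 0..255 are raw bytes, merged symbols start at 256."""
    sequences = [to_byte_tokens(w) for w in words]
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges[pair] = next_id
        sequences = [merge_pair(s, pair, next_id) for s in sequences]
        next_id += 1
    return merges, sequences
```

Because the base alphabet is bytes rather than characters, a non-ASCII letter such as "ž" is simply two base tokens, so no input character can fall outside the vocabulary.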

NLPRL at WNUT-2020 Task 2: ELMo-based System for Identification of COVID-19 Tweets
Rajesh Kumar Mundotiya | Rupjyoti Baruah | Bhavana Srivastava | Anil Kumar Singh
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

The Coronavirus pandemic has dominated the news on social media for many months. Efforts are being made to reduce its spread, the casualties, and new infections. For this purpose, information about infected people and their symptoms, as available on social media such as Twitter, can help in prevention and in taking precautions. This is an example of using noisy text processing for disaster management. This paper discusses the NLPRL results in Shared Task 2 of the WNUT-2020 workshop. We treat the problem as binary classification and use pre-trained ELMo embeddings with GRU units. This approach classifies the tweets with an accuracy of 80.85% and an F1-score of 78.54% on the provided test dataset. The experimental code is available online.

Unsupervised Approach for Zero-Shot Experiments: Bhojpuri–Hindi and Magahi–Hindi@LoResMT 2020
Amit Kumar | Rajesh Kumar Mundotiya | Anil Kumar Singh
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

This paper reports a Machine Translation (MT) system submitted by the NLPRL team for the Bhojpuri–Hindi and Magahi–Hindi language pairs at the LoResMT 2020 shared task. We used an unsupervised domain adaptation approach that gives promising results for zero or extremely low resource languages. The task organizers provided the development and test sets for evaluation and the monolingual data for training. Our approach is a hybrid of domain adaptation and back-translation. The metrics used to evaluate the trained model are BLEU, RIBES, Precision, Recall and F-measure. Our approach gives relatively promising results over a wide range: 19.5, 13.71, 2.54, and 3.16 BLEU points for Bhojpuri to Hindi, Magahi to Hindi, Hindi to Bhojpuri and Hindi to Magahi, respectively.

2019

NLPRL at WAT2019: Transformer-based Tamil – English Indic Task Neural Machine Translation System
Amit Kumar | Anil Kumar Singh
Proceedings of the 6th Workshop on Asian Translation

This paper describes the Machine Translation system for the Tamil-English Indic Task organized at WAT 2019. We use a Transformer-based architecture for Neural Machine Translation.

2018

How emotional are you? Neural Architectures for Emotion Intensity Prediction in Microblogs
Devang Kulshreshtha | Pranav Goel | Anil Kumar Singh
Proceedings of the 27th International Conference on Computational Linguistics

Social media based micro-blogging sites like Twitter have become a common source of real-time information (impacting organizations and their strategies) and are used for expressing emotions and opinions. Automated analysis of such content therefore rises in importance. To this end, we explore the viability of using deep neural networks on the specific task of emotion intensity prediction in tweets. We propose a neural architecture combining convolutional and fully connected layers in a non-sequential manner, done for the first time in the context of natural language based tasks. Combined with lexicon-based features along with transfer learning, our model achieves state-of-the-art performance, outperforming the previous system by 0.044 (4.4 percentage points) in Pearson correlation on the WASSA’17 EmoInt shared task dataset. We investigate the performance of deep multi-task learning models trained for all emotions at once in a unified architecture and get encouraging results. Experiments on the correlation between emotion pairs offer interesting insights into the relationship between them.

Di-LSTM Contrast: A Deep Neural Network for Metaphor Detection
Krishnkant Swarnkar | Anil Kumar Singh
Proceedings of the Workshop on Figurative Language Processing

The contrast between the contextual and general meaning of a word serves as an important clue for detecting its metaphoricity. In this paper, we present a deep neural architecture for metaphor detection which exploits this contrast. We also use cost-sensitive learning by re-weighting examples, and baseline features like concreteness ratings, POS and WordNet-based features. Our best performing system achieves an overall F1 score of 0.570 on the All POS category and 0.605 on the Verbs category at the Metaphor Shared Task 2018.
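Cost-sensitive learning by re-weighting examples can be sketched as follows. The abstract does not say which weighting scheme was used, so this example assumes inverse class-frequency weights and a weighted binary cross-entropy; both function names are hypothetical.

```python
import math
from collections import Counter


def class_weights(labels):
    """Assumed scheme: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(labels, probs, weights):
    """Weighted mean binary cross-entropy; each prob is P(label == 1)."""
    eps = 1e-12
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
        norm += w
    return total / norm
```

With one metaphorical token among four, the rare class receives weight 2.0 and the common class 2/3, so a missed metaphor contributes more to the loss than a missed literal token.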

IIT (BHU) Submission for the ACL Shared Task on Named Entity Recognition on Code-switched Data
Shashwat Trivedi | Harsh Rangwani | Anil Kumar Singh
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

This paper describes the best performing system for the shared task on Named Entity Recognition (NER) on code-switched data for the language pair Spanish-English (ENG-SPA). We introduce a gated neural architecture for the NER task. Our final model achieves an F1 score of 63.76 %, outperforming the baseline by 10 %.

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture
Soumil Mandal | Anil Kumar Singh
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

An accurate language identification tool is an absolute necessity for building complex NLP systems to be used on code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by recent advancements in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies of 93.28% and 93.32% are achieved on our two testing sets.

2017

IJCNLP-2017 Task 3: Review Opinion Diversification (RevOpiD-2017)
Anil Kumar Singh | Avijit Thawani | Mayank Panchal | Anubhav Gupta | Julian McAuley
Proceedings of the IJCNLP 2017, Shared Tasks

Unlike Entity Disambiguation in web search results, Opinion Disambiguation is a relatively unexplored topic. The RevOpiD shared task at IJCNLP-2017 aimed to attract attention to this research problem. In this paper, we summarize the first run of this task and introduce a new dataset that we have annotated for the purpose of evaluating Opinion Mining, Summarization and Disambiguation methods.

IIT (BHU): System Description for LSDSem’17 Shared Task
Pranav Goel | Anil Kumar Singh
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

This paper describes an ensemble system submitted as part of the LSDSem Shared Task 2017, the Story Cloze Test. The main conclusion from our results is that an approach based on semantic similarity alone may not be enough for this task. We test various approaches and compare them with two ensemble systems: one based on voting and the other on a logistic regression-based classifier. Our final system outperforms the previous state of the art for the Story Cloze Test. Another very interesting observation is the performance of the sentiment-based approach, which works almost as well on its own as our final ensemble system.
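The voting ensemble mentioned above can be illustrated with a minimal sketch. It assumes each component system outputs a choice of ending (1 or 2) per story and that ties go to the choice seen first; the function names and the tie rule are assumptions for this example, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common choice; on a tie, the choice seen first wins."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(per_system_predictions):
    """Combine per-system prediction lists (one choice per story) by majority vote."""
    return [majority_vote(votes) for votes in zip(*per_system_predictions)]
```

For example, with three systems predicting `[1, 2, 1]`, `[1, 1, 2]` and `[2, 2, 2]` over three stories, the ensemble picks the ending that at least two systems agree on for each story.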