Proceedings of the Natural Legal Language Processing Workshop 2021

Nikolaos Aletras, Ion Androutsopoulos, Leslie Barrett, Catalina Goanta, Daniel Preotiuc-Pietro (Editors)

Anthology ID:
Punta Cana, Dominican Republic
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the Natural Legal Language Processing Workshop 2021
Nikolaos Aletras | Ion Androutsopoulos | Leslie Barrett | Catalina Goanta | Daniel Preotiuc-Pietro

pdf bib
A Multilingual Approach to Identify and Classify Exceptional Measures against COVID-19COVID-19
Georgios Tziafas | Eugenie de Saint-Phalle | Wietse de Vries | Clara Egger | Tommaso Caselli

The COVID-19 pandemic has witnessed the implementations of exceptional measures by governments across the world to counteract its impact. This work presents the initial results of an on-going project, EXCEPTIUS, aiming to automatically identify, classify and com- pare exceptional measures against COVID-19 across 32 countries in Europe. To this goal, we created a corpus of legal documents with sentence-level annotations of eight different classes of exceptional measures that are im- plemented across these countries. We evalu- ated multiple multi-label classifiers on a manu- ally annotated corpus at sentence level. The XLM-RoBERTa model achieves highest per- formance on this multilingual multi-label clas- sification task, with a macro-average F1 score of 59.8 %.

pdf bib
JuriBERT : A Masked-Language Model Adaptation for French Legal TextJuriBERT: A Masked-Language Model Adaptation for French Legal Text
Stella Douka | Hadi Abdine | Michalis Vazirgiannis | Rajaa El Hamdani | David Restrepo Amariles

Language models have proven to be very useful when adapted to specific domains. Nonetheless, little research has been done on the adaptation of domain-specific BERT models in the French language. In this paper, we focus on creating a language model adapted to French legal text with the goal of helping law professionals. We conclude that some specific tasks do not benefit from generic language models pre-trained on large amounts of data. We explore the use of smaller architectures in domain-specific sub-languages and their benefits for French legal text. We prove that domain-specific pre-trained models can perform better than their equivalent generalised ones in the legal domain. Finally, we release JuriBERT, a new set of BERT models adapted to the French legal domain.

pdf bib
A Free Format Legal Question Answering System
Soha Khazaeli | Janardhana Punuru | Chad Morris | Sanjay Sharma | Bert Staub | Michael Cole | Sunny Chiu-Webster | Dhruv Sakalley

We present an information retrieval-based question answer system to answer legal questions. The system is not limited to a predefined set of questions or patterns and uses both sparse vector search and embeddings for input to a BERT-based answer re-ranking system. A combination of general domain and legal domain data is used for training. This natural question answering system is in production and is used commercially.

pdf bib
Legal Terminology Extraction with the Termolator
Nhi Pham | Lachlan Pham | Adam L. Meyers

Domain-specific terminology is ubiquitous in legal documents. Despite potential utility in populating glossaries and ontologies or as arguments in information extraction and document classification tasks, there has been limited work done for legal terminology extraction. This paper describes some work to remedy this omission. In the described research, we make some modifications to the Termolator, a high-performing, open-source terminology extractor which has been tuned to scientific articles. Our changes are designed to improve the Termolator’s results when applied to United States Supreme Court decisions. Unaltered and using the recommended settings, the original Termolator provides a list of terminology with a precision of 23 % and 25 % for the categories of economic activity (development set) and criminal procedures (test set) respectively. These were the most frequently occurring broad issues in Washington University in St. Louis Database corpus, a database of Supreme Court decisions that have been manually classified by topic. Our contribution includes the introduction of several legal domain-specific filtration steps and changes to the web search relevance score ; each incrementally improved precision culminating in a combined precision of 63 % and 65 %. We also evaluated the baseline version of the Termolator on more specific subcategories and on broad issues with fewer cases. Our results show that a narrowed scope as well as smaller document numbers significantly lower the precision. In both cases, the modifications to the Termolator improve precision.

pdf bib
Named Entity Recognition in Historic Legal Text : A Transformer and State Machine Ensemble Method
Fernando Trias | Hongming Wang | Sylvain Jaume | Stratos Idreos

Older legal texts are often scanned and digitized via Optical Character Recognition (OCR), which results in numerous errors. Although spelling and grammar checkers can correct much of the scanned text automatically, Named Entity Recognition (NER) is challenging, making correction of names difficult. To solve this, we developed an ensemble language model using a transformer neural network architecture combined with a finite state machine to extract names from English-language legal text. We use the US-based English language Harvard Caselaw Access Project for training and testing. Then, the extracted names are subjected to heuristic textual analysis to identify errors, make corrections, and quantify the extent of problems. With this system, we are able to extract most names, automatically correct numerous errors and identify potential mistakes that can later be reviewed for manual correction.

pdf bib
Semi-automatic Triage of Requests for Free Legal Assistance
Meladel Mistica | Jey Han Lau | Brayden Merrifield | Kate Fazio | Timothy Baldwin

Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet. A major bottleneck in the provision of free legal assistance to those most in need is the determination of the precise nature of the legal problem. This paper describes a collaboration with a major provider of free legal assistance, and the deployment of natural language processing models to assign area-of-law categories to real-world requests for legal assistance. In particular, we focus on an investigation of models to generate efficiencies in the triage process, but also the risks associated with naive use of model predictions, including fairness across different user demographics.