Proceedings of the Natural Legal Language Processing Workshop 2019

Nikolaos Aletras, Elliott Ash, Leslie Barrett, Daniel Chen, Adam Meyers, Daniel Preotiuc-Pietro, David Rosenberg, Amanda Stent (Editors)

Minneapolis, Minnesota
Association for Computational Linguistics

Scalable Methods for Annotating Legal-Decision Corpora
Lisa Ferro | John Aberdeen | Karl Branting | Craig Pfeifer | Alexander Yeh | Amartya Chakraborty

Recent research has demonstrated that judicial and administrative decisions can be predicted by machine-learning models trained on prior decisions. However, to have any practical application, these predictions must be explainable, which in turn requires modeling a rich set of features. Such approaches face a roadblock if the knowledge engineering required to create these features is not scalable. We present an approach to developing a feature-rich corpus of administrative rulings about domain name disputes, an approach which leverages a small amount of manual annotation and prototypical patterns present in the case documents to automatically extend feature labels to the entire corpus. To demonstrate the feasibility of this approach, we report results from systems trained on this dataset.

The Extent of Repetition in Contract Language
Dan Simonson | Daniel Broderick | Jonathan Herr

Contract language is repetitive (Anderson and Manns, 2017), but so is all language (Zipf, 1949). In this paper, we measure the extent to which contract language in English is repetitive compared with the language of other English-language corpora. Contracts have much smaller vocabulary sizes than similarly sized non-contract corpora across multiple contract types, contain one-fifth as many hapax legomena, pattern differently on a log-log plot, use fewer pronouns, and contain sentences that are about 20% more similar to one another than those in other corpora. These findings suggest that the study of contracts in natural language processing controls for some linguistic phenomena and allows for more in-depth study of others.
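
As a rough illustration of two of the measures mentioned in this abstract (vocabulary size and hapax legomena, i.e. words that occur exactly once), the following is a minimal sketch. The tokenizer, the function name `repetition_stats`, and the sample sentence are assumptions made for illustration only; they are not the authors' method or data.

```python
import re
from collections import Counter


def repetition_stats(text):
    """Return simple repetition measures for a text.

    Tokenization here is a crude lowercase regex split, which is an
    assumption for this sketch and not the paper's preprocessing.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Hapax legomena are word types that occur exactly once.
    hapaxes = [word for word, count in counts.items() if count == 1]
    return {
        "tokens": len(tokens),
        "types": len(counts),  # vocabulary size
        "hapax_legomena": len(hapaxes),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
    }


# Hypothetical contract-like sentence, invented for this example.
sample = "the party shall pay the fee and the party shall deliver the goods"
stats = repetition_stats(sample)
print(stats)
```

Comparing these numbers across corpora is only meaningful when the corpora are of similar size, since vocabulary size and hapax counts both grow with corpus length.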

Sentence Boundary Detection in Legal Text
George Sanchez

In this paper, we examine several algorithms for detecting sentence boundaries in legal text. Legal text presents challenges for sentence tokenizers because of its varied punctuation and syntax. Out-of-the-box algorithms perform poorly on legal text, which affects further analysis of the text. A novel, domain-specific approach is needed to detect sentence boundaries so that legal text can be analyzed further. We present the results of our investigation in this paper.
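
To make the difficulty concrete, here is a minimal sketch of why a generic splitter can stumble on legal punctuation. It is not one of the algorithms evaluated in the paper; the abbreviation list, both function names, and the sample text are hypothetical and chosen only to show the failure mode.

```python
import re

# Hypothetical, non-exhaustive list of legal abbreviations (an assumption
# for this sketch, not a resource from the paper).
ABBREVIATIONS = {"v.", "u.s.", "no.", "inc.", "corp.", "sec.", "cir."}


def naive_split(text):
    """Split after every period that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def abbreviation_aware_split(text):
    """Split on a period only when the token is not a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks a final period
        sentences.append(" ".join(current))
    return sentences


# Invented example sentence containing a case citation.
sample = "The court relied on Smith v. Jones Corp. in its ruling. The appeal was denied."
print(naive_split(sample))
print(abbreviation_aware_split(sample))
```

The naive version breaks the case name into fragments, while the abbreviation-aware version keeps it intact. The latter still fails when an abbreviation genuinely ends a sentence, which is one reason a fixed list alone is unlikely to be sufficient for legal text.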

Litigation Analytics: Case Outcomes Extracted from US Federal Court Dockets
Thomas Vacek | Ronald Teo | Dezhao Song | Timothy Nugent | Conner Cowling | Frank Schilder

Dockets contain a wealth of information for planning a litigation strategy, but the information is locked up in semi-structured text. Manually deriving the outcomes for each party (e.g., settlement, verdict) would be very labor-intensive. Having such information available for every past court case, however, would be very useful for developing a strategy because it potentially reveals tendencies and trends of judges, courts, and opposing counsel. We used Natural Language Processing (NLP) techniques and deep learning methods, which allowed us to scale the automatic analysis to millions of US federal court dockets. The automatically extracted information is fed into a Litigation Analytics tool that lawyers use to plan how they approach specific litigation.

Legal Area Classification: A Comparative Study of Text Classifiers on Singapore Supreme Court Judgments
Jerrold Soh | How Khang Lim | Ian Ernst Chai

This paper conducts a comparative study of the performance of various machine learning approaches for classifying judgments into legal areas. Using a novel dataset of 6,227 Singapore Supreme Court judgments, we investigate how state-of-the-art NLP methods compare against traditional statistical models when applied to a legal corpus that comprises few but lengthy documents. All approaches tested, including topic model, word embedding, and language model-based classifiers, performed well with as few as a few hundred judgments. However, more work needs to be done to optimize state-of-the-art methods for the legal domain.