A comparative study of automated legal text classification using random forests and deep learning

INFORMATION PROCESSING & MANAGEMENT (2022)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 59, Issue 2, Pages -

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2021.102798

Keywords

Legal text classification; Machine learning; Deep learning; Domain concept; Word embedding; Random forests

Funding

United States NSF [1852249]
NSA [H98230-20-1-0417]
State Key Laboratory for Novel Software Technology in Nanjing University, China [KFKT2019A19]

Abstract

Automated legal text classification is a prominent research topic in the legal field and lays the foundation for building intelligent legal systems. The current literature focuses on legal texts from outside the United States, such as Chinese, European, and Australian cases; little attention has been paid to classifying U.S. legal texts. Deep learning has been applied to improve text classification performance, but its effectiveness in domains such as the legal field needs further exploration. This paper investigates legal text classification on a large collection of labeled U.S. case documents by comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm that uses domain concepts as features and random forests as the classifier. Our experimental results on 30,000 full U.S. case documents in 50 categories demonstrated that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, using only the top 400 domain concepts as features for building the random forests achieved the best performance. This study provides a reference for selecting machine learning techniques when building high-performance text classification systems in the legal domain or other fields.
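The pipeline described in the abstract (domain concepts as features, a top-k subset of those concepts, random forests as the classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how concepts are extracted or ranked, so substring matching and ranking by document frequency are assumptions made here. The function names, the toy documents, and the small hand-written forest (bootstrap samples, random feature subsets, Gini splits, majority vote) are all hypothetical stand-ins for a library implementation.

```python
import random
from collections import Counter

def concept_features(text, concepts):
    """Binary vector: 1 if the concept phrase occurs in the text (assumed matching rule)."""
    lowered = text.lower()
    return [1 if c in lowered else 0 for c in concepts]

def top_k_concepts(texts, candidates, k):
    """Keep the k candidates found in the most documents (assumed ranking criterion)."""
    freq = {c: sum(c in t.lower() for t in texts) for c in candidates}
    return sorted(candidates, key=lambda c: -freq[c])[:k]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def build_tree(rows, labels, n_try, rng):
    """Grow one tree; each node considers n_try randomly chosen features."""
    if len(set(labels)) == 1:
        return labels[0]                      # pure node becomes a leaf
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        zero = [y for x, y in zip(rows, labels) if x[f] == 0]
        one = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not zero or not one:
            continue                          # feature does not split this node
        score = (len(zero) * gini(zero) + len(one) * gini(one)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return Counter(labels).most_common(1)[0][0]   # fall back to majority label
    f = best[1]
    side0 = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    side1 = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    return (f,
            build_tree([x for x, _ in side0], [y for _, y in side0], n_try, rng),
            build_tree([x for x, _ in side1], [y for _, y in side1], n_try, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):            # internal nodes are (feature, zero, one)
        f, zero, one = node
        node = one if x[f] else zero
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # features tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Toy corpus (invented for illustration; not data from the paper).
train_texts = [
    "The plaintiff alleged breach of contract for lack of consideration.",
    "Breach of contract was found because consideration had been paid.",
    "No consideration was given, so the breach of contract claim failed.",
    "The defendant owed a duty of care and negligence was established.",
    "Negligence requires a duty of care owed to the plaintiff.",
    "The court found negligence after a breach of the duty of care.",
]
train_labels = ["contract"] * 3 + ["tort"] * 3
candidates = ["breach of contract", "estoppel", "consideration",
              "negligence", "duty of care"]

selected = top_k_concepts(train_texts, candidates, k=4)
forest = train_forest([concept_features(t, selected) for t in train_texts],
                      train_labels)

query = "The breach of contract claim turned on whether consideration existed."
print(predict(forest, concept_features(query, selected)))
```

In this sketch, `k` plays the role of the 400-concept cutoff reported in the abstract; the toy corpus uses `k=4` only because it has five candidate concepts.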
