Chaganti-Reddy/CJPR

Court Judgement Prediction & Recommendation

CJPR uses several Natural Language Processing models, trained on the ILDC dataset of Supreme Court of India cases, to predict court judgements and recommend similar cases.

😇 Motivation

The motivation for building this system is to give judicial practitioners AI-powered, data-driven prediction assistance so they can make better decisions. To help address the huge backlog of pending cases, we use modern ML and AI techniques to improve the efficiency of the process. CJPR gives legal practitioners better insight into a case by surfacing relevant historical cases and assisting them in reaching a better result.

⭐ Features

  • Prediction of Court Petitions: CJPR predicts the decision on a court petition (accepted or rejected) from the given case description.
  • Recommendation on Acceptance: If the petition is accepted, CJPR recommends similar historical cases based on the given case description.
  • Easy to Access: The system is packaged as a Docker image and published on Docker Hub. Anyone can use it by pulling the image and running it on their local machine.

⚠️ Frameworks and Libraries

  • Hugging Face: Hugging Face is an NLP-focused startup with a large open-source community, in particular around the Transformers library.
  • scikit-learn: Simple and efficient tools for predictive data analysis.
  • TensorFlow / Keras: Deep learning framework used to build and train our models.
  • PyTorch: Deep learning framework used to build and train our models.
  • NumPy: NumPy is a Python library used for working with arrays.
  • Pandas: Pandas is a Python library used for working with data sets.
  • Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
  • Beautiful Soup: Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  • Docker: Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.

📁 Datasets

This project uses the ILDC Large dataset, which contains 54,000+ court cases scraped from the India Kanoon website. ILDC Large covers only the Supreme Court of India. The dataset contains the following columns:

  • ID: Unique ID for each case
  • Text: Petition text of the case
  • Decision: Decision of the case (1: Accepted, 0: Rejected)
  • Label: Label of the case (1: Criminal, 0: Civil)
  • Year: Year of the case

The dataset is split as follows:

| Dataset | No. of Cases | Percentage | Purpose |
|---|---|---|---|
| Train | 34,655 | 64% | Training the model |
| Validation | 10,830 | 20% | Validating the model |
| Test | 8,664 | 16% | Testing the model |
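
As a quick sanity check, the percentages in the split can be recomputed from the case counts. This is a minimal sketch; the variable names are illustrative and are not taken from the project code.

```python
# Case counts per split, copied from the distribution listed above.
split_counts = {"Train": 34_655, "Validation": 10_830, "Test": 8_664}

# Total number of cases across all three splits.
total = sum(split_counts.values())

# Share of each split, rounded to the nearest whole percent.
split_percent = {name: round(100 * n / total) for name, n in split_counts.items()}

print(total)
print(split_percent)
```

The total comes to 54,149 cases, which agrees with the "54,000+" figure quoted for the dataset.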

Data Preprocessing

Data preprocessing improves the quality and consistency of the raw legal text collected from indiakanoon.org. First, non-breaking spaces are removed. The text is then split into lines, and each line goes through sentence-level cleaning to strip unneeded characters. Next, abbreviations are expanded and formatting problems are fixed. Finally, the main content is extracted and validated, and the cleaned data is gathered to give later analyses a consistent base. All of these steps are implemented with Python regular expressions.
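
The cleaning steps described above might look roughly like the sketch below. The README does not give the actual rules, so the function name `clean_judgment_text`, the abbreviation map, and the allowed-character list are all assumptions made for illustration. The content-extraction and validation steps are left out because no detail about them is available here.

```python
import re

# Illustrative abbreviation map (assumption); the project's real list is not given.
ABBREVIATIONS = {
    r"\bu/s\b": "under section",
    r"\bHon'ble\b": "Honourable",
}


def clean_judgment_text(raw: str) -> str:
    """Hypothetical cleaner that mirrors the steps described in the text."""
    # 1. Replace non-breaking spaces with ordinary spaces.
    text = raw.replace("\u00a0", " ")

    cleaned_lines = []
    # 2. Split into lines and clean each one.
    for line in text.splitlines():
        # Replace characters outside a conservative allow-list (assumption).
        line = re.sub(r"[^A-Za-z0-9\s.,;:'\"()/&-]", " ", line)
        # 3. Expand abbreviations.
        for pattern, expansion in ABBREVIATIONS.items():
            line = re.sub(pattern, expansion, line, flags=re.IGNORECASE)
        # 4. Fix formatting by collapsing repeated whitespace.
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            cleaned_lines.append(line)

    # 5. Gather the cleaned lines into a single text.
    return " ".join(cleaned_lines)


sample = "The\u00a0accused was charged u/s 302 IPC.\n\n  Hon'ble Court held §§ that   the appeal fails.\n"
print(clean_judgment_text(sample))
```

Keeping each rule as a separate `re.sub` call makes it easy to add, remove, or reorder rules as the real data requires.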

🔑 Prerequisites

The prerequisites for running the code are listed in the code files themselves. Because the application is deployed with Docker, however, you only need to pull the image from Docker Hub and run it on your local machine. The image is available on Docker Hub as chagantireddy/cjpr:latest, and its dependencies are listed in the requirements.txt file.

💡 Recommendations

Recommendations are based on the cosine similarity between the given case description and the historical cases. The cosine similarity is calculated as

$$ \text{Cosine Similarity}(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} $$

Where,

$A_i$ is the $i^{\text{th}}$ component of vector $A$, $B_i$ is the $i^{\text{th}}$ component of vector $B$, and $n$ is the number of components in each vector.
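
The following is a minimal sketch of how cosine similarity can be used to rank historical cases against a query. It assumes the case descriptions have already been encoded as numeric vectors. The README does not say how CJPR produces those vectors, so the toy vectors, the `recommend` helper, and the `top_k` parameter are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def recommend(query_vec, case_vecs, top_k=3):
    """Return (case_id, score) pairs for the top_k most similar cases."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in case_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for encoded case descriptions (assumption).
history = {
    "case_a": [1.0, 0.0, 1.0],
    "case_b": [0.0, 1.0, 0.0],
    "case_c": [1.0, 1.0, 1.0],
}
query = [1.0, 0.0, 1.0]
print(recommend(query, history, top_k=2))
```

Because cosine similarity depends only on the angle between vectors, two documents of very different lengths can still score as highly similar.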

📂 Directory Tree

.
├── assets
├── CJPR_docker
├── Classical
│   ├── Logistic
│   ├── Random_Forest
│   └── XGBOOST
├── Papers
├── test_cases
├── TPU
│   ├── albert
│   ├── bert
│   ├── deberta
│   ├── distilbert
│   ├── roberta
│   └── xlnet
└── Transformers-GPU
    ├── albert
    ├── bert
    ├── deberta
    ├── distilbert
    ├── roberta
    └── xlnet

23 directories

🚀 Installation & Running

The instructions for running the image are also available on its Docker Hub page; they are repeated here for reference.

  1. Pull the Docker image from Docker Hub.
$ docker pull chagantireddy/cjpr:latest
  2. List the local images, find chagantireddy/cjpr, and copy its IMAGE ID.
$ docker images
  3. If you are running the image for the first time, create a container from it.
$ docker run -it --name CJPR <IMAGE_ID>
  4. Once the container exists, copy the test data from the test_cases directory into it.
$ docker cp <file_path> CJPR:/app/test
  5. Start the container and attach to it to run the application.
$ docker start CJPR
$ docker attach CJPR
  6. The output is stored inside the container. Copy it to your local machine with the following command.
$ docker cp CJPR:/app/recommanded_petitions <output_path>

The screenshot of the application running looks like:

Result 1

🔑 Results

The CJPR system produces predictions and recommendations for the petitions in test_cases, which the models were not trained on. The results are shown below:

Result 2

You can now copy the results to your local machine with the docker cp command given in the installation steps.

📂 Output Directory List

.
├── Petition0.txt
├── Petition1.txt
├── Petition2.txt
├── Petition3.txt
├── Petition4.txt
├── Petition5.txt
├── Petition6.txt
├── Petition7.txt
├── Petition8.txt
├── Petition9.txt
└── result_table.csv

0 directories, 11 files

📓 Wandb

Model results are logged to Weights & Biases (wandb), which we chose for its monitoring and experiment-tracking features and for easier visualization of the results.

👏 And it's done!

Feel free to 📧 me with any doubts or queries (Mail to Me 😄)

🔰 Future Goals

  1. To make the system more robust by adding more historical cases.
  2. To use machine-learning-based encoding techniques to encode the case description.
  3. To make the system more efficient by adding more models.
  4. To make the system more user friendly by adding a GUI.

👀 License

Apache-2.0 © Chaganti Reddy
