IndicLaw-Class: Code-Mixed Legal Intent Classifier

IndicLaw-Class is a lightweight multilingual transformer classifier that identifies legal intent in code-mixed Indian-language queries (e.g., Kannada-English, Hinglish). It is fine-tuned on citizen-style queries for real-world legal triage applications.

Model Overview

Architecture: distilbert-base-multilingual-cased
Task: Multi-class text classification (6 legal categories)
Input Style: Informal, code-mixed queries like:
- divorce file maadbeku without husband consent
- builder flat delay case haakbeku
- rent refund maadbeku, owner refusing

Legal Categories

The model classifies input into one of the following categories:

Label	Description
Family Law	Divorce, custody, alimony, marriage
Property Law	Inheritance, land disputes, transfer
Criminal Law	FIRs, police misconduct, assault
Consumer Complaints	E-commerce, refund issues, builders
Rent & Tenancy	Eviction, deposit disputes, lease
Public Services	Certificates, ID updates, ration
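
As a rough illustration of what multi-class classification over these 6 categories means at the output layer, the sketch below turns six raw scores (logits) into probabilities with a softmax and picks the most likely category. The label order and the logit values are made up for illustration; the actual index-to-label mapping is whatever was saved with the trained model.

```python
import math

# Assumed label order for illustration only; the real index-to-label
# mapping is defined by the trained model, not by this list.
CATEGORIES = [
    "Family Law",
    "Property Law",
    "Criminal Law",
    "Consumer Complaints",
    "Rent & Tenancy",
    "Public Services",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits):
    """Return (category, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]


# Hypothetical logits, not real model output.
category, prob = predict_category([0.2, -1.0, 0.1, 3.1, 1.4, -0.5])
print(category, round(prob, 2))
```

The probability of the winning class is what a text-classification pipeline typically reports as its score.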

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: More information needed
Hours Used: More information needed
Cloud Provider: More information needed
Compute Region: More information needed
Carbon Emitted: More information needed
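
An estimate of this kind is roughly the energy consumed multiplied by the carbon intensity of the local power grid. A minimal sketch of that arithmetic follows; the function name estimate_co2_kg and every numeric input are hypothetical placeholders, not measurements for this model, whose training details are not reported here.

```python
def estimate_co2_kg(power_watts, hours, grid_kg_co2_per_kwh):
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * grid_kg_co2_per_kwh


# Hypothetical inputs: a 250 W accelerator running for 2 hours on a grid
# emitting 0.5 kg CO2eq per kWh. None of these values describe this model.
estimate = estimate_co2_kg(power_watts=250, hours=2, grid_kg_co2_per_kwh=0.5)
print(f"{estimate:.2f} kg CO2eq")
```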

Citation

@misc{nishanth_prakash_2025,
    author       = { nishanth prakash },
    title        = { IndicLaw-Class (Revision 87ae96e) },
    year         = 2025,
    url          = { https://huggingface.co/nprak26/IndicLaw-Class },
    doi          = { 10.57967/hf/5964 },
    publisher    = { Hugging Face }
}

How to Get Started With the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer from your local folder
model_dir = "./indiclaw-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

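# Load label map from labels.txt (one "index<TAB>label" pair per line)
label_map = {}
with open(f"{model_dir}/labels.txt", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on the split
        idx, label = line.split("\t", 1)
        label_map[int(idx)] = label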

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Test inputs
examples = [
    "wife divorce file maadbeku",
    "flat possession delay aadmele builder case file madbeku",
    "tenant evict maadbeku no notice"
]

# Run predictions
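for text in examples:
    result = classifier(text)[0]
    label_str = result["label"]
    if label_str.upper().startswith("LABEL_"):
        # Default config names look like "LABEL_3"; map the id via labels.txt
        label_name = label_map[int(label_str.split("_")[-1])]
    elif label_str.isdigit():
        # Bare numeric id
        label_name = label_map[int(label_str)]
    else:
        # The model config already provides a readable label name
        label_name = label_str
    print(f"Input: {text}\nPredicted: {label_name} (confidence: {result['score']:.2f})\n")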
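
The loading code above expects labels.txt to hold one tab-separated index and label per line. The sketch below writes such a file and reads it back as a quick check of the format. The helper names write_label_map and read_label_map are hypothetical, and the index order shown is an assumption; the real order must match the one used during training.

```python
import os
import tempfile


def write_label_map(path, labels):
    """Write one 'index<TAB>label' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for idx, label in enumerate(labels):
            f.write(f"{idx}\t{label}\n")


def read_label_map(path):
    """Read the file back into an {index: label} dict, skipping blank lines."""
    label_map = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            idx, label = line.split("\t", 1)
            label_map[int(idx)] = label
    return label_map


# Assumed order for illustration; use the order from your training run.
labels = [
    "Family Law",
    "Property Law",
    "Criminal Law",
    "Consumer Complaints",
    "Rent & Tenancy",
    "Public Services",
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.txt")
    write_label_map(path, labels)
    label_map = read_label_map(path)

print(label_map)
```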

