This repository contains a classifier for determining whether a document is finance-related.
The model uses Snowflake/snowflake-arctic-embed-m as the embedding model, with a classification head on top. During training, the model is trained as a regressor.
Qwen/Qwen2.5-72B-Instruct was used to annotate 110k CulturaX documents with a score between 0 and 5. For training, scores in [0, 2] are converted to 0 and scores in [3, 5] to 1. The model was then trained on 108k documents and tested on the remaining 2k.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("LinguaCustodia/ClassiFin")
model = AutoModelForSequenceClassification.from_pretrained("LinguaCustodia/ClassiFin")
# Example text
text = "This is a test sentence."
# Tokenize input
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
# Get model outputs
outputs = model(**inputs)
logits = outputs.logits.float().detach().cpu().numpy()
logits = logits.ravel().tolist()
# Convert logits to class labels
int_scores = [int(round(max(0, min(logit, 1)))) for logit in logits] # 0 for non-financial, 1 for financial
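The label conversion used for training and the clamp-and-round step used at inference can be sketched in plain Python. This is only an illustration of the rules stated in this card, not the authors' training code; the helper names `binarize_annotation` and `logit_to_label` are hypothetical.

```python
def binarize_annotation(score: int) -> int:
    """Map a 0-5 annotation score to a binary target (hypothetical helper).

    Scores in [0, 2] become 0 (non-financial); scores in [3, 5] become 1 (financial).
    """
    if not 0 <= score <= 5:
        raise ValueError(f"score must be between 0 and 5, got {score}")
    return 1 if score >= 3 else 0


def logit_to_label(logit: float) -> int:
    """Clamp a regression output to [0, 1], then round to the nearest label.

    Mirrors the expression used in the usage snippet above.
    """
    clamped = max(0, min(logit, 1))
    return int(round(clamped))


# Annotation scores 0..5 mapped to binary training targets
targets = [binarize_annotation(s) for s in range(6)]

# Example regression outputs, including values outside [0, 1]
labels = [logit_to_label(x) for x in (-0.3, 0.2, 0.7, 1.8)]
```

Because Python's built-in `round` rounds halves to the nearest even integer, a clamped value of exactly 0.5 maps to 0 under this rule.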
              precision    recall  f1-score   support
           0       0.95      0.99      0.97      1750
           1       0.92      0.62      0.74       250
    accuracy                           0.95      2000
   macro avg       0.93      0.81      0.85      2000
weighted avg       0.94      0.95      0.94      2000
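As a sanity check on the figures above, the per-class F1 scores can be recomputed from the reported precision and recall. The sketch below uses only the rounded values from the table, so results agree to two decimals at best; the function name `f1` is illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) per class, copied from the table above
report = {
    0: (0.95, 0.99, 1750),  # non-financial
    1: (0.92, 0.62, 250),   # financial
}

f1_per_class = {label: f1(p, r) for label, (p, r, _) in report.items()}

# Support-weighted average of the per-class F1 scores
total = sum(support for _, _, support in report.values())
weighted_f1 = sum(
    f1_per_class[label] * support for label, (_, _, support) in report.items()
) / total
```

The lower F1 for class 1 comes from its recall of 0.62: with 250 financial documents in the test set, roughly 155 are recovered, while precision on the documents flagged as financial stays at 0.92.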
If you use this model in your research or applications, please cite this repository.
@misc{ClassiFin,
title={ClassiFin: Finance Document Classifier},
author={Liu, Jingshu and Qader, Raheel and Caillaut, Gaëtan and Nakhle, Mariam and Barthelemy, Jean-Gabriel and Sadoune, Arezki and Foly, Sabine},
url={https://huggingface.co/LinguaCustodia/ClassiFin},
year={2025}
}