Update README.md
README.md
CHANGED
@@ -23,11 +23,11 @@ tags:
 - a100
 - cc-by-nc-3.0
 ---
-#

 ## Model Overview

-**

 The model was trained on a Spanish dataset (`final.parquet`) containing 8,850 samples across 109 document type classes. It addresses class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) applied to the training set, ensuring robust performance on minority classes. The fine-tuning process achieved an evaluation F1-score of **0.4855**, accuracy of **0.6096**, precision of **0.5212**, and recall of **0.5006** on a validation set of 1,770 samples.

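The README paragraph in this hunk names SMOTE but does not say how it works or which features it was applied to. As a rough illustration only, SMOTE creates a synthetic minority sample by interpolating between a minority point and one of its nearest minority-class neighbours. The function name `smote_sample`, the toy 2-D vectors, and `k=2` are assumptions made for this sketch; it is not the training code behind the model.

```python
import random


def smote_sample(minority, k=2, rng=random):
    """Create one synthetic point from a list of minority-class feature vectors."""
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    # Pick one of the k nearest neighbours and interpolate towards it.
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbour)]


# Toy minority class: the four corners of the unit square (hypothetical data).
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Because every synthetic point lies on a segment between two real minority points, oversampling is normally restricted to the training split, as the README states, so that no interpolated data reaches the validation set.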
@@ -91,7 +91,7 @@ import torch
 import numpy as np

 # Load the model and tokenizer
-model_path = "
 tokenizer = LongformerTokenizer.from_pretrained(model_path)
 model = LongformerForSequenceClassification.from_pretrained(model_path)

@@ -141,21 +141,21 @@ print(f"Predicted document type code: {predicted_label}")
 - **Hardware Requirements**: Inference on CPU is possible but slower; a GPU is recommended for efficiency.

 ## License
-This model is licensed under the **Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)** license. You are free to share and adapt the model for non-commercial purposes, provided appropriate credit is given to

 ## Author
-- **Organization**:
-- **Contact**: Reach out via Hugging Face (https://huggingface.co/

 ## Citation
 If you use this model in your work, please cite:
 ```
-@misc{
-author = {
 title = {Classifier SGD Longformer 4099: A Fine-Tuned Model for Spanish Document Type Classification},
 year = {2025},
 publisher = {Hugging Face},
-url = {https://huggingface.co/
 }
 ```

 - a100
 - cc-by-nc-3.0
 ---
+# Excribe Classifier SGD Longformer 4096

 ## Model Overview

+**Excribe/Classifier_SGD_Longformer_4099** is a fine-tuned version of the `allenai/longformer-base-4096` model, designed for text classification tasks in document management, specifically for classifying Spanish-language input documents into document type categories (`tipo_documento_codigo`). Developed by **Excribe.co**, this model leverages the Longformer architecture to handle long texts (up to 4096 tokens) and is optimized for GPU environments, such as NVIDIA A100.

 The model was trained on a Spanish dataset (`final.parquet`) containing 8,850 samples across 109 document type classes. It addresses class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) applied to the training set, ensuring robust performance on minority classes. The fine-tuning process achieved an evaluation F1-score of **0.4855**, accuracy of **0.6096**, precision of **0.5212**, and recall of **0.5006** on a validation set of 1,770 samples.

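In the metrics quoted in this hunk, recall (0.5006) differs from accuracy (0.6096). For single-label classification, support-weighted recall always equals accuracy, so the reported numbers are consistent with macro averaging over the 109 classes, although the README does not state the averaging mode. A minimal sketch, using made-up toy labels, of how macro-averaged scores are computed and why they can fall below accuracy when classes are imbalanced:

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / pred_count[label] if pred_count[label] else 0.0
        r = tp[label] / true_count[label] if true_count[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: every class counts equally, regardless of its size.
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy imbalanced data (hypothetical): 8 samples of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
accuracy, precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

In the toy data a single mistake on the small class leaves accuracy at 0.9 but pulls macro recall down to 0.75, the same kind of gap seen between the accuracy and recall figures in the README.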
 import numpy as np

 # Load the model and tokenizer
+model_path = "excribe/classifier_sgd_longformer_4099"
 tokenizer = LongformerTokenizer.from_pretrained(model_path)
 model = LongformerForSequenceClassification.from_pretrained(model_path)

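The loaded classifier returns one logit per document type class, and the README's usage code ends by printing a predicted document type code. A minimal, dependency-free sketch of that last step, assuming the logits have already been extracted as a plain list of floats. The helper `predict_label`, the three-entry `id2label` mapping, and the logit values are hypothetical placeholders, not the model's real labels or outputs.

```python
import math


def predict_label(logits, id2label):
    """Map raw logits to (label, probability) via softmax and argmax."""
    # Subtract the maximum logit for a numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical mapping and logits; the real model has 109 classes.
id2label = {0: "DOC_A", 1: "DOC_B", 2: "DOC_C"}
predicted_label, confidence = predict_label([0.2, 2.5, -1.0], id2label)
print(f"Predicted document type code: {predicted_label} (p={confidence:.2f})")
```

Reporting the softmax probability next to the label can help flag low-confidence predictions for manual review.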
 - **Hardware Requirements**: Inference on CPU is possible but slower; a GPU is recommended for efficiency.

 ## License
+This model is licensed under the **Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)** license. You are free to share and adapt the model for non-commercial purposes, provided appropriate credit is given to Excribe.co.

 ## Author
+- **Organization**: Excribe.co
+- **Contact**: Reach out via Hugging Face (https://huggingface.co/excribe)

 ## Citation
 If you use this model in your work, please cite:
 ```
+@misc{excribe_classifier_sgd_longformer_4099,
+author = {Excribe.co},
 title = {Classifier SGD Longformer 4099: A Fine-Tuned Model for Spanish Document Type Classification},
 year = {2025},
 publisher = {Hugging Face},
+url = {https://huggingface.co/excribe/classifier_sgd_longformer_4099}
 }
 ```
