Update README.md

Files changed (1)

README.md +9 -9

@@ -23,11 +23,11 @@ tags:
 - a100
 - cc-by-nc-3.0
 ---
-# Exscribe Classifier SGD Longformer 4096
 ## Model Overview
-**Exscribe/Classifier_SGD_Longformer_4099** is a fine-tuned version of the `allenai/longformer-base-4096` model, designed for text classification tasks in document management, specifically for classifying Spanish-language input documents into document type categories (`tipo_documento_codigo`). Developed by **Exscribe.co**, this model leverages the Longformer architecture to handle long texts (up to 4096 tokens) and is optimized for GPU environments, such as NVIDIA A100.
 The model was trained on a Spanish dataset (`final.parquet`) containing 8,850 samples across 109 document type classes. It addresses class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) applied to the training set, ensuring robust performance on minority classes. The fine-tuning process achieved an evaluation F1-score of **0.4855**, accuracy of **0.6096**, precision of **0.5212**, and recall of **0.5006** on a validation set of 1,770 samples.
@@ -91,7 +91,7 @@ import torch
 import numpy as np
 # Load the model and tokenizer
-model_path = "exscribe/classifier_sgd_longformer_4099"
 tokenizer = LongformerTokenizer.from_pretrained(model_path)
 model = LongformerForSequenceClassification.from_pretrained(model_path)
@@ -141,21 +141,21 @@ print(f"Predicted document type code: {predicted_label}")
 - **Hardware Requirements**: Inference on CPU is possible but slower; a GPU is recommended for efficiency.
 ## License
-This model is licensed under the **Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)** license. You are free to share and adapt the model for non-commercial purposes, provided appropriate credit is given to Exscribe.co.
 ## Author
-- **Organization**: Exscribe.co
-- **Contact**: Reach out via Hugging Face (https://huggingface.co/exscribe)
 ## Citation
 If you use this model in your work, please cite:
 ```
-@misc{exscribe_classifier_sgd_longformer_4099,
-  author = {Exscribe.co},
   title = {Classifier SGD Longformer 4099: A Fine-Tuned Model for Spanish Document Type Classification},
   year = {2025},
   publisher = {Hugging Face},
-  url = {https://huggingface.co/exscribe/classifier_sgd_longformer_4099}
 }
 ```
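
The usage hunk loads the tokenizer and model and, per the hunk header, ends by printing `predicted_label`; the steps in between are not visible in this diff. As a minimal sketch of what such a step typically involves, the snippet below turns a vector of classifier logits into a label and a confidence value. The `id2label` mapping, the label codes, and the logit values are hypothetical placeholders, not taken from the model's configuration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical mapping and logits; the real model has 109 classes.
id2label = {0: "DOC_A", 1: "DOC_B", 2: "DOC_C"}
example_logits = [0.2, 2.5, -1.0]

predicted_label, confidence = decode_prediction(example_logits, id2label)
print(f"Predicted document type code: {predicted_label} ({confidence:.3f})")
```

With the real model, the logits would come from the model's output for a tokenized document, and the mapping from class index to `tipo_documento_codigo` would come from whatever label encoding was used in training.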

 - a100
 - cc-by-nc-3.0
 ---
+# Excribe Classifier SGD Longformer 4096
 ## Model Overview
+**Excribe/Classifier_SGD_Longformer_4099** is a fine-tuned version of the `allenai/longformer-base-4096` model, designed for text classification tasks in document management, specifically for classifying Spanish-language input documents into document type categories (`tipo_documento_codigo`). Developed by **Excribe.co**, this model leverages the Longformer architecture to handle long texts (up to 4096 tokens) and is optimized for GPU environments, such as NVIDIA A100.
 The model was trained on a Spanish dataset (`final.parquet`) containing 8,850 samples across 109 document type classes. It addresses class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) applied to the training set, ensuring robust performance on minority classes. The fine-tuning process achieved an evaluation F1-score of **0.4855**, accuracy of **0.6096**, precision of **0.5212**, and recall of **0.5006** on a validation set of 1,770 samples.
 import numpy as np
 # Load the model and tokenizer
+model_path = "excribe/classifier_sgd_longformer_4099"
 tokenizer = LongformerTokenizer.from_pretrained(model_path)
 model = LongformerForSequenceClassification.from_pretrained(model_path)
 - **Hardware Requirements**: Inference on CPU is possible but slower; a GPU is recommended for efficiency.
 ## License
+This model is licensed under the **Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)** license. You are free to share and adapt the model for non-commercial purposes, provided appropriate credit is given to Excribe.co.
 ## Author
+- **Organization**: Excribe.co
+- **Contact**: Reach out via Hugging Face (https://huggingface.co/excribe)
 ## Citation
 If you use this model in your work, please cite:
 ```
+@misc{excribe_classifier_sgd_longformer_4099,
+  author = {Excribe.co},
   title = {Classifier SGD Longformer 4099: A Fine-Tuned Model for Spanish Document Type Classification},
   year = {2025},
   publisher = {Hugging Face},
+  url = {https://huggingface.co/excribe/classifier_sgd_longformer_4099}
 }
 ```
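
The reported F1-score (0.4855) is lower than both the reported precision (0.5212) and recall (0.5006), so it cannot be the harmonic mean of those two figures. That pattern is consistent with per-class averaging, although the README lines shown here do not state the averaging mode. The sketch below assumes macro averaging and uses toy labels (not the model's data) to show how accuracy can sit well above macro F1 when classes are imbalanced.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, imbalanced example: the majority class is always right,
# the two minority classes are always confused with each other.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "C", "B"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Here four of six predictions are correct, but only one of three classes is ever predicted correctly, so macro F1 falls to one third while accuracy stays at two thirds.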