Gerald Stanje (Gerald001)
AI & ML interests: None yet
Recent Activity
new activity · 27 days ago · microsoft/bitnet-b1.58-2B-4T: Will the fine-tuning code be provided?
new activity · about 1 month ago · answerdotai/ModernBERT-base: sagemaker not supporting modernBERT trained model with transformers 4.49.0
new activity · about 1 month ago · answerdotai/ModernBERT-base: gpu requirements
Gerald001's activity
Will the fine-tuning code be provided? · 2 · 4 · #6 opened 27 days ago by AXCXEPT
sagemaker not supporting modernBERT trained model with transformers 4.49.0 · 5 · #69 opened 3 months ago by devs9
gpu requirements · 1 · #73 opened 2 months ago by Gerald001
fine tune model and convert to onnx · 4 · #77 opened about 2 months ago by Gerald001
commented on Fine-tune ModernBERT for text classification using synthetic data · about 2 months ago
where do you see f1 score of 0.89 ?

commented on Fine-tune ModernBERT for text classification using synthetic data · about 2 months ago
hi @davidberenstein1957, this code doesn't seem to work with transformers 4.49.0. any idea? eval_f1 stays at 0.007867705980913528 for every epoch...
i get this output:
python3 train4.py
Parameter 'function'=<function tokenize at 0x7fec4c3b6b90> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 900/900 [00:00<00:00, 2251.77 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 2668.59 examples/s]
Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at answerdotai/ModernBERT-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 4.115044247787611e-05, 'epoch': 0.88}
{'eval_loss': nan, 'eval_f1': 0.007867705980913528, 'eval_runtime': 11.5539, 'eval_samples_per_second': 8.655, 'eval_steps_per_second': 1.125, 'epoch': 1.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 3.230088495575221e-05, 'epoch': 1.77}
{'eval_loss': nan, 'eval_f1': 0.007867705980913528, 'eval_runtime': 0.3503, 'eval_samples_per_second': 285.465, 'eval_steps_per_second': 37.11, 'epoch': 2.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2.345132743362832e-05, 'epoch': 2.65}
{'eval_loss': nan, 'eval_f1': 0.007867705980913528, 'eval_runtime': 0.3496, 'eval_samples_per_second': 286.027, 'eval_steps_per_second': 37.184, 'epoch': 3.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 1.4601769911504426e-05, 'epoch': 3.54}
{'eval_loss': nan, 'eval_f1': 0.007867705980913528, 'eval_runtime': 0.3529, 'eval_samples_per_second': 283.348, 'eval_steps_per_second': 36.835, 'epoch': 4.0}
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 5.752212389380531e-06, 'epoch': 4.42}
{'eval_loss': nan, 'eval_f1': 0.007867705980913528, 'eval_runtime': 0.3147, 'eval_samples_per_second': 317.753, 'eval_steps_per_second': 41.308, 'epoch': 5.0}
{'train_runtime': 149.6166, 'train_samples_per_second': 30.077, 'train_steps_per_second': 3.776, 'train_loss': 0.0, 'epoch': 5.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 565/565 [02:29<00:00, 3.78it/s]
Device set to use cuda:0
[2025-03-26 15:16:13,918] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[2025-03-26 15:16:13,918] torch._dynamo.convert_frame: [WARNING] function: 'compiled_mlp' (/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/modernbert/modeling_modernbert.py:552)
[2025-03-26 15:16:13,918] torch._dynamo.convert_frame: [WARNING] last reason: ___check_global_state()
[2025-03-26 15:16:13,918] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[2025-03-26 15:16:13,918] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
[{'label': 'business-and-industrial', 'score': nan}]
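my guess, not verified: since eval_loss and the final pipeline score are both nan, the logits coming out of the model are probably nan as well. np.argmax over an all-nan row just returns index 0, so every eval prediction collapses to class 0 and the weighted f1 sits at that tiny constant. tiny illustration of that argmax behaviour (my own snippet, nothing to do with the training run):

import numpy as np
# illustration only: argmax over rows of nan always picks index 0,
# so nan logits would turn every "prediction" into class 0
logits = np.full((3, 17), np.nan)  # 17 is just a placeholder class count
print(np.argmax(logits, axis=1))   # -> [0 0 0]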
full code:
from datasets import load_dataset
from datasets.arrow_dataset import Dataset
from datasets.dataset_dict import DatasetDict, IterableDatasetDict
from datasets.iterable_dataset import IterableDataset
import os
import torch
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
torch.set_float32_matmul_precision('high')
# Dataset id from huggingface.co/dataset
dataset_id = "argilla/synthetic-domain-text-classification"
# Load raw dataset
train_dataset = load_dataset(dataset_id, split='train')
split_dataset = train_dataset.train_test_split(test_size=0.1)
split_dataset['train'][0]
# {'text': 'Recently, there has been an increase in property values within the suburban areas of several cities due to improvements in infrastructure and lifestyle amenities such as parks, retail stores, and educational institutions nearby. Additionally, new housing developments are emerging, catering to different family needs with varying sizes and price ranges. These changes have influenced investment decisions for many looking to buy or sell properties.', 'label': 14}
from transformers import AutoTokenizer
# Model id to load the tokenizer
model_id = "answerdotai/ModernBERT-base"
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Tokenize helper function
def tokenize(batch):
    return tokenizer(batch['text'], padding=True, truncation=True, return_tensors="pt")
# Tokenize dataset
if "label" in split_dataset["train"].features.keys():
split_dataset = split_dataset.rename_column("label", "labels") # to match Trainer
tokenized_dataset = split_dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized_dataset["train"].features.keys()
# dict_keys(['labels', 'input_ids', 'attention_mask'])
from transformers import AutoModelForSequenceClassification
# Model id to load the model
model_id = "answerdotai/ModernBERT-base"
# Prepare model labels - useful for inference
labels = tokenized_dataset["train"].features["labels"].names
num_labels = len(labels)
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label
# Download the model from huggingface.co/models
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label,
)
import numpy as np
from sklearn.metrics import f1_score
# Metric helper method
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    score = f1_score(
        labels, predictions, labels=labels, pos_label=1, average="weighted"
    )
    return {"f1": float(score) if score == 1 else score}
from huggingface_hub import HfFolder
from transformers import Trainer, TrainingArguments
# Define training args
training_args = TrainingArguments(
    output_dir="ModernBERT-domain-classifier",
    per_device_train_batch_size=8,  # 32,
    per_device_eval_batch_size=8,  # 16,
    learning_rate=5e-5,
    num_train_epochs=5,
    bf16=True,  # bfloat16 training
    optim="adamw_torch_fused",  # improved optimizer
    # logging & evaluation strategies
    logging_strategy="steps",
    logging_steps=100,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    # use_mps_device=True,
    metric_for_best_model="f1",
    # push to hub parameters
    push_to_hub=False,
    hub_strategy="every_save",
    hub_token=HfFolder.get_token(),
)
# Create a Trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
)
trainer.train()
# {'train_runtime': 3642.7783, 'train_samples_per_second': 1.235, 'train_steps_per_second': 0.04, 'train_loss': 0.535627057634551, 'epoch': 5.0}
from transformers import pipeline
model_save_path = "ModernBERT-domain-classifier-save"
trainer.save_model(model_save_path)
# Save processor and create model card
tokenizer.save_pretrained(model_save_path)
# load model from huggingface.co/models using our repository id
classifier = pipeline(
    task="text-classification",
    model=model_save_path,
    device=0,
)
sample = "Smoking is bad for your health."
print(classifier(sample))
# [{'label': 'health', 'score': 0.6779336333274841}]
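and a minimal check i'd run next to narrow it down (sketch only, not verified; `reference_compile=False` and `attn_implementation="eager"` are assumptions on my side to rule out ModernBERT's compiled MLP path and flash-attention kernels): load the model in plain fp32, run a single forward pass, and see whether the logits are already nan before any bf16 training happens.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "answerdotai/ModernBERT-base"
tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=17,                # placeholder; i'd reuse len(labels) from the script above
    reference_compile=False,      # assumption: skip the torch.compile'd MLP path
    attn_implementation="eager",  # assumption: avoid flash-attention kernels
    torch_dtype=torch.float32,    # plain fp32, no bf16
)
batch = tok("Smoking is bad for your health.", return_tensors="pt")
with torch.no_grad():
    out = mdl(**batch)
print(torch.isnan(out.logits).any())  # True would mean the forward pass itself produces nan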
How to get the probability score from Llama-Guard · 5 · 6 · #16 opened 12 months ago by ctdfuji
export fine tuned model to onnx · 1 · #76 opened 10 months ago by Gerald001
classification probability · 4 · #14 opened 10 months ago by Gerald001
Onnx model doesn't produce embeddings close enough to SentenceTransformer version · 6 · #67 opened 11 months ago by luciancap001
Upload ONNX weights exported via optimum with `library='sentence-transformers'` · 2 · #64 opened 12 months ago by Xenova
Adding ONNX file of this model · 1 · 1 · #40 opened over 1 year ago by TDK2434
Upload model.onnx · 2 · 1 · #19 opened almost 2 years ago by tkelmATlegends
GPU requirements · 10 · #29 opened about 1 year ago by Gerald001
Update generation_config.json · 35 · 14 · #4 opened about 1 year ago by abhi-db
how to output an answer without side chatter · 1 · 8 · #36 opened about 1 year ago by Gerald001
`meta-llama/Meta-Llama-3-8B-Instruct` model with sagemaker · 1 · #38 opened about 1 year ago by aak7912
Update tokenizer_config.json to prepend the bos token · 5 · 7 · #35 opened about 1 year ago by eduagarcia
models for inf2. · 5 · #33 opened about 1 year ago by AC2132