BGE base ArgillaSDK Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
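For reference, the same three-module stack can be assembled by hand from sentence-transformers building blocks. This is only an illustrative sketch, with the module arguments read off the architecture dump above; in practice you load the published checkpoint directly, as shown in the Usage section below.

```python
from sentence_transformers import SentenceTransformer, models

# Transformer backbone: BERT-base encoder with a 512-token limit and lowercasing,
# matching the architecture dump above
word_embedding = models.Transformer(
    "BAAI/bge-base-en-v1.5",
    max_seq_length=512,
    do_lower_case=True,
)

# CLS-token pooling over the 768-dimensional word embeddings
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode="cls",
)

# L2-normalize the sentence embeddings so cosine similarity equals a dot product
model = SentenceTransformer(modules=[word_embedding, pooling, models.Normalize()])
```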
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
````python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sud-962081/bge-base-argilla-sdk-matryoshka")

# Run inference
sentences = [
    'Make changes and push them\n\nMake the changes you want in your local repository, and test that everything works and you are following the guidelines. Check the documentation for more information about the development.\n\nOnce you have finished, you can check the status of your repository and synchronize with the upstreaming repo with the following command:\n\n```sh\n\nCheck the status of your repository\n\ngit status\n\nSynchronize with the upstreaming repo',
    'Are changes required to be made and then uploaded to the Argilla dataset repository?',
    'The beautiful scenery of the Italian town Argilla made me want to make changes to my travel plans.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
````
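Because the final Normalize module L2-normalizes the embeddings, cosine similarity and dot-product ranking coincide, so the model drops straight into a retrieval loop. The following is a minimal semantic-search sketch using sentence-transformers' `util.semantic_search` helper; the corpus and query are made up for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sud-962081/bge-base-argilla-sdk-matryoshka")

# Hypothetical corpus: short documentation snippets a user might search over
corpus = [
    "You can check if a dataset exists by calling the exists method on the Dataset class.",
    "To connect to an Argilla server, instantiate the Argilla class with api_url and api_key.",
    "The new coffee shop in town offers a variety of workspace options for remote workers.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode(
    "How do I authenticate against an Argilla server?", convert_to_tensor=True
)

# Retrieve the top-2 most similar corpus entries (cosine similarity on normalized vectors)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])
```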
Evaluation
Metrics
Information Retrieval
- Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
- Evaluated with `InformationRetrievalEvaluator`
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.0612 | 0.0714 | 0.0408 | 0.0306 | 0.0204 |
cosine_accuracy@3 | 0.1837 | 0.1837 | 0.2041 | 0.1939 | 0.0816 |
cosine_accuracy@5 | 0.2653 | 0.2551 | 0.2551 | 0.2449 | 0.2143 |
cosine_accuracy@10 | 0.2959 | 0.3061 | 0.2959 | 0.3776 | 0.2755 |
cosine_precision@1 | 0.0612 | 0.0714 | 0.0408 | 0.0306 | 0.0204 |
cosine_precision@3 | 0.0612 | 0.0612 | 0.068 | 0.0646 | 0.0272 |
cosine_precision@5 | 0.0531 | 0.051 | 0.051 | 0.049 | 0.0429 |
cosine_precision@10 | 0.0296 | 0.0306 | 0.0296 | 0.0378 | 0.0276 |
cosine_recall@1 | 0.0612 | 0.0714 | 0.0408 | 0.0306 | 0.0204 |
cosine_recall@3 | 0.1837 | 0.1837 | 0.2041 | 0.1939 | 0.0816 |
cosine_recall@5 | 0.2653 | 0.2551 | 0.2551 | 0.2449 | 0.2143 |
cosine_recall@10 | 0.2959 | 0.3061 | 0.2959 | 0.3776 | 0.2755 |
cosine_ndcg@10 | 0.1788 | 0.1789 | 0.1636 | 0.1845 | 0.132 |
cosine_mrr@10 | 0.1409 | 0.1389 | 0.1211 | 0.1252 | 0.0874 |
cosine_map@100 | 0.154 | 0.1499 | 0.1349 | 0.133 | 0.1001 |
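Since the model was trained with a Matryoshka objective, embeddings can be truncated to the smaller evaluated dimensionalities (512, 256, 128, 64), with the quality trade-offs shown in the table above. A minimal sketch, assuming the `truncate_dim` option available in recent sentence-transformers releases (v2.7+); `model.similarity` still applies cosine similarity, which normalizes the truncated vectors internally.

```python
from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns vectors truncated to the first 256 dimensions
model = SentenceTransformer(
    "sud-962081/bge-base-argilla-sdk-matryoshka",
    truncate_dim=256,  # one of the evaluated Matryoshka dimensions: 768, 512, 256, 128, 64
)

embeddings = model.encode([
    "Is there a way to download a dataset from a specific workspace?",
    "How do I check whether a dataset exists?",
])
print(embeddings.shape)
# (2, 256)
```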
Training Details
Training Dataset
json
- Dataset: json
- Size: 882 training samples
- Columns: `anchor`, `positive`, and `negative`
- Approximate statistics based on the first 882 samples:

|  | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 6 tokens<br>mean: 91.86 tokens<br>max: 198 tokens | min: 8 tokens<br>mean: 25.62 tokens<br>max: 91 tokens | min: 10 tokens<br>mean: 22.11 tokens<br>max: 61 tokens |
- Samples:
| anchor | positive | negative |
|---|---|---|
| workspace = client.workspaces("my_workspace")<br><br>Retrieve the dataset from the first workspace<br>retrieved_dataset = client.datasets(name="my_dataset")<br><br>Retrieve the dataset from the specified workspace<br>retrieved_dataset = client.datasets(name="my_dataset", workspace=workspace)<br><br>Check dataset existence<br><br>You can check if a dataset exists by calling the exists method on the Dataset class. This method returns a boolean value.<br><br>python<br>import argilla_sdk as rg | Is there a way to download a dataset from a specific workspace using the Argilla client for my data annotation task? | The new coffee shop in town offers a variety of workspace options for remote workers. |
| === "As Record objects"<br><br>You can also add suggestions to a record in an initialized `Record` object.<br><br>=== "From a generic data structure"<br><br>You can add suggestions as a dictionary, where the keys correspond to the names of the labels that were configured for your dataset. Remember that you can also use the mapping parameter to specify the data structure. | Is it possible to associate multiple suggestions with a single record object in Argilla? | I love adding suggestions to my garden to make it look more beautiful. |
| hide: footer<br><br>rg.Argilla<br><br>To interact with the Argilla server from python you can use the Argilla class. The Argilla client is used to create, get, update, and delete all Argilla resources, such as workspaces, users, datasets, and records.<br><br>Usage Examples<br><br>Connecting to an Argilla server<br><br>To connect to an Argilla server, instantiate the Argilla class and pass the api_url of the server and the api_key to authenticate.<br><br>python<br>import argilla_sdk as rg | Does the Argilla class provide a convenient way to handle dataset and record administration tasks on the Argilla server? | The tourists got lost in the Argilla desert because they forgot to bring a map. |
- Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "TripletLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
```
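The loss configuration above amounts to wrapping a TripletLoss in a MatryoshkaLoss over the listed dimensions. A hedged construction sketch with sentence-transformers follows; the local file name triplets.json is hypothetical and stands in for the 882-sample anchor/positive/negative dataset described above.

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, TripletLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical local file with "anchor", "positive" and "negative" columns
train_dataset = load_dataset("json", data_files="triplets.json", split="train")

# TripletLoss applied at every Matryoshka dimension, all weighted equally
base_loss = TripletLoss(model)
loss = MatryoshkaLoss(
    model=model,
    loss=base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
```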
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- load_best_model_at_end: True
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
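With these hyperparameters, the run can be reproduced in outline via the SentenceTransformerTrainer API. The sketch below sets only the non-default values listed above and reuses the `train_dataset` and `loss` objects from the previous sketch; `output_dir` is illustrative, and the held-out evaluation data the original run scored each epoch (needed for eval_strategy="epoch" and load_best_model_at_end) is omitted.

```python
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-argilla-sdk-matryoshka",  # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    eval_strategy="epoch",
    save_strategy="epoch",          # must match eval_strategy to reload the best checkpoint
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # an eval_dataset / evaluator (omitted here) is required for per-epoch evaluation
    loss=loss,
)
trainer.train()
```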
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0 | 0 | - | 0.3815 | 0.3810 | 0.3717 | 0.3897 | 0.3153 |
0.1802 | 5 | 23.2127 | - | - | - | - | - |
0.3604 | 10 | 22.567 | - | - | - | - | - |
0.5405 | 15 | 21.0403 | - | - | - | - | - |
0.7207 | 20 | 19.6983 | - | - | - | - | - |
0.9009 | 25 | 18.4465 | - | - | - | - | - |
0.9730 | 27 | - | 0.2707 | 0.2832 | 0.2721 | 0.2576 | 0.2380 |
1.1081 | 30 | 19.4241 | - | - | - | - | - |
1.2883 | 35 | 17.3167 | - | - | - | - | - |
1.4685 | 40 | 17.0334 | - | - | - | - | - |
1.6486 | 45 | 16.9455 | - | - | - | - | - |
1.8288 | 50 | 16.8353 | - | - | - | - | - |
1.9730 | 54 | - | 0.1507 | 0.1536 | 0.1595 | 0.1604 | 0.1532 |
2.0360 | 55 | 18.4414 | - | - | - | - | - |
2.2162 | 60 | 16.7065 | - | - | - | - | - |
2.3964 | 65 | 16.6709 | - | - | - | - | - |
2.5766 | 70 | 16.6449 | - | - | - | - | - |
2.7568 | 75 | 16.6349 | - | - | - | - | - |
2.9369 | 80 | 16.633 | - | - | - | - | - |
2.9730 | 81 | - | 0.1788 | 0.1789 | 0.1636 | 0.1845 | 0.1320 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.3.1
- Transformers: 4.47.1
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}