---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46957
- loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: How to load documents?
  sentences:
  - MapCity contains the geometries that are displayed on the interactive map on the frontend.
  - The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.
  - Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.
- source_sentence: What is the MapCity model?
  sentences:
  - The document app contains Document, DocumentFile, Type, Language, Theme, Keyword, and Oss models used in the document consultation section.
  - Document contains all the document metadata such as name, author, year, type, language used in the document consultation section.
  - MapCity contains the geometries that are displayed on the interactive map on the frontend.
- source_sentence: What is the cleantables command?
  sentences:
  - Takes care of eliminating all instances of Palette, Group, MapCity, Map, Province, and Property models.
  - Set CORS_ALLOWED_ORIGINS in the environment file with allowed origins like localhost,127.0.0.1,http://localhost:3000.
  - 'from matplotlib import pyplot as plt colors = [''Accent'', ''Accent_r'', ''Blues'', ''Blues_r'', ''BrBG'', ''BrBG_r'',] ax = res_union.plot(cmap=colors[random.randint(0, len(colors))]) ax = res_union.plot(cmap=''Greens_r'') gdf1.plot(ax=ax, facecolor=''none'', edgecolor=''k'') gdf2.plot(ax=ax, facecolor=''none'', edgecolor=''k'') plt.savefig("overlay.png") ```.'
- source_sentence: How to restore a database dump?
  sentences:
  - Use the generategeotiff command which generates Geotiff instances from a shapefile. Run with python manage.py generategeotiff .
  - Use the generategeotiff command which generates Geotiff instances from a shapefile. Run with python manage.py generategeotiff .
  - Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump
- source_sentence: What is the State model?
  sentences:
  - State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.
  - Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.
  - The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: val triplet eval
      type: val-triplet-eval
    metrics:
    - type: cosine_accuracy
      value: 1.0
      name: Cosine Accuracy
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
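As a quick illustration of the semantic-search use case mentioned above, the sketch below ranks a few candidate passages against a question. This is only a minimal example: the candidate passages are borrowed from the widget examples, and the similarity call is the same one shown in the Usage section further down.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")

query = "How to load documents?"
passages = [
    "Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.",
    "MapCity contains the geometries that are displayed on the interactive map on the frontend.",
    "State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.",
]

# Embed the query and the candidate passages, then rank the passages by cosine similarity
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)
scores = model.similarity(query_embedding, passage_embeddings)  # shape: [1, 3]
print(passages[scores.argmax().item()])
```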
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")
# Run inference
sentences = [
    'What is the State model?',
    'State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.',
    'The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Triplet

* Dataset: `val-triplet-eval`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value   |
|:--------------------|:--------|
| **cosine_accuracy** | **1.0** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 46,957 training samples
* Columns: sentence_0, sentence_1, and sentence_2
* Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 | sentence_2 |
  |:--------|:-----------|:-----------|:-----------|
  | type    | string     | string     | string     |
  | details |            |            |            |

* Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:-----------|:-----------|:-----------|
  | <code>How to restore a database dump?</code> | <code>Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump</code> | <code>filter(id__in=ids_dataframe1)<br>ids_dataframe2 = df2.split(',')<br>maps = Map.objects.filter(id__in=ids_dataframe2)<br>if not provinces or not maps:<br>return Response('Provinces or maps not found', status=status.HTTP_404_NOT_FOUND)<br>```<br><br>2. Then we use the **Geodataframe.</code> |
  | <code>What is the Region model?</code> | <code>Region contains the geometries of the regions, in our specific case it only contains the geometries of the Italian regions.</code> | <code>The command allows loading data into the project based on a compiled excel file.<br>Allows loading data on *scenarios*, *shapefiles*, *palettes*, *software*, *particles* and *companies*.<br>1. Run the command::<br><br>```bash<br>python manage.py flow<br>```<br><br>2. Choose the type of data to load:<br>```<br>Executing consistency checks.<br>Load scenarios data. (y/n): n<br>Load softwares data.</code> |
  | <code>What is the generategeotiff command?</code> | <code>This command generates Geotiff instances from a shapefile. For each Property present in the shapefile, a Geotiff instance will be created.</code> | <code>MINIO_ROOT_USER=minio12345<br>MINIO_ROOT_PASSWORD=minio12345<br>MINIO_ENDPOINT=minio:9000<br>MINIO_EXTERNAL_ENDPOINT=localhost:9000 #CDN<br>MINIO_USE_HTTPS=False<br>MINIO_EXTERNAL_ENDPOINT_USE_HTTPS=False #true online<br><br>PGADMIN_DEFAULT_EMAIL=admin@admin.com<br>PGADMIN_DEFAULT_PASSWORD=strongpassword<br><br>VERSION=1.0.0<br><br>SHAPEFILE_VERSION=gadm41_ITA_<br>```.</code> |

* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
  ```json
  {
      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
      "triplet_margin": 5
  }
  ```
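For reference, this loss configuration corresponds to the following construction in the Sentence Transformers API. This is a minimal sketch: only the loss object is shown, and the base model name is the one listed under Model Details.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Euclidean distance with a margin of 5, matching the parameters listed above
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)
```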
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
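Taken together, the non-default hyperparameters above roughly translate into a training setup like the one below. This is a hedged sketch rather than the exact training script: the triplet rows, validation triplets, and `output_dir` are placeholders, while the loss, batch sizes, epoch count, and evaluation strategy follow the values reported in this card.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.evaluation import TripletEvaluator
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder triplets with the column layout described above:
# sentence_0 (anchor), sentence_1 (positive), sentence_2 (negative).
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the State model?"],
    "sentence_1": ["State contains the geometries of the states."],
    "sentence_2": ["MapCity contains the geometries that are displayed on the interactive map."],
})
eval_dataset = Dataset.from_dict({
    "sentence_0": ["What is the Region model?"],
    "sentence_1": ["Region contains the geometries of the regions."],
    "sentence_2": ["The maps app contains State, Region, Province, and other models."],
})

# TripletLoss with Euclidean distance and margin 5, as listed in the Training Dataset section
loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)

args = SentenceTransformerTrainingArguments(
    output_dir="models/all-mini-mediterraneo-triplets-v4",  # placeholder output path
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="steps",
)

# Evaluator behind the `val-triplet-eval` cosine_accuracy metric reported above
dev_evaluator = TripletEvaluator(
    anchors=eval_dataset["sentence_0"],
    positives=eval_dataset["sentence_1"],
    negatives=eval_dataset["sentence_2"],
    name="val-triplet-eval",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)
trainer.train()
```

With `eval_strategy="steps"` and an evaluator named `val-triplet-eval`, the trainer periodically reports the cosine accuracy column that appears in the Training Logs table below.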
### Training Logs

| Epoch  | Step  | Training Loss | val-triplet-eval_cosine_accuracy |
|:------:|:-----:|:-------------:|:--------------------------------:|
| 0.1704 | 500   | 4.5287        | -                                |
| 0.3407 | 1000  | 4.1121        | 1.0                              |
| 0.5111 | 1500  | 3.7883        | -                                |
| 0.6814 | 2000  | 3.6668        | 1.0                              |
| 0.8518 | 2500  | 3.6262        | -                                |
| 1.0    | 2935  | -             | 1.0                              |
| 1.0221 | 3000  | 3.586         | 1.0                              |
| 1.1925 | 3500  | 3.5752        | -                                |
| 1.3629 | 4000  | 3.5576        | 1.0                              |
| 1.5332 | 4500  | 3.556         | -                                |
| 1.7036 | 5000  | 3.5389        | 1.0                              |
| 1.8739 | 5500  | 3.526         | -                                |
| 2.0    | 5870  | -             | 1.0                              |
| 2.0443 | 6000  | 3.5228        | 1.0                              |
| 2.2147 | 6500  | 3.5234        | -                                |
| 2.3850 | 7000  | 3.5122        | 1.0                              |
| 2.5554 | 7500  | 3.517         | -                                |
| 2.7257 | 8000  | 3.5056        | 1.0                              |
| 2.8961 | 8500  | 3.5103        | -                                |
| 3.0    | 8805  | -             | 1.0                              |
| 3.0664 | 9000  | 3.5071        | 1.0                              |
| 3.2368 | 9500  | 3.4977        | -                                |
| 3.4072 | 10000 | 3.4929        | 1.0                              |
| 3.5775 | 10500 | 3.4964        | -                                |
| 3.7479 | 11000 | 3.4914        | 1.0                              |
| 3.9182 | 11500 | 3.491         | -                                |
| 4.0    | 11740 | -             | 1.0                              |

### Framework Versions

- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.53.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 4.0.0
- Tokenizers: 0.21.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### TripletLoss

```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```