---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46957
- loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: How to load documents?
  sentences:
  - MapCity contains the geometries that are displayed on the interactive map on the frontend.
  - The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.
  - Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.
- source_sentence: What is the MapCity model?
  sentences:
  - The document app contains Document, DocumentFile, Type, Language, Theme, Keyword, and Oss models used in the document consultation section.
  - Document contains all the document metadata such as name, author, year, type, language used in the document consultation section.
  - MapCity contains the geometries that are displayed on the interactive map on the frontend.
- source_sentence: What is the cleantables command?
  sentences:
  - Takes care of eliminating all instances of Palette, Group, MapCity, Map, Province, and Property models.
  - Set CORS_ALLOWED_ORIGINS in the environment file with allowed origins like localhost,127.0.0.1,http://localhost:3000.
  - 'from matplotlib import pyplot as plt colors = [''Accent'', ''Accent_r'', ''Blues'', ''Blues_r'', ''BrBG'', ''BrBG_r'',] ax = res_union.plot(cmap=colors[random.randint(0, len(colors))]) ax = res_union.plot(cmap=''Greens_r'') gdf1.plot(ax=ax, facecolor=''none'', edgecolor=''k'') gdf2.plot(ax=ax, facecolor=''none'', edgecolor=''k'') plt.savefig("overlay.png") ```.'
- source_sentence: How to restore a database dump?
  sentences:
  - Use the generategeotiff command which generates Geotiff instances from a shapefile. Run with python manage.py generategeotiff .
  - Use the generategeotiff command which generates Geotiff instances from a shapefile. Run with python manage.py generategeotiff .
  - Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump
- source_sentence: What is the State model?
  sentences:
  - State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.
  - Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.
  - The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: val triplet eval
      type: val-triplet-eval
    metrics:
    - type: cosine_accuracy
      value: 1.0
      name: Cosine Accuracy
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
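As a quick illustration of the semantic-search use case mentioned above, the sketch below ranks a few candidate passages against a question. This is only a minimal example: the candidate passages are borrowed from the widget examples, and the similarity call is the same one shown in the Usage section further down.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")

query = "How to load documents?"
passages = [
    "Use the load_documents command which creates document file instances from folders in ./files/2-Database-solare path.",
    "MapCity contains the geometries that are displayed on the interactive map on the frontend.",
    "State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.",
]

# Embed the query and the candidate passages, then rank the passages by cosine similarity
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)
scores = model.similarity(query_embedding, passage_embeddings)  # shape: [1, 3]
print(passages[scores.argmax().item()])
```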
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")
# Run inference
sentences = [
    'What is the State model?',
    'State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.',
    'The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Triplet

* Dataset: `val-triplet-eval`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value   |
|:--------------------|:--------|
| **cosine_accuracy** | **1.0** |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 46,957 training samples
* Columns: sentence_0, sentence_1, and sentence_2
* Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 | sentence_2 |
  |:--------|:-----------|:-----------|:-----------|
  | type    | string     | string     | string     |
  | details |            |            |            |

* Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:-----------|:-----------|:-----------|
  | <code>How to restore a database dump?</code> | <code>Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump</code> | <code>filter(id__in=ids_dataframe1)<br>ids_dataframe2 = df2.split(',')<br>maps = Map.objects.filter(id__in=ids_dataframe2)<br>if not provinces or not maps:<br>return Response('Provinces or maps not found', status=status.HTTP_404_NOT_FOUND)<br>```<br><br>2. Then we use the **Geodataframe.</code> |
  | <code>What is the Region model?</code> | <code>Region contains the geometries of the regions, in our specific case it only contains the geometries of the Italian regions.</code> | <code>The command allows loading data into the project based on a compiled excel file.<br>Allows loading data on *scenarios*, *shapefiles*, *palettes*, *software*, *particles* and *companies*.<br>1. Run the command::<br><br>```bash<br>python manage.py flow<br>```<br><br>2. Choose the type of data to load:<br>```<br>Executing consistency checks.<br>Load scenarios data. (y/n): n<br>Load softwares data.</code> |
  | <code>What is the generategeotiff command?</code> | <code>This command generates Geotiff instances from a shapefile. For each Property present in the shapefile, a Geotiff instance will be created.</code> | <code>MINIO_ROOT_USER=minio12345<br>MINIO_ROOT_PASSWORD=minio12345<br>MINIO_ENDPOINT=minio:9000<br>MINIO_EXTERNAL_ENDPOINT=localhost:9000 #CDN<br>MINIO_USE_HTTPS=False<br>MINIO_EXTERNAL_ENDPOINT_USE_HTTPS=False #true online<br><br>PGADMIN_DEFAULT_EMAIL=admin@admin.com<br>PGADMIN_DEFAULT_PASSWORD=strongpassword<br><br>VERSION=1.0.0<br><br>SHAPEFILE_VERSION=gadm41_ITA_<br>```.</code> |

* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
  ```json
  {
      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
      "triplet_margin": 5
  }
  ```
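For reference, this loss configuration corresponds to the following construction in the Sentence Transformers API. This is a minimal sketch: only the loss object is shown, and the base model name is the one listed under Model Details.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Euclidean distance with a margin of 5, matching the parameters listed above
loss = TripletLoss(
    model=model,
    distance_metric=TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)
```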
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 4
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
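Taken together, the non-default hyperparameters above roughly translate into a training setup like the one below. This is a hedged sketch rather than the exact training script: the triplet rows, validation triplets, and `output_dir` are placeholders, while the loss, batch sizes, epoch count, and evaluation strategy follow the values reported in this card.

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.evaluation import TripletEvaluator
from sentence_transformers.losses import TripletLoss, TripletDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder triplets with the column layout described above:
# sentence_0 (anchor), sentence_1 (positive), sentence_2 (negative).
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the State model?"],
    "sentence_1": ["State contains the geometries of the states."],
    "sentence_2": ["MapCity contains the geometries that are displayed on the interactive map."],
})
eval_dataset = Dataset.from_dict({
    "sentence_0": ["What is the Region model?"],
    "sentence_1": ["Region contains the geometries of the regions."],
    "sentence_2": ["The maps app contains State, Region, Province, and other models."],
})

# TripletLoss with Euclidean distance and margin 5, as listed in the Training Dataset section
loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)

args = SentenceTransformerTrainingArguments(
    output_dir="models/all-mini-mediterraneo-triplets-v4",  # placeholder output path
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="steps",
)

# Evaluator behind the `val-triplet-eval` cosine_accuracy metric reported above
dev_evaluator = TripletEvaluator(
    anchors=eval_dataset["sentence_0"],
    positives=eval_dataset["sentence_1"],
    negatives=eval_dataset["sentence_2"],
    name="val-triplet-eval",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)
trainer.train()
```

With `eval_strategy="steps"` and an evaluator named `val-triplet-eval`, the trainer periodically reports the cosine accuracy column that appears in the Training Logs table below.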
### Training Logs

| Epoch  | Step  | Training Loss | val-triplet-eval_cosine_accuracy |
|:------:|:-----:|:-------------:|:--------------------------------:|
| 0.1704 | 500   | 4.5287        | -                                |
| 0.3407 | 1000  | 4.1121        | 1.0                              |
| 0.5111 | 1500  | 3.7883        | -                                |
| 0.6814 | 2000  | 3.6668        | 1.0                              |
| 0.8518 | 2500  | 3.6262        | -                                |
| 1.0    | 2935  | -             | 1.0                              |
| 1.0221 | 3000  | 3.586         | 1.0                              |
| 1.1925 | 3500  | 3.5752        | -                                |
| 1.3629 | 4000  | 3.5576        | 1.0                              |
| 1.5332 | 4500  | 3.556         | -                                |
| 1.7036 | 5000  | 3.5389        | 1.0                              |
| 1.8739 | 5500  | 3.526         | -                                |
| 2.0    | 5870  | -             | 1.0                              |
| 2.0443 | 6000  | 3.5228        | 1.0                              |
| 2.2147 | 6500  | 3.5234        | -                                |
| 2.3850 | 7000  | 3.5122        | 1.0                              |
| 2.5554 | 7500  | 3.517         | -                                |
| 2.7257 | 8000  | 3.5056        | 1.0                              |
| 2.8961 | 8500  | 3.5103        | -                                |
| 3.0    | 8805  | -             | 1.0                              |
| 3.0664 | 9000  | 3.5071        | 1.0                              |
| 3.2368 | 9500  | 3.4977        | -                                |
| 3.4072 | 10000 | 3.4929        | 1.0                              |
| 3.5775 | 10500 | 3.4964        | -                                |
| 3.7479 | 11000 | 3.4914        | 1.0                              |
| 3.9182 | 11500 | 3.491         | -                                |
| 4.0    | 11740 | -             | 1.0                              |

### Framework Versions

- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.53.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 4.0.0
- Tokenizers: 0.21.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### TripletLoss

```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```