---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:46957
- loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: How to load documents?
sentences:
- >-
MapCity contains the geometries that are displayed on the interactive
map on the frontend.
- >-
The maps app contains State, Region, Province, Company, City,
Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette,
PaletteColor, and Geotiff models.
- >-
Use the load_documents command which creates document file instances
from folders in ./files/2-Database-solare path.
- source_sentence: What is the MapCity model?
sentences:
- >-
The document app contains Document, DocumentFile, Type, Language, Theme,
Keyword, and Oss models used in the document consultation section.
- >-
Document contains all the document metadata such as name, author, year,
type, language used in the document consultation section.
- >-
MapCity contains the geometries that are displayed on the interactive
map on the frontend.
- source_sentence: What is the cleantables command?
sentences:
- >-
Takes care of eliminating all instances of Palette, Group, MapCity, Map,
Province, and Property models.
- >-
Set CORS_ALLOWED_ORIGINS in the environment file with allowed origins
like localhost,127.0.0.1,http://localhost:3000.
  - |-
    from matplotlib import pyplot as plt
    colors = ['Accent', 'Accent_r', 'Blues', 'Blues_r', 'BrBG', 'BrBG_r',]
    ax = res_union.plot(cmap=colors[random.randint(0, len(colors))])
    ax = res_union.plot(cmap='Greens_r')
    gdf1.plot(ax=ax, facecolor='none', edgecolor='k')
    gdf2.plot(ax=ax, facecolor='none', edgecolor='k')
    plt.savefig("overlay.png")
- source_sentence: How to restore a database dump?
sentences:
- >-
Use the generategeotiff command which generates Geotiff instances from a
shapefile. Run with python manage.py generategeotiff <path>.
- >-
Copy the dump file to data/postgresql folder, then inside the database
container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean
--if-exists /var/lib/postgresql/data/db_backup.dump
- source_sentence: What is the State model?
sentences:
- >-
State contains the geometries of the states, in our specific case it
contains only the entire geometries of the Italian state.
- >-
Use the load_documents command which creates document file instances
from folders in ./files/2-Database-solare path.
- >-
The maps app contains State, Region, Province, Company, City,
Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette,
PaletteColor, and Geotiff models.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
results:
- task:
type: triplet
name: Triplet
dataset:
name: val triplet eval
type: val-triplet-eval
metrics:
- type: cosine_accuracy
value: 1
name: Cosine Accuracy
---

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
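In other words, the Pooling module mean-pools the token embeddings over the attention mask (pooling_mode_mean_tokens) and the Normalize module L2-normalizes the pooled vector. A minimal hand-rolled sketch of what those two modules compute, for illustration only (not the library's internals):

```python
import torch

def mean_pool_and_normalize(token_embeddings: torch.Tensor,
                            attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool token embeddings over non-padding positions, then L2-normalize."""
    mask = attention_mask.unsqueeze(-1).float()      # [batch, seq, 1]
    summed = (token_embeddings * mask).sum(dim=1)    # [batch, hidden]
    counts = mask.sum(dim=1).clamp(min=1e-9)         # avoid division by zero
    return torch.nn.functional.normalize(summed / counts, p=2, dim=1)
```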
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")
# Run inference
sentences = [
    'What is the State model?',
    'State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state.',
    'The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
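Since the embeddings are normalized and scored with cosine similarity, semantic search follows directly. A small sketch that reuses the model loaded above to rank candidate answers for a query, with candidates taken from the widget examples:

```python
# Rank candidate answers for a query by cosine similarity.
query_embedding = model.encode(["How to restore a database dump?"])
candidate_embeddings = model.encode([
    "MapCity contains the geometries that are displayed on the interactive map on the frontend.",
    "Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump",
])
scores = model.similarity(query_embedding, candidate_embeddings)  # shape [1, 2]
print(scores.argmax().item())  # index of the best-matching candidate (the pg_restore answer)
```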
Evaluation
Metrics
Triplet
- Dataset: val-triplet-eval
- Evaluated with TripletEvaluator
Metric | Value |
---|---|
cosine_accuracy | 1.0 |
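The metric can be reproduced with TripletEvaluator. A minimal sketch using the inference example above as a single toy triplet; the reported 1.0 comes from the full val-triplet-eval split, not from one triplet:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("gabrielegabellone/all-mini-mediterraneo-triplets-v4")

# Anchor, positive, and negative lists must be parallel.
evaluator = TripletEvaluator(
    anchors=["What is the State model?"],
    positives=["State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state."],
    negatives=["The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models."],
    name="val-triplet-eval",
)
print(evaluator(model))  # e.g. {'val-triplet-eval_cosine_accuracy': 1.0}
```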
Training Details
Training Dataset
Unnamed Dataset
- Size: 46,957 training samples
- Columns: sentence_0, sentence_1, and sentence_2
- Approximate statistics based on the first 1000 samples:
| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | min: 7 tokens, mean: 9.62 tokens, max: 15 tokens | min: 21 tokens, mean: 35.4 tokens, max: 68 tokens | min: 21 tokens, mean: 73.85 tokens, max: 239 tokens |
- Samples:
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| How to restore a database dump? | Copy the dump file to data/postgresql folder, then inside the database container run pg_restore -U $POSTGRES_USER -d $POSTGRES_DB --clean --if-exists /var/lib/postgresql/data/db_backup.dump | filter(id__in=ids_dataframe1) ids_dataframe2 = df2.split(',') maps = Map.objects.filter(id__in=ids_dataframe2) if not provinces or not maps: return Response('Provinces or maps not found', status=status.HTTP_404_NOT_FOUND) 2. Then we use the **Geodataframe. |
| What is the Region model? | Region contains the geometries of the regions, in our specific case it only contains the geometries of the Italian regions. | The command allows loading data into the project based on a compiled excel file. Allows loading data on scenarios, shapefiles, palettes, software, particles and companies. 1. Run the command: python manage.py flow 2. Choose the type of data to load: Executing consistency checks. Load scenarios data. (y/n): n Load softwares data. |
| What is the generategeotiff command? | This command generates Geotiff instances from a shapefile. For each Property present in the shapefile, a Geotiff instance will be created. | MINIO_ROOT_USER=minio12345 MINIO_ROOT_PASSWORD=minio12345 MINIO_ENDPOINT=minio:9000 MINIO_EXTERNAL_ENDPOINT=localhost:9000 #CDN MINIO_USE_HTTPS=False MINIO_EXTERNAL_ENDPOINT_USE_HTTPS=False #true online PGADMIN_DEFAULT_EMAIL=admin@admin.com PGADMIN_DEFAULT_PASSWORD=strongpassword VERSION=1.0.0 SHAPEFILE_VERSION=gadm41_ITA_ |

- Loss: TripletLoss with these parameters:

  ```json
  {
      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
      "triplet_margin": 5
  }
  ```
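These parameters map directly onto the TripletLoss constructor; a minimal sketch of constructing the same loss (Euclidean distance and margin 5 are also the loss's defaults in Sentence Transformers):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.EUCLIDEAN,
    triplet_margin=5,
)
```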
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 4
- multi_dataset_batch_sampler: round_robin
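A minimal sketch of wiring these non-default values into a training run with SentenceTransformerTrainer, using a single-row toy dataset in the (sentence_0, sentence_1, sentence_2) layout described above; output_dir is a placeholder:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy triplet dataset: anchor, positive, negative.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is the State model?"],
    "sentence_1": ["State contains the geometries of the states, in our specific case it contains only the entire geometries of the Italian state."],
    "sentence_2": ["The maps app contains State, Region, Province, Company, City, Particella, Map, MapCity, Property, Group, PropertyHasMap, Palette, PaletteColor, and Geotiff models."],
})

loss = losses.TripletLoss(model)  # Euclidean distance, margin 5 (defaults)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # placeholder
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    multi_dataset_batch_sampler="round_robin",
    # eval_strategy="steps" additionally requires an eval_dataset or evaluator.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```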
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss | val-triplet-eval_cosine_accuracy |
---|---|---|---|
0.1704 | 500 | 4.5287 | - |
0.3407 | 1000 | 4.1121 | 1.0 |
0.5111 | 1500 | 3.7883 | - |
0.6814 | 2000 | 3.6668 | 1.0 |
0.8518 | 2500 | 3.6262 | - |
1.0 | 2935 | - | 1.0 |
1.0221 | 3000 | 3.586 | 1.0 |
1.1925 | 3500 | 3.5752 | - |
1.3629 | 4000 | 3.5576 | 1.0 |
1.5332 | 4500 | 3.556 | - |
1.7036 | 5000 | 3.5389 | 1.0 |
1.8739 | 5500 | 3.526 | - |
2.0 | 5870 | - | 1.0 |
2.0443 | 6000 | 3.5228 | 1.0 |
2.2147 | 6500 | 3.5234 | - |
2.3850 | 7000 | 3.5122 | 1.0 |
2.5554 | 7500 | 3.517 | - |
2.7257 | 8000 | 3.5056 | 1.0 |
2.8961 | 8500 | 3.5103 | - |
3.0 | 8805 | - | 1.0 |
3.0664 | 9000 | 3.5071 | 1.0 |
3.2368 | 9500 | 3.4977 | - |
3.4072 | 10000 | 3.4929 | 1.0 |
3.5775 | 10500 | 3.4964 | - |
3.7479 | 11000 | 3.4914 | 1.0 |
3.9182 | 11500 | 3.491 | - |
4.0 | 11740 | - | 1.0 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.53.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 4.0.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
TripletLoss
```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```