---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:208
- loss:BatchSemiHardTripletLoss
base_model: BAAI/bge-base-en
widget:
- source_sentence: '
Name : Gandalf
Category: Financial Services, Consulting
Department: Finance
Location: Singapore
Amount: 457.29
Card: Financial Advisory Services
Trip Name: unknown
'
sentences:
- '
Name : InterGlobal Tech
Category: Business Software Solutions, Data Processing Services
Department: Marketing
Location: New York, NY
Amount: 1249.95
Card: Marketing Automation Tools
Trip Name: unknown
'
- '
Name : Nuvotek Solutions
Category: Consulting Services, Managed IT Services
Department: Information Security
Location: Berlin, Germany
Amount: 879.65
Card: Annual Cybersecurity Resilience Program
Trip Name: unknown
'
- '
Name : Omega Systems Inc.
Category: Integrated Business Solutions, Enterprise Software Sales
Department: Research & Development
Location: Oslo, Norway
Amount: 1943.75
Card: AI Development Suite
Trip Name: unknown
'
- source_sentence: '
Name : NexGen Fiscal Systems
Category: Financial Software Solutions, Revenue Management Services
Department: Finance
Location: San Francisco, CA
Amount: 2749.95
Card: Q4 Revenue Optimization Initiative
Trip Name: unknown
'
sentences:
- '
Name : GlobalRes Workforce Solutions
Category: Remote Work Platforms, HR Technology Vendors
Department: Engineering
Location: Barcelona, Spain
Amount: 1894.27
Card: Hybrid Work Enablement
Trip Name: unknown
'
- '
Name : InterLang Solutions
Category: Language Interpretation Services, Remote Collaboration Tools
Department: HR
Location: Tokyo, Japan
Amount: 1642.59
Card: Diversity & Inclusion Initiatives
Trip Name: unknown
'
- '
Name : CovaRisk Consulting
Category: Risk Advisory, Financial Services
Department: Legal
Location: Toronto, Canada
Amount: 1124.37
Card: Assurance Payment
Trip Name: unknown
'
- source_sentence: '
Name : Optix Global
Category: Digital Storage Solutions, Office Essentials Provider
Department: All Departments
Location: Tokyo, Japan
Amount: 568.77
Card: Monthly Office Needs
Trip Name: unknown
'
sentences:
- '
Name : Digital Wave Solutions
Category: IT Infrastructure Services, Data Analytic Platforms
Department: Finance
Location: San Francisco, CA
Amount: 1748.92
Card: Annual Data Management & Reporting
Trip Name: unknown
'
- '
Name : Analytix Global Solutions
Category: Business Intelligence Services, Regulatory Compliance Tools
Department: Finance
Location: London, UK
Amount: 1323.67
Card: Financial Compliance Enhancement
Trip Name: unknown
'
- '
Name : Daesung Enterprises
Category: Catering Services, Event Management
Department: Sales
Location: Lisbon, Portugal
Amount: 375.45
Card: Q4 Client Engagement Events
Trip Name: unknown
'
- source_sentence: '
Name : Kanzan Solutions
Category: Consulting Services, Business Advisory
Department: Legal
Location: Tokyo, Japan
Amount: 3900.75
Card: Quarterly Compliance Review
Trip Name: unknown
'
sentences:
- '
Name : Alta Via Mix
Category: Airline Catering, Luxury Travel Services
Department: Executive
Location: Milan, Italy
Amount: 1925.49
Card: Executive Incentive Program
Trip Name: Annual Leadership Summit
'
- '
Name : RBS
Category: Financial Services, Business Consultancy
Department: Finance
Location: Toronto, Canada
Amount: 1134.28
Card: Cross-Border Transaction Facilitation
Trip Name: unknown
'
- '
Name : InnovaThink Global
Category: Management Consultancy, Technical Training Services
Department: HR
Location: Zurich, Switzerland
Amount: 1675.32
Card: Innovation and Efficiency Program
Trip Name: unknown
'
- source_sentence: '
Name : NetWise Solutions
Category: Data Transfer Services, Digital Infrastructure
Department: Product
Location: Singapore
Amount: 1579.42
Card: Global Network Enhancement
Trip Name: unknown
'
sentences:
- '
Name : Fernández & Co. Services
Category: Property Management, Facility Services
Department: Office Administration
Location: Madrid, Spain
Amount: 1245.67
Card: Monthly Facility Operations
Trip Name: unknown
'
- '
Name : AeroDyn Research
Category: Research Services, Data Analysis
Department: Research & Development
Location: Amsterdam, Netherlands
Amount: 2457.42
Card: Annual Innovation Assessment
Trip Name: unknown
'
- '
Name : Global Horizon Travel
Category: Travel Services, Package Deals
Department: Sales
Location: Tokyo, Japan
Amount: 1199.75
Card: Annual Sales Retreat
Trip Name: Sales Strategy Summit
'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- dot_accuracy
- manhattan_accuracy
- euclidean_accuracy
- max_accuracy
model-index:
- name: SentenceTransformer based on BAAI/bge-base-en
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: bge base en train
      type: bge-base-en-train
    metrics:
    - type: cosine_accuracy
      value: 0.8605769230769231
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.13942307692307693
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.8413461538461539
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.8605769230769231
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.8605769230769231
      name: Max Accuracy
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: bge base en eval
      type: bge-base-en-eval
    metrics:
    - type: cosine_accuracy
      value: 0.9242424242424242
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.07575757575757576
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.9545454545454546
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.9242424242424242
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.9545454545454546
      name: Max Accuracy
---
# SentenceTransformer based on BAAI/bge-base-en
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
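The pooling configuration above enables only `pooling_mode_cls_token`, and the final module is `Normalize()`. The following is a minimal sketch of what those two steps amount to, written in plain Python on a toy 3-dimensional input instead of real 768-dimensional BERT outputs. The function name `cls_pool_and_normalize` is hypothetical and is not part of the library.

```python
import math


def cls_pool_and_normalize(token_embeddings):
    """Illustrative only: take the first ([CLS]) token vector and L2-normalize it.

    token_embeddings is a list of per-token vectors, with the [CLS] token first.
    """
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Avoid division by zero for an all-zero vector.
        return list(cls_vector)
    return [x / norm for x in cls_vector]


# Toy token embeddings (3 tokens, 3 dimensions); real outputs would be 768-dimensional.
tokens = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.2, 0.1]]
sentence_embedding = cls_pool_and_normalize(tokens)
print(sentence_embedding)  # [0.6, 0.8, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.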
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("dshvadskiy/finetuned-bge-base-en")
# Run inference
sentences = [
'\nName : NetWise Solutions\nCategory: Data Transfer Services, Digital Infrastructure\nDepartment: Product\nLocation: Singapore\nAmount: 1579.42\nCard: Global Network Enhancement\nTrip Name: unknown\n',
'\nName : Global Horizon Travel\nCategory: Travel Services, Package Deals\nDepartment: Sales\nLocation: Tokyo, Japan\nAmount: 1199.75\nCard: Annual Sales Retreat\nTrip Name: Sales Strategy Summit\n',
'\nName : AeroDyn Research\nCategory: Research Services, Data Analysis\nDepartment: Research & Development\nLocation: Amsterdam, Netherlands\nAmount: 2457.42\nCard: Annual Innovation Assessment\nTrip Name: unknown\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
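The example inputs above all follow the same field layout (`Name :`, `Category:`, `Department:`, `Location:`, `Amount:`, `Card:`, `Trip Name:`). A small helper along the following lines could build new inputs in that layout. This is a sketch only: `format_expense_record` is a hypothetical name, and the two-decimal amount formatting and the `"unknown"` default are assumptions inferred from the samples shown in this card.

```python
def format_expense_record(name, category, department, location, amount, card,
                          trip_name="unknown"):
    """Build an input string in the layout of the sample records (hypothetical helper)."""
    return (
        f"\nName : {name}\n"  # the samples have a space before this colon only
        f"Category: {category}\n"
        f"Department: {department}\n"
        f"Location: {location}\n"
        f"Amount: {amount:.2f}\n"  # assumption: amounts are written with two decimals
        f"Card: {card}\n"
        f"Trip Name: {trip_name}\n"
    )


record = format_expense_record(
    name="NetWise Solutions",
    category="Data Transfer Services, Digital Infrastructure",
    department="Product",
    location="Singapore",
    amount=1579.42,
    card="Global Network Enhancement",
)
print(record)
```

The resulting string can be passed to `model.encode` in place of the hand-written literals.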
## Evaluation
### Metrics
#### Triplet
* Dataset: `bge-base-en-train`
* Evaluated with [`TripletEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:-------------------|:-----------|
| cosine_accuracy | 0.8606 |
| dot_accuracy | 0.1394 |
| manhattan_accuracy | 0.8413 |
| euclidean_accuracy | 0.8606 |
| **max_accuracy** | **0.8606** |
#### Triplet
* Dataset: `bge-base-en-eval`
* Evaluated with [`TripletEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:-------------------|:-----------|
| cosine_accuracy | 0.9242 |
| dot_accuracy | 0.0758 |
| manhattan_accuracy | 0.9545 |
| euclidean_accuracy | 0.9242 |
| **max_accuracy** | **0.9545** |
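For readers unfamiliar with these metrics: a triplet accuracy is the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under a given distance. The sketch below illustrates that idea on toy 2-dimensional vectors. It is not the evaluator's actual code, and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_accuracy(triplets, distance):
    """Fraction of triplets where the positive is strictly closer than the negative."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if distance(anchor, positive) < distance(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets: the first is ranked correctly, the second is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),
]
accuracy = triplet_accuracy(toy_triplets, cosine_distance)
print(accuracy)  # 0.5
```

In the tables above, `max_accuracy` matches the largest of the per-distance accuracies.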
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 208 training samples
* Columns: `sentence` and `label`
* Approximate statistics based on the first 208 samples:
  |      | sentence | label |
  |:-----|:---------|:------|
  | type | string   | int   |
* Samples:
  | sentence | label |
  |:---------|:------|
  | Name : Yijie Logistics<br>Category: Logistics Services<br>Department: Sales<br>Location: Berlin, Germany<br>Amount: 485.67<br>Card: Quarterly Client Visit and Logistics Coordination<br>Trip Name: unknown | 0 |
  | Name : Serenity Solutions<br>Category: Office Wellness Solutions<br>Department: Office Administration<br>Location: Munich, Germany<br>Amount: 772.58<br>Card: Ergonomic Office Enhancements<br>Trip Name: unknown | 1 |
  | Name : Cortec International<br>Category: Event Management Services, Business Solutions<br>Department: Sales<br>Location: London, UK<br>Amount: 1337.25<br>Card: Global Sales Summit Participation<br>Trip Name: unknown | 2 |
* Loss: [`BatchSemiHardTripletLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
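As a rough illustration of the idea behind a semi-hard triplet loss: for each anchor-positive pair within a batch, a negative counts as "semi-hard" when it is farther from the anchor than the positive but still within the margin, and the loss pushes it beyond that margin. The sketch below is a simplified, assumption-laden version in plain Python. The function name, the margin of `1.0`, the Euclidean distance, and the averaging over all semi-hard triplets are illustrative choices, and the library's implementation differs in its details.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def semi_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Simplified sketch: mean of (d_ap - d_an + margin) over semi-hard triplets.

    A negative is semi-hard when d_ap < d_an < d_ap + margin.
    """
    losses = []
    n = len(embeddings)
    for i in range(n):  # anchor index
        for j in range(n):  # positive index
            if i == j or labels[i] != labels[j]:
                continue
            d_ap = euclidean(embeddings[i], embeddings[j])
            for k in range(n):  # negative index
                if labels[k] == labels[i]:
                    continue
                d_an = euclidean(embeddings[i], embeddings[k])
                if d_ap < d_an < d_ap + margin:
                    losses.append(d_ap - d_an + margin)
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: 1-dimensional embeddings with integer class labels.
toy_embeddings = [[0.0], [1.0], [1.5], [5.0]]
toy_labels = [0, 0, 1, 1]
loss = semi_hard_triplet_loss(toy_embeddings, toy_labels, margin=1.0)
print(loss)  # 0.5
```

This also shows why the training data only needs `sentence` and `label` columns: triplets are formed inside each batch from samples that share or differ in label.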
### Evaluation Dataset
#### Unnamed Dataset
* Size: 52 evaluation samples
* Columns: `sentence` and `label`
* Approximate statistics based on the first 52 samples:
  |      | sentence | label |
  |:-----|:---------|:------|
  | type | string   | int   |
* Samples:
  | sentence | label |
  |:---------|:------|
  | Name : Versatile Systems Ltd.<br>Category: Office Management Solutions, Software Solutions<br>Department: Office Administration<br>Location: Tokyo, Japan<br>Amount: 845.67<br>Card: Integrated Office Infrastructure<br>Trip Name: unknown | 21 |
  | Name : NexGen Comms<br>Category: Telecom Services, Communications Solutions<br>Department: Sales<br>Location: Berlin, Germany<br>Amount: 879.45<br>Card: Q2 Client Outreach Program<br>Trip Name: unknown | 23 |
  | Name : Digital Wave Solutions<br>Category: IT Infrastructure Services, Data Analytic Platforms<br>Department: Finance<br>Location: San Francisco, CA<br>Amount: 1748.92<br>Card: Annual Data Management & Reporting<br>Trip Name: unknown | 18 |
* Loss: [`BatchSemiHardTripletLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchsemihardtripletloss)
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `batch_sampler`: no_duplicates
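
To give a feel for what these settings imply, the sketch below estimates the number of optimizer steps and the learning rate at a few points in training. It rests on assumptions not stated in the list above: one optimizer step per batch of 16, every one of the 208 training samples batched once per epoch (the `no_duplicates` sampler may produce a different batch count), and a linear warmup followed by linear decay. The function name `linear_warmup_lr` is hypothetical.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Assumed schedule: linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Assumption: 208 samples, batch size 16, 5 epochs, no gradient accumulation.
steps_per_epoch = math.ceil(208 / 16)
total_steps = steps_per_epoch * 5
warmup_steps = math.ceil(total_steps * 0.1)
print(steps_per_epoch, total_steps, warmup_steps)  # 13 65 7

for step in (0, warmup_steps, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions the peak learning rate of `2e-05` is reached after the warmup steps and then decays to zero by the final step.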
#### All Hyperparameters