SentenceTransformer based on BAAI/bge-base-en

This is a sentence-transformers model finetuned from BAAI/bge-base-en. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
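
As a rough illustration of what modules (1) and (2) in the architecture above do, the sketch below applies CLS pooling and L2 normalization to a random array that stands in for the Transformer output. The function name `cls_pool_and_normalize` and the random input are assumptions made for illustration; they are not part of the library.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings):
    """Mimic modules (1) and (2): take the [CLS] vector, then L2-normalize.

    token_embeddings: array of shape (batch, seq_len, hidden_dim), standing in
    for the per-token output of the Transformer module (0).
    """
    token_embeddings = np.asarray(token_embeddings, dtype=float)
    # (1) Pooling with pooling_mode_cls_token=True: keep only the first token.
    cls_vectors = token_embeddings[:, 0, :]
    # (2) Normalize(): scale each sentence vector to unit L2 norm.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for Transformer output: 2 sentences, 4 tokens, 768 dims.
rng = np.random.default_rng(0)
fake_token_embeddings = rng.normal(size=(2, 4, 768))
sentence_embeddings = cls_pool_and_normalize(fake_token_embeddings)

print(sentence_embeddings.shape)  # (2, 768)

# For unit-length vectors, the dot product equals the cosine similarity.
cosine = sentence_embeddings @ sentence_embeddings.T
```

Because the final module normalizes every embedding, a plain dot product between two outputs already gives their cosine similarity.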

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/finetuned-bge-base-en")
# Run inference
sentences = [
    'Occurence highlighting highlights wrong part of the code <!-- Please search existing issues to avoid creating duplicates. -->\r\n\r\n## Environment data\r\n\r\n-   VS Code version: 1.58.0-insider 062e6519f8973fede2ca736e80682bd19007460a \r\n-   Jupyter Extension version (available under the Extensions sidebar):  v2021.8.1000539794\r\n-   Python Extension version (available under the Extensions sidebar): v2021.6.944021595\r\n-   OS (Windows | Mac | Linux distro) and version: Ubuntu 18.04\r\n-   Python and/or Anaconda version: 3.9.2 Anaconda\r\n-   Type of virtual environment used (N/A | venv | virtualenv | conda | ...): conda\r\n-   Jupyter server running: Remote \r\n\r\nIt seems that issues https://github.com/microsoft/vscode/issues/120148 and https://github.com/microsoft/vscode-jupyter/issues/5451 have been closed but the problem still exists in the last versions. I have not seen any similar issues on the repo',
    'File explorer is expanding all root folders in a MR workspace Steps to Reproduce:\r\n\r\n1.  Create a MR workspace file with more than one folder\r\n2. Open the MR workspace\r\n\r\n🐛 All top level folders are expanded. This is very slow if there are lot of root folders and also if the MR workspace is in remote\r\n',
    'Quick input reset scroll position * use latest from master\r\n* f1 > insert snippet\r\n* scroll down to an extension snippet and hide it (press 👁️ icon)\r\n* :bug: the scroll position resets\r\n\r\nThis is happening when reassigning the items (since the press changed the label) here: https://github.com/microsoft/vscode/blob/92314d61a55f466c125fa9d1f9fe8da633a82423/src/vs/workbench/contrib/snippets/browser/insertSnippet.ts#L213',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5572, 0.5031],
#         [0.5572, 1.0000, 0.5477],
#         [0.5031, 0.5477, 1.0000]])
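
One way to use such a similarity matrix, for example to flag the closest pair of issue reports, is to scan its off-diagonal entries. The sketch below is a minimal illustration that reuses the scores printed above; the helper name `most_similar_pair` is hypothetical.

```python
# Similarity matrix copied from the printed output above.
similarities = [
    [1.0000, 0.5572, 0.5031],
    [0.5572, 1.0000, 0.5477],
    [0.5031, 0.5477, 1.0000],
]

def most_similar_pair(matrix):
    """Return (i, j, score) for the highest off-diagonal similarity."""
    best = None
    n = len(matrix)
    for i in range(n):
        # Upper triangle only: the matrix is symmetric and the diagonal is 1.
        for j in range(i + 1, n):
            if best is None or matrix[i][j] > best[2]:
                best = (i, j, matrix[i][j])
    return best

print(most_similar_pair(similarities))  # (0, 1, 0.5572)
```

Here the first two sentences form the closest pair, although all three scores are moderate, which fits three unrelated bug reports.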

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.9479

Triplet

Metric           Value
cosine_accuracy  0.9933
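
The card does not define cosine_accuracy. A common reading for triplet evaluation is the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative. The sketch below follows that assumption with toy 2-dimensional vectors; the function names and data are illustrative only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Toy triplets (hypothetical data, not taken from the evaluation set).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),   # negative is closer: wrong
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # correct
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # correct
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```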

Training Details

Training Dataset

Unnamed Dataset

  • Size: 445 training samples
  • Columns: sentence and label
  • Approximate statistics based on the first 445 samples:
    sentence (type: string)
    • min: 20 tokens
    • mean: 338.03 tokens
    • max: 512 tokens
    label (type: float)
    • min: 1.0
    • mean: 149.53
    • max: 299.0
  • Samples:
    Sample 1 (label: 299.0):
      Branch list is sometimes out of order
      Type: Bug
      1. Open a workspace
      2. Quickly open the branch picker and type main
      Bug
      The first time you do this, sometimes you end up with an unordered list:
      Image
      The correct order shows up when you keep start typing or try doing this again:
      Image
      VS Code version: Code - Insiders 1.91.0-insider (Universal) (0354163c1c66b950b0762364f5b4cd37937b624a, 2024-06-26T10:12:33.304Z)
      OS version: Darwin arm64 23.5.0
      Modes:
      System Info
      |Item|Value|
      |---|---|
      |CPUs|Apple M2 Max (12 x 2400)|
      |GPU Status|2d_canvas: unavailable_software
      canvas_oop_rasterization: disabled_off
      direct_rendering_display_compositor: disabled_off_ok
      gpu_compositing: disabled_software
      multiple_raster_threads: enabled_on
      ope...
    Sample 2 (label: 299.0):
      Git Branch Picker Race Condition If I paste the branch too quickly and then press enter, it does not switch to it, but creates a new branch.
      This breaks muscle memory, as it works when you do it slowly.
      Code_-_Insiders_peF36XR6nS
      Once loading completes, it should select the branch again.
    Sample 3 (label: 298.0):
      Ctrl+I stopped working after first hold+talk+release Testing #213355
      Screencast shows that it seems to be in the wrong context and is trying to stop the session?
      Recording 2024-05-28 at 14 05 54
      Repro was just asking "Testing testing" and then trying to ask something else
  • Loss: BatchSemiHardTripletLoss
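
To give an intuition for semi-hard triplet mining, the sketch below computes a simplified batch loss: for each anchor and same-label positive, it keeps only negatives that are farther away than the positive but still within the margin, then averages the hinge losses. This is a simplified illustration, not the library's implementation. It assumes the float labels act as class identifiers, and the margin of 1.0 and Euclidean distance are arbitrary choices for the example.

```python
import numpy as np

def semi_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over semi-hard triplets in one batch (simplified)."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all embeddings in the batch.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if a == p or labels[a] != labels[p]:
                continue  # a positive must be a different sample with the same label
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # a negative must carry a different label
                # Semi-hard: farther than the positive, but still inside the margin.
                if dist[a, p] < dist[a, neg] < dist[a, p] + margin:
                    losses.append(dist[a, p] - dist[a, neg] + margin)
    return float(np.mean(losses)) if losses else 0.0

# Toy batch (hypothetical): two samples share label 1.0, one has label 2.0.
batch = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0]]
batch_labels = [1.0, 1.0, 2.0]
print(semi_hard_triplet_loss(batch, batch_labels, margin=1.0))  # 0.5
```

In the toy batch only one triplet is semi-hard: the first sample as anchor, the second as positive (distance 1.0) and the third as negative (distance 1.5), which gives 1.0 - 1.5 + 1.0 = 0.5.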

Evaluation Dataset

Unnamed Dataset

  • Size: 95 evaluation samples
  • Columns: sentence and label
  • Approximate statistics based on the first 95 samples:
    sentence (type: string)
    • min: 38 tokens
    • mean: 348.78 tokens
    • max: 512 tokens
    label (type: float)
    • min: 3.0
    • mean: 149.72
    • max: 296.0
  • Samples:
    Sample 1:
      VS Code does not delete old extension versions even after restart
      Does this issue occur when all extensions are disabled?: Yes
